* KVM guest crashes
From: Alexander Graf @ 2009-01-20 15:49 UTC
To: kvm@vger.kernel.org
Cc: Avi Kivity, Marcelo Tosatti, Joerg Roedel, Sheng Yang

Hi list,

recently I've been hitting some KVM bugs that others seem to have reported
as well, including

- CIFS timeouts
- Stuck ?? errors
- Random segmentation faults in the guest

so I figured I'd put together a stress test that can be used to
reproduce these issues. It works by using a CIFS mount on the host
and unpacking data from that mount onto the same mount. I have been able
to bring kvm to its knees quite often just by doing this.
Simply run the test in an endless-loop. FWIW enabling NPT helps
triggering the issue.

The guest kernels included here are openSUSE 11.0 (2.6.25) and 11.1
(2.6.27) kernels.

Find the tests here: http://alex.csgraf.de/kvm-tests.tar.bz2
And some logs here (NPT enabled): http://alex.csgraf.de/kvm-logs.tar.bz2

I'm somewhat lost on the reason for these failures, so if you do have
some time on your hands, please give me a hand debugging this! If I'd
had to guess, I'd say it's either an APIC issue and/or guest memory
corruption.

Alex
* Re: KVM guest crashes
From: Avi Kivity @ 2009-01-20 20:07 UTC
To: Alexander Graf
Cc: kvm@vger.kernel.org, Marcelo Tosatti, Joerg Roedel, Sheng Yang

Alexander Graf wrote:
> [...]
> Simply run the test in an endless-loop. FWIW enabling NPT helps
> triggering the issue.
>

Are the problems specific to AMD?

What does "helps triggering" mean - does it happen with NPT disabled?

> [...]
> I'm somewhat lost on the reason for these failures, so if you do have
> some time on your hands, please give me a hand debugging this! If I'd
> had to guess, I'd say it's either an APIC issue and/or guest memory
> corruption.
>

I'd guess memory corruption.

Does running a uniprocessor guest help? What about a uniprocessor guest
pinned to one host core?

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
* Re: KVM guest crashes
From: Alexander Graf @ 2009-01-20 20:20 UTC
To: Avi Kivity
Cc: kvm@vger.kernel.org, Marcelo Tosatti, Joerg Roedel, Sheng Yang

On 20.01.2009, at 21:07, Avi Kivity <avi@redhat.com> wrote:

> Are the problems specific to AMD?

I don't know, as all machines I have tried it on so far were AMD. But
judging from user reports on the ml, it happens on Intel too.

> What does "helps triggering" mean - does it happen with NPT disabled?

It seems like the chances of breakage are higher with NPT enabled. I do
see them without it as well, though.

> I'd guess memory corruption.
>
> Does running a uniprocessor guest help? What about a uniprocessor
> guest pinned to one host core?

I'll try to start tests tomorrow.

Alex
* Re: KVM guest crashes
From: Alexander Graf @ 2009-01-21 8:14 UTC
To: Avi Kivity
Cc: kvm@vger.kernel.org, Marcelo Tosatti, Joerg Roedel, Sheng Yang

Avi Kivity wrote:
> I'd guess memory corruption.
>
> Does running a uniprocessor guest help? What about a uniprocessor
> guest pinned to one host core?

So last night I started several guests with -smp 8 but without network
to see if IO load is causing the problems. All VMs are down, but one
panic log is rather new:

Stuck ??
Stuck ??
Stuck ??
Stuck ??
Stuck ??
Stuck ??
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
IP: [<ffffffff80237454>] cpu_attach_domain+0x84/0x207
PGD 0
Oops: 0000 [1] SMP
last sysfs file:
CPU 1
Modules linked in:
Supported: Yes
Pid: 1, comm: swapper Tainted: G S 2.6.27.11-1-default #1
RIP: 0010:[<ffffffff80237454>] [<ffffffff80237454>] cpu_attach_domain+0x84/0x207
RSP: 0018:ffff88007a419c50 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff880001077a60 RCX: ffff88007a419c40
RDX: 000000000000044d RSI: 0000000000000200 RDI: 0000000000000000
RBP: ffff88007a419c90 R08: 0000000000000000 R09: 0000000000000200
R10: 0000000000000008 R11: 0000000000018600 R12: ffff8800010778d0
R13: ffff880001077a78 R14: ffff8800010775b0 R15: ffff88000107f700
FS: 0000000000000000(0000) GS:ffff88007afeb540(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff88007a418000, task ffff88007a406040)
Stack: 000000047a4616c0 ffff88007a548000 0000002f0000044d 0000000000000004
 ffffffff80a275b0 0000000000000000 ffff88007a460e00 ffff88007a45c140
 ffff88007a419ec0 ffffffff80238190 ffff88007a419dc0 ffff88007a419e00
Call Trace:
 [<ffffffff80238190>] __build_sched_domains+0xbb9/0xbf5
 [<ffffffff80981ae4>] sched_init_smp+0xa9/0x1d8
 [<ffffffff8096b850>] kernel_init+0x74/0xea
 [<ffffffff8020cf79>] child_rip+0xa/0x11

Code: 00 4c 89 ef 89 45 d4 8b 83 88 00 00 00 89 45 d0 e8 d1 05 13 00 ff c8 74 5d 8b 93 88 00 00 00 f7 c2 8f 02 00 00 74 0d 48 8b 43 10 <48> 3b 00 0f 85 24 01 00 00 80 e2 70 0f 85 1b 01 00 00 eb 37 48
RIP [<ffffffff80237454>] cpu_attach_domain+0x84/0x207
 RSP <ffff88007a419c50>
CR2: 0000000000000000
---[ end trace 4eaa2a86a8e2da22 ]---
Kernel panic - not syncing: Attempted to kill init!

From what I've seen it's always related to IPIs, but that's just a guess.

I'll start UP testing now.

Alex
* Re: KVM guest crashes
From: Avi Kivity @ 2009-01-21 9:05 UTC
To: Alexander Graf
Cc: kvm@vger.kernel.org, Marcelo Tosatti, Joerg Roedel, Sheng Yang

Alexander Graf wrote:
> So last night I started several guests with -smp 8 but without network
> to see if IO load is causing the problems. All VMs are down, but one
> panic log is rather new:
>
> Stuck ??
> [...]
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> IP: [<ffffffff80237454>] cpu_attach_domain+0x84/0x207
>

This is right on startup, if I read things right.

I suggest checking if you have the latest BIOS update applied. I've had
bad experiences with un-updated processors.

-- 
error compiling committee.c: too many arguments to function
* Re: KVM guest crashes
From: Avi Kivity @ 2009-01-21 9:36 UTC
To: Alexander Graf
Cc: kvm@vger.kernel.org, Marcelo Tosatti, Joerg Roedel, Sheng Yang

Avi Kivity wrote:
>
> I suggest checking if you have the latest BIOS update applied. I've
> had bad experiences with un-updated processors.
>

FWIW, I have an 8-way F9 guest (2.6.27.5-blah) running on a 2x4
Barcelona host, happily make -j16ing an allmodconfig kernel.

-- 
error compiling committee.c: too many arguments to function
* Re: KVM guest crashes
From: Alexander Graf @ 2009-01-21 10:44 UTC
To: Avi Kivity
Cc: kvm@vger.kernel.org, Marcelo Tosatti, Joerg Roedel, Sheng Yang

Avi Kivity wrote:
> Avi Kivity wrote:
>>
>> I suggest checking if you have the latest BIOS update applied. I've
>> had bad experiences with un-updated processors.
>>
>
> FWIW, I have an 8-way F9 guest (2.6.27.5-blah) running on a 2x4
> Barcelona host, happily make -j16ing an allmodconfig kernel.

Strange. I have started the tests again with an updated BIOS now, and
I'm installing an Intel machine to test on in parallel.

old:

# ./rdmsr /dev/cpu/0/msr $(( 0x0000008b ))
0x1000065

new:

# ./rdmsr /dev/cpu/0/msr $(( 0x0000008b ))
0x1000083

But I already got one guest crashing:

int3: 0000 [1] SMP
last sysfs file: /sys/kernel/uevent_seqnum
CPU 2
Modules linked in: nls_utf8 cifs(X) af_packet virtio_net virtio_pci virtio_ring virtio edd ext3 mbcache jbd fan ide_pci_generic ide_core ata_generic sata_nv libata scsi_mod dock thermal processor thermal_sys hwmon
Supported: Yes, External
Pid: 0, comm: swapper Tainted: G S 2.6.27.7-9-default #1
RIP: 0010:[<ffffffff80a500f1>] [<ffffffff80a500f1>] per_cpu__cpu_state+0x1/0x4
RSP: 0018:ffff88007a493fa8 EFLAGS: 00000083
RAX: ffffffff806f5fa0 RBX: ffffffff80a500f0 RCX: 0000000000000000
RDX: ffff880001033200 RSI: 0000000000000000 RDI: ffffffffff5fc0b0
RBP: ffff88007a48beb0 R08: 0000000000000000 R09: ffff880001039638
R10: 00000000ffffffff R11: ffffffff8021c5d9 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 00007fe3252e4950(0000) GS:ffff88007a461f40(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000062d000 CR3: 000000007c10a000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff88007a48a000, task ffff88007a488280)
Stack: ffff88007a48beb0 ffffffff8020ca2e ffff88007a48beb0 <EOI> 0000007dd83ce327
 0000000000000086 ffff8800010396d0 0000000002625a00 0000000000000002
 000000010000eadc 0000007dd83ce327 0000000000000292 0000000000000292
Call Trace:
Inexact backtrace:

 <IRQ> [<ffffffff8020ca2e>] ? ret_from_intr+0x0/0x29
 <EOI> [<ffffffff804a6992>] ? notifier_call_chain+0x29/0x4c
 [<ffffffff80213465>] ? default_idle+0x38/0x54
 [<ffffffff8020b34a>] ? cpu_idle+0xa9/0xf1

Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc <cc> cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
RIP [<ffffffff80a500f1>] per_cpu__cpu_state+0x1/0x4
 RSP <ffff88007a493fa8>
---[ end trace 17313f34f216af07 ]---
Kernel panic - not syncing: Attempted to kill the idle task!
------------[ cut here ]------------
WARNING: at kernel/smp.c:331 smp_call_function_mask+0x38/0x1f2()
Modules linked in: nls_utf8 cifs(X) af_packet virtio_net virtio_pci virtio_ring virtio edd ext3 mbcache jbd fan ide_pci_generic ide_core ata_generic sata_nv libata scsi_mod dock thermal processor thermal_sys hwmon
Supported: Yes, External
Pid: 0, comm: swapper Tainted: G S D 2.6.27.7-9-default #1

Call Trace:
 [<ffffffff8020e42e>] show_trace_log_lvl+0x41/0x58
 [<ffffffff804a1e97>] dump_stack+0x69/0x6f
 [<ffffffff80240eb2>] warn_on_slowpath+0x51/0x77
 [<ffffffff80261fef>] smp_call_function_mask+0x38/0x1f2
 [<ffffffff802621d2>] smp_call_function+0x29/0x2e
 [<ffffffff8021ba16>] native_smp_send_stop+0x1a/0x3f
 [<ffffffff804a1f59>] panic+0xbc/0x170
 [<ffffffff802449e2>] do_exit+0x6b/0x334
 [<ffffffff804a4b9b>] oops_begin+0x0/0x9e
 [<ffffffff804a524a>] do_int3+0x7d/0xa1
 [<ffffffff804a46e6>] int3+0xb6/0xf0
 [<ffffffff80a500f1>] per_cpu__cpu_state+0x1/0x4
DWARF2 unwinder stuck at per_cpu__cpu_state+0x1/0x4
Leftover inexact backtrace:
 <IRQ> [<ffffffff8020ca2e>] ret_from_intr+0x0/0x29
 <EOI> [<ffffffff804a6992>] notifier_call_chain+0x29/0x4c
 [<ffffffff80213465>] default_idle+0x38/0x54
 [<ffffffff8020b34a>] cpu_idle+0xa9/0xf1
---[ end trace 17313f34f216af07 ]---

The UP guests seemed to work fine - will start them again now.

Alex
* Re: KVM guest crashes
From: Alexander Graf @ 2009-01-22 20:29 UTC
To: Avi Kivity
Cc: kvm@vger.kernel.org, Marcelo Tosatti, Joerg Roedel, Sheng Yang

Avi Kivity wrote:
> Avi Kivity wrote:
>>
>> I suggest checking if you have the latest BIOS update applied. I've
>> had bad experiences with un-updated processors.
>>
>
> FWIW, I have an 8-way F9 guest (2.6.27.5-blah) running on a 2x4
> Barcelona host, happily make -j16ing an allmodconfig kernel.
>

Following the discussion on IRC, I tried -no-kvm-irqchip and again found
some virtual machines broken after >1 day of stress testing:

+ sudo -u contain2 env -i qemu-kvm -localtime -kernel virtio-kernel -initrd virtio-initrd -nographic -append 'quiet clocksource=acpi_pm cifsuser=contain2 cifspass=contain2 root=cifs://contain2:contain2@172.16.2.1/contain2 realroot=//172.16.2.1/users/contain2 ip=172.16.2.2:172.16.2.1::255.255.255.0::eth0:none console=ttyS0 dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:2 -net tap,ifname=tap2,script=/bin/true -m 2000 -nographic -smp 4 -no-kvm-irqchip /dev/null
qemu: loading initrd (0x1daf359 bytes) at 0x000000007b240000
Stuck ??
Stuck ??
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
IP: [<ffffffff802b539a>] kfree+0x18b/0x26e
PGD 0
Oops: 0000 [1] SMP
last sysfs file:
CPU 2
Modules linked in:
Supported: Yes
Pid: 0, comm: swapper Tainted: G S 2.6.27.7-9-default #1
RIP: 0010:[<ffffffff802b539a>] [<ffffffff802b539a>] kfree+0x18b/0x26e
RSP: 0018:ffff88007a493e90 EFLAGS: 00010046
RAX: 0000000000000002 RBX: ffff8800010397f0 RCX: ffff88007a480778
RDX: ffffe20000000000 RSI: ffff8800010397f0 RDI: ffff88007a5ae140
RBP: 0000000000000000 R08: ffff8800010395d0 R09: ffff88007a493eb8
R10: ffffffff80a59980 R11: ffffffff8021c5d9 R12: 0000000000000001
R13: ffff88007ac04080 R14: 0000000010200042 R15: ffff88007a5ae140
FS: 0000000000000000(0000) GS:ffff88007a461f40(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffff88007a48a000, task ffff88007a488280)
Stack: ffffffff8023df9c ffffffff8073a108 0000000000000286 ffffffff8024a1eb
 ffffffff80259d80 ffff8800010397f0 0000000000000000 0000000000000001
 000000000000000a 0000000010200042 0000000000000010 ffffffff802831d0
Call Trace:
 [<ffffffff802831d0>] __rcu_process_callbacks+0x189/0x203
 [<ffffffff80283271>] rcu_process_callbacks+0x27/0x47
 [<ffffffff802464ed>] __do_softirq+0x84/0x115
 [<ffffffff8020dc9c>] call_softirq+0x1c/0x28
 [<ffffffff8020f067>] do_softirq+0x3c/0x81
 [<ffffffff80246204>] irq_exit+0x3f/0x83
 [<ffffffff8021ce5f>] smp_apic_timer_interrupt+0x95/0xae
 [<ffffffff8020d4a3>] apic_timer_interrupt+0x83/0x90
 [<ffffffff80221f1d>] native_safe_halt+0x2/0x3
 [<ffffffff80213465>] default_idle+0x38/0x54
 [<ffffffff8020b34a>] cpu_idle+0xa9/0xf1

Code: 01 00 00 00 e8 4c fa ff ff 48 83 3d a0 19 44 00 00 49 8b 44 dd 08 48 8d 78 40 75 04 0f 0b eb fe e8 e5 cc f6 ff 90 e9 c7 00 00 00 <8b> 55 00 3b 55 04 73 0f 89 d0 4c 89 7c c5 18 8d 42 01 e9 ad 00
RIP [<ffffffff802b539a>] kfree+0x18b/0x26e
 RSP <ffff88007a493e90>
CR2: 0000000000000000
---[ end trace 4eaa2a86a8e2da22 ]---

Also after two days of permanent stress testing I also got the Intel
machine w/ current git down:

+ sudo -u contain1 env -i /usr/local/bin/qemu-system-x86_64 -localtime -kernel virtio-kernel -initrd virtio-initrd -nographic -append 'quiet clocksource=acpi_pm cifsuser=contain1 cifspass=contain1 root=cifs://contain1:contain1@172.16.1.1/contain1 realroot=//172.16.1.1/users/contain1 ip=172.16.1.2:172.16.1.1::255.255.255.0::eth0:none console=ttyS0 dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:1 -net tap,ifname=tap1,script=/bin/true -m 2000 -nographic -smp 8 /dev/null
qemu: loading initrd (0x1daf359 bytes) at 0x000000007b240000
Stuck ??

No backtrace here though. That's all I got from the serial console.

The only issue I have had with the UP guests so far was this:

+ taskset -c 6 sudo -u contain6 env -i qemu-kvm -localtime -kernel virtio-kernel -initrd virtio-initrd -nographic -append 'quiet clocksource=acpi_pm cifsuser=contain6 cifspass=contain6 root=cifs://contain6:contain6@172.16.6.1/contain6 realroot=//172.16.6.1/users/contain6 ip=172.16.6.2:172.16.6.1::255.255.255.0::eth0:none console=ttyS0 dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:6 -net tap,ifname=tap6,script=/bin/true -m 2000 -nographic /dev/null
qemu: loading initrd (0x1daf359 bytes) at 0x000000007b240000
..MP-BIOS bug: 8254 timer not connected to IO-APIC
Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with apic=debug and send a report. Then try booting with the 'noapic' option.

which can be annoying at times too. Can't we just detect that it's the
detection and give the guest its interrupts? Or should the PIT
reinjection thing help here?

Alex
* Re: KVM guest crashes
From: Alexander Graf @ 2009-01-22 20:36 UTC
To: Avi Kivity
Cc: kvm@vger.kernel.org, Marcelo Tosatti, Joerg Roedel, Sheng Yang

Alexander Graf wrote:

[...]

> Also after two days of permanent stress testing I also got the Intel
> machine w/ current git down:
>
> [...]
>
> No backtrace here though. That's all I got from the serial console.
>

+ sudo -u contain1 env -i /usr/local/bin/qemu-system-x86_64 -localtime -kernel virtio-kernel -initrd virtio-initrd -nographic -append 'quiet clocksource=acpi_pm cifsuser=contain1 cifspass=contain1 root=cifs://contain1:contain1@172.16.1.1/contain1 realroot=//172.16.1.1/users/contain1 ip=172.16.1.2:172.16.1.1::255.255.255.0::eth0:none console=ttyS0 dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:1 -net tap,ifname=tap1,script=/bin/true -m 2000 -nographic -smp 8 /dev/null
qemu: loading initrd (0x1daf359 bytes) at 0x000000007b240000
Stuck ??

(qemu) info cpus
* CPU #0: pc=0xffffffff80221f1d thread_id=15211
  CPU #1: pc=0xffffffff80221f1d thread_id=15212
  CPU #2: pc=0xffffffff80221f1d thread_id=15213
  CPU #3: pc=0xffffffff80221f1d thread_id=15214
  CPU #4: pc=0xffffffff8049f7d0 thread_id=15215
  CPU #5: pc=0xffffffff80221f1d thread_id=15216
  CPU #6: pc=0xffffffff80221f1d thread_id=15217
  CPU #7: pc=0x000000000009f02c thread_id=15218

(qemu) cpu 7
(qemu) info registers
EAX=00000c06 EBX=000005b8 ECX=00000000 EDX=00000000
ESI=00000000 EDI=00000000 EBP=00000000 ESP=00000000
EIP=0000002c EFL=00033002 [-------] CPL=3 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 0000ffff 0000f300
CS =9f00 0009f000 0000ffff 0000f300
SS =0000 00000000 0000ffff 0000f300
DS =0000 00000000 0000ffff 0000f300
FS =0000 00000000 0000ffff 0000f300
GS =0000 00000000 0000ffff 0000f300
LDT=0000 00000000 0000ffff 00008200
TR =0000 fffbd000 00002088 00008b00
GDT= 00000000 0000ffff
IDT= 00000000 0000ffff
CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00000000
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000
XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000
XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000
XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000
XMM07=00000000000000000000000000000000

Is that guest really seriously in BIOS code? After booting Linux?

(qemu) x /2i $pc-1
0x000000000009f02b: hlt
0x000000000009f02c: jmp 0x9f02b

Where is this? Looks like panic code to me.

Alex
* Re: KVM guest crashes
From: Alexander Graf @ 2009-01-22 20:55 UTC
To: Avi Kivity
Cc: kvm@vger.kernel.org, Marcelo Tosatti, Joerg Roedel, Sheng Yang

Alexander Graf wrote:
> [...]
>
> Is that guest really seriously in BIOS code? After booting Linux?
>
> (qemu) x /2i $pc-1
> 0x000000000009f02b: hlt
> 0x000000000009f02c: jmp 0x9f02b
>
> Where is this? Looks like panic code to me.
>

0x000000000009f000: cli
0x000000000009f001: xor %ax,%ax
0x000000000009f003: mov %ax,%ds
0x000000000009f005: mov $0x510,%ebx
0x000000000009f00b: addr32 mov (%ebx),%ecx
0x000000000009f00f: test %ecx,%ecx
0x000000000009f012: je 0x9f026
0x000000000009f014: addr32 mov 0x4(%ebx),%eax
0x000000000009f019: addr32 mov 0x8(%ebx),%edx
0x000000000009f01e: wrmsr
0x000000000009f020: add $0xc,%ebx
0x000000000009f024: jmp 0x9f00b
0x000000000009f026: lock incw 1856
0x000000000009f02b: hlt
0x000000000009f02c: jmp 0x9f02b

Looks a lot like this:

smp_ap_boot_code_start:
  cli
  xor %ax, %ax
  mov %ax, %ds

  mov $SMP_MSR_ADDR, %ebx
11:
  mov 0(%ebx), %ecx
  test %ecx, %ecx
  jz 12f
  mov 4(%ebx), %eax
  mov 8(%ebx), %edx
  wrmsr
  add $12, %ebx
  jmp 11b

12:
  lock incw smp_cpus
1:
  hlt
  jmp 1b

But that code shouldn't run after Linux booted, right? And without at
least a "Power Off" message I'd expect Linux to still be up.

The only thing the host's dmesg was saying is this:

Ignoring delivery mode 3

(repeated often)

Alex
* Re: KVM guest crashes
From: Alexander Graf @ 2009-01-23 16:36 UTC
To: Avi Kivity
Cc: kvm@vger.kernel.org, Marcelo Tosatti, Joerg Roedel, Sheng Yang

Alexander Graf wrote:
> Alexander Graf wrote:
>> Alexander Graf wrote:
>>> Also after two days of permanent stress testing I also got the Intel
>>> machine w/ current git down:
>>> [...]
>>> Stuck ??
[...]

In order to provide you with more dumps that might point in some
direction (I'm still lost figuring out where to look), here's another
AMD NPT guest crash with current git. It somehow looks as if the guest
pagetable is corrupted.

+ sudo -u contain3 env -i /usr/local/bin/qemu-system-x86_64 -localtime -kernel virtio-kernel -initrd virtio-initrd -nographic -append 'quiet clocksource=acpi_pm cifsuser=contain3 cifspass=contain3 root=cifs://contain3:contain3@172.16.3.1/contain3 realroot=//172.16.3.1/users/contain3 ip=172.16.3.2:172.16.3.1::255.255.255.0::eth0:none console=ttyS0 dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:3 -net tap,ifname=tap3,script=/bin/true -m 2000 -nographic -smp 8 -no-kvm-irqchip /dev/null
qemu: loading initrd (0x1daf359 bytes) at 0x000000007b240000
pci 0000:00:01.0: PIIX3: Enabling Passive Release
IP-Config: Device `eth0' not found.
doing fast boot
Creating device nodes with udev
Boot logging started on /dev/ttyS0(/dev/console) at Thu Jan 22 23:05:55 2009
[NETWORK] using static config based on ip=172.16.3.2:172.16.3.1::255.255.255.0::eth0:none
Trying manual resume from /dev/disk/by-id/ata-ST380815AS_5RW3M74V-part1
resume device /dev/disk/by-id/ata-ST380815AS_5RW3M74V-part1 not found (ignoring)
Trying manual resume from /dev/disk/by-id/ata-ST380815AS_5RW3M74V-part1
resume device /dev/disk/by-id/ata-ST380815AS_5RW3M74V-part1 not found (ignoring)
node name not found
Mounting root //172.16.3.1/contain3
RTNETLINK answers: File exists
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
    link/ether 52:54:00:12:34:03 brd ff:ff:ff:ff:ff:ff
    inet 172.16.3.2 peer 172.16.3.1/24 scope global eth0
BUG: unable to handle kernel paging request at 0000000000100100
IP: [<ffffffff8036a603>] strnlen+0x10/0x19
PGD 7c596067 PUD 7c9ed067 PMD 0
Oops: 0000 [1] SMP
last sysfs file: /sys/kernel/uevent_seqnum
CPU 7
Modules linked in: nls_utf8 cifs(X) af_packet virtio_net virtio_pci virtio_ring virtio edd ext3 mbcache jbd fan ide_pci_generic ide_core ata_generic sata_nv libata scsi_mod dock thermal processor thermal_sys hwmon
Supported: Yes, External
Pid: 782, comm: halt Tainted: G S 2.6.27.7-9-default #1
RIP: 0010:[<ffffffff8036a603>] [<ffffffff8036a603>] strnlen+0x10/0x19
RSP: 0018:ffff88007c46da70 EFLAGS: 00010082
RAX: 0000000000100100 RBX: 0000000000000000 RCX: 00000000ffffffff
RDX: 0000000000100100 RSI: fffffffffffffffe RDI: 0000000000100100
RBP: ffffffff80ae0fad R08: 00000000ffffffff R09: 0000000000000000
R10: 000000000000000a R11: 0000000000000000 R12: 0000000000100100
R13: 00000000ffffffff R14: ffffffff80ae13a0 R15: 00000000ffffffff
FS: 00007f0b2aee06f0(0000) GS:ffff88007a57bf40(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000100100 CR3: 000000007c4e5000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process halt (pid: 782, threadinfo ffff88007c46c000, task ffff88007c17e0c0)
Stack: ffffffff8036b39d ffff88007c46ddb8 ffffffff80ae0fad ffffffff805d7e29
 0000000000000000 00000000ffffffff ffffffff8036b6f6 00007f0b2ace27e0
 ffff88007c595ab0 ffff88007c0624a8 0000000000000400 ffffffff80ae0fa0
Call Trace:
 [<ffffffff8036b39d>] string+0x34/0x91
 [<ffffffff8036b6f6>] vsnprintf+0x2fc/0x574
 [<ffffffff8036ba56>] vscnprintf+0x9/0x17
 [<ffffffff80241a12>] vprintk+0x12b/0x2df
 [<ffffffff80240e2f>] warn_slowpath+0x9f/0xd1
 [<ffffffff80366da2>] kobject_put+0x2f/0x42
 [<ffffffff8024fe90>] kernel_power_off+0xe/0x3b
 [<ffffffff80250108>] sys_reboot+0xf8/0x179
 [<ffffffff8020c37a>] system_call_fastpath+0x16/0x1b
 [<00007f0b2aa3aa26>] 0x7f0b2aa3aa26

Code: d5 70 80 20 75 eb 48 89 f8 c3 48 89 f8 eb 03 48 ff c0 80 38 00 75 f8 48 29 f8 c3 48 89 f8 eb 03 48 ff c0 48 85 f6 74 08 48 ff ce <80> 38 00 75 f0 48 29 f8 c3 31 c0 eb 12 41 38 c8 74 0a 48 ff c2
RIP [<ffffffff8036a603>] strnlen+0x10/0x19
 RSP <ffff88007c46da70>
CR2: 0000000000100100
---[ end
trace 1c45144e9c9b5946 ]--- boot/84-builder.sh: line 30: 782 Killed halt -fp Alex
* Re: KVM guest crashes 2009-01-22 20:29 ` Alexander Graf 2009-01-22 20:36 ` Alexander Graf @ 2009-01-23 22:36 ` Marcelo Tosatti 2009-01-24 7:42 ` Alexander Graf 2009-01-26 15:53 ` Alexander Graf 1 sibling, 2 replies; 18+ messages in thread From: Marcelo Tosatti @ 2009-01-23 22:36 UTC (permalink / raw) To: Alexander Graf; +Cc: Avi Kivity, kvm@vger.kernel.org, Joerg Roedel, Sheng Yang Hi Alexander, On Thu, Jan 22, 2009 at 09:29:46PM +0100, Alexander Graf wrote: > Following the discussion on IRC, I tried -no-kvm-irqchip and found some > virtual machines broken after >1 day of stress testing again: > > + sudo -u contain2 env -i qemu-kvm -localtime -kernel virtio-kernel > -initrd virtio-initrd -nographic -append 'quiet clocksource=acpi_pm > cifsuser=contain2 cifspass=contain2 root=cifs://contain2:contain2@172.1 > 6.2.1/contain2 realroot=//172.16.2.1/users/contain2 > ip=172.16.2.2:172.16.2.1::255.255.255.0::eth0:none console=ttyS0 > dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:2 -net > tap,ifname=tap2,sc > ript=/bin/true -m 2000 -nographic -smp 4 -no-kvm-irqchip /dev/null > qemu: loading initrd (0x1daf359 bytes) at 0x000000007b240000 > Stuck ?? > Stuck ?? 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 > IP: [<ffffffff802b539a>] kfree+0x18b/0x26e > PGD 0 > Oops: 0000 [1] SMP > last sysfs file: > CPU 2 > Modules linked in: > Supported: Yes > Pid: 0, comm: swapper Tainted: G S 2.6.27.7-9-default #1 > RIP: 0010:[<ffffffff802b539a>] [<ffffffff802b539a>] kfree+0x18b/0x26e > RSP: 0018:ffff88007a493e90 EFLAGS: 00010046 > RAX: 0000000000000002 RBX: ffff8800010397f0 RCX: ffff88007a480778 > RDX: ffffe20000000000 RSI: ffff8800010397f0 RDI: ffff88007a5ae140 > RBP: 0000000000000000 R08: ffff8800010395d0 R09: ffff88007a493eb8 > R10: ffffffff80a59980 R11: ffffffff8021c5d9 R12: 0000000000000001 > R13: ffff88007ac04080 R14: 0000000010200042 R15: ffff88007a5ae140 > FS: 0000000000000000(0000) GS:ffff88007a461f40(0000) knlGS:0000000000000000 > CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b > CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Process swapper (pid: 0, threadinfo ffff88007a48a000, task ffff88007a488280) > Stack: ffffffff8023df9c ffffffff8073a108 0000000000000286 ffffffff8024a1eb > ffffffff80259d80 ffff8800010397f0 0000000000000000 0000000000000001 > 000000000000000a 0000000010200042 0000000000000010 ffffffff802831d0 > Call Trace: > [<ffffffff802831d0>] __rcu_process_callbacks+0x189/0x203 > [<ffffffff80283271>] rcu_process_callbacks+0x27/0x47 > [<ffffffff802464ed>] __do_softirq+0x84/0x115 > [<ffffffff8020dc9c>] call_softirq+0x1c/0x28 > [<ffffffff8020f067>] do_softirq+0x3c/0x81 > [<ffffffff80246204>] irq_exit+0x3f/0x83 > [<ffffffff8021ce5f>] smp_apic_timer_interrupt+0x95/0xae > [<ffffffff8020d4a3>] apic_timer_interrupt+0x83/0x90 > [<ffffffff80221f1d>] native_safe_halt+0x2/0x3 > [<ffffffff80213465>] default_idle+0x38/0x54 > [<ffffffff8020b34a>] cpu_idle+0xa9/0xf1 > > > Code: 01 00 00 00 e8 4c fa ff ff 48 83 3d a0 19 44 00 00 49 8b 44 dd 08 > 48 
8d 78 40 75 04 0f 0b eb fe e8 e5 cc f6 ff 90 e9 c7 00 00 00 <8b> 55 > 00 3b 55 04 73 0f 89 d0 4c 89 7c c5 18 8d 42 01 e9 ad 00 > RIP [<ffffffff802b539a>] kfree+0x18b/0x26e > RSP <ffff88007a493e90> > CR2: 0000000000000000 > ---[ end trace 4eaa2a86a8e2da22 ]--- > > > Also after two days of permanent stress testing I also got the Intel > machine w/ current git down: > > + sudo -u contain1 env -i /usr/local/bin/qemu-system-x86_64 -localtime > -kernel virtio-kernel -initrd virtio-initrd -nographic -append 'quiet > clocksource=acpi_pm cifsuser=contain1 cifspass=contain1 > root=cifs://contain1:contain1@172.16.1.1/contain1 > realroot=//172.16.1.1/users/contain1 > ip=172.16.1.2:172.16.1.1::255.255.255.0::eth0:none console=ttyS0 > dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:1 -net > tap,ifname=tap1,script=/bin/true -m 2000 -nographic -smp 8 /dev/null > qemu: loading initrd (0x1daf359 bytes) at 0x000000007b240000 > Stuck ?? > > No backtrace here though. That's all I got from the serial console. > > The only issues I had with the UP guests so far was this: > > + taskset -c 6 sudo -u contain6 env -i qemu-kvm -localtime -kernel > virtio-kernel -initrd virtio-initrd -nographic -append 'quiet > clocksource=acpi_pm cifsuser=contain6 cifspass=contain6 > root=cifs://contain6:contain6@172.16.6.1/contain6 > realroot=//172.16.6.1/users/contain6 > ip=172.16.6.2:172.16.6.1::255.255.255.0::eth0:none console=ttyS0 > dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:6 -net > tap,ifname=tap6,script=/bin/true -m 2000 -nographic /dev/null > qemu: loading initrd (0x1daf359 bytes) at 0x000000007b240000 > ..MP-BIOS bug: 8254 timer not connected to IO-APIC > Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with > apic=debug and send a report. Then try booting with the 'noapic' option. > > which can be annoying at times too. Can't we just detect that it's the > detection and give the guest its interrupts? 
Or should the PIT > reinjection thing help here? There are a number of problems that can result in this error, and the problems are possibly different between the in-kernel PIT and userspace PIT emulation (note it also happens with in-kernel PIT, just much more rarely now). You can use the no_timer_check kernel option to bypass it. Regarding the corruption problem, I have a few questions: - It is SMP specific (ie both kernel/userspace irqchip fail). - which means UP guests are stable with both kernel/user irqchip. The "Stuck ??" messages seem to be coming from smpboot.c. So for some reason vcpu's are being reset. Don't seem to be a triple fault because in that case all vcpu's would be reset (so yes, the vcpu was really on BIOS code). Suggest the following: - Confirm the problem happens with root on ext3 filesystem (can't you mount the CIFS and copy the data over to a local guest disk to simulate similar load?). - Check that the kernel text is not corrupted. Save the "good" kernel text with QEMU's "pmemsave" or "memsave" (you can see start/end in the symbols _text/_etext, /proc/kallsyms) after booting. After you see the crash, save the "bad" kernel text, compare. This can give additional clues (or not). Also, you mentioned "other reports" previously, can you point to them, please? Thanks ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: KVM guest crashes 2009-01-23 22:36 ` Marcelo Tosatti @ 2009-01-24 7:42 ` Alexander Graf 2009-01-24 13:06 ` Marcelo Tosatti 2009-01-26 15:53 ` Alexander Graf 1 sibling, 1 reply; 18+ messages in thread From: Alexander Graf @ 2009-01-24 7:42 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: Avi Kivity, kvm@vger.kernel.org, Joerg Roedel, Sheng Yang Hi Marcelo, On 23.01.2009, at 23:36, Marcelo Tosatti wrote: > Hi Alexander, > > On Thu, Jan 22, 2009 at 09:29:46PM +0100, Alexander Graf wrote: > >> Following the discussion on IRC, I tried -no-kvm-irqchip and found >> some >> virtual machines broken after >1 day of stress testing again: >> >> + sudo -u contain2 env -i qemu-kvm -localtime -kernel virtio-kernel >> -initrd virtio-initrd -nographic -append 'quiet clocksource=acpi_pm >> cifsuser=contain2 cifspass=contain2 root=cifs://contain2:contain2@172.1 >> 6.2.1/contain2 realroot=//172.16.2.1/users/contain2 >> ip=172.16.2.2:172.16.2.1::255.255.255.0::eth0:none console=ttyS0 >> dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:2 - >> net >> tap,ifname=tap2,sc >> ript=/bin/true -m 2000 -nographic -smp 4 -no-kvm-irqchip /dev/null >> qemu: loading initrd (0x1daf359 bytes) at 0x000000007b240000 >> Stuck ?? >> Stuck ?? 
>> BUG: unable to handle kernel NULL pointer dereference at >> 0000000000000000 >> IP: [<ffffffff802b539a>] kfree+0x18b/0x26e >> PGD 0 >> Oops: 0000 [1] SMP >> last sysfs file: >> CPU 2 >> Modules linked in: >> Supported: Yes >> Pid: 0, comm: swapper Tainted: G S 2.6.27.7-9-default #1 >> RIP: 0010:[<ffffffff802b539a>] [<ffffffff802b539a>] kfree+0x18b/ >> 0x26e >> RSP: 0018:ffff88007a493e90 EFLAGS: 00010046 >> RAX: 0000000000000002 RBX: ffff8800010397f0 RCX: ffff88007a480778 >> RDX: ffffe20000000000 RSI: ffff8800010397f0 RDI: ffff88007a5ae140 >> RBP: 0000000000000000 R08: ffff8800010395d0 R09: ffff88007a493eb8 >> R10: ffffffff80a59980 R11: ffffffff8021c5d9 R12: 0000000000000001 >> R13: ffff88007ac04080 R14: 0000000010200042 R15: ffff88007a5ae140 >> FS: 0000000000000000(0000) GS:ffff88007a461f40(0000) knlGS: >> 0000000000000000 >> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b >> CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0 >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >> Process swapper (pid: 0, threadinfo ffff88007a48a000, task >> ffff88007a488280) >> Stack: ffffffff8023df9c ffffffff8073a108 0000000000000286 >> ffffffff8024a1eb >> ffffffff80259d80 ffff8800010397f0 0000000000000000 0000000000000001 >> 000000000000000a 0000000010200042 0000000000000010 ffffffff802831d0 >> Call Trace: >> [<ffffffff802831d0>] __rcu_process_callbacks+0x189/0x203 >> [<ffffffff80283271>] rcu_process_callbacks+0x27/0x47 >> [<ffffffff802464ed>] __do_softirq+0x84/0x115 >> [<ffffffff8020dc9c>] call_softirq+0x1c/0x28 >> [<ffffffff8020f067>] do_softirq+0x3c/0x81 >> [<ffffffff80246204>] irq_exit+0x3f/0x83 >> [<ffffffff8021ce5f>] smp_apic_timer_interrupt+0x95/0xae >> [<ffffffff8020d4a3>] apic_timer_interrupt+0x83/0x90 >> [<ffffffff80221f1d>] native_safe_halt+0x2/0x3 >> [<ffffffff80213465>] default_idle+0x38/0x54 >> [<ffffffff8020b34a>] cpu_idle+0xa9/0xf1 >> >> >> Code: 01 00 00 00 e8 
4c fa ff ff 48 83 3d a0 19 44 00 00 49 8b 44 >> dd 08 >> 48 8d 78 40 75 04 0f 0b eb fe e8 e5 cc f6 ff 90 e9 c7 00 00 00 <8b> >> 55 >> 00 3b 55 04 73 0f 89 d0 4c 89 7c c5 18 8d 42 01 e9 ad 00 >> RIP [<ffffffff802b539a>] kfree+0x18b/0x26e >> RSP <ffff88007a493e90> >> CR2: 0000000000000000 >> ---[ end trace 4eaa2a86a8e2da22 ]--- >> >> >> Also after two days of permanent stress testing I also got the Intel >> machine w/ current git down: >> >> + sudo -u contain1 env -i /usr/local/bin/qemu-system-x86_64 - >> localtime >> -kernel virtio-kernel -initrd virtio-initrd -nographic -append 'quiet >> clocksource=acpi_pm cifsuser=contain1 cifspass=contain1 >> root=cifs://contain1:contain1@172.16.1.1/contain1 >> realroot=//172.16.1.1/users/contain1 >> ip=172.16.1.2:172.16.1.1::255.255.255.0::eth0:none console=ttyS0 >> dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:1 - >> net >> tap,ifname=tap1,script=/bin/true -m 2000 -nographic -smp 8 /dev/null >> qemu: loading initrd (0x1daf359 bytes) at 0x000000007b240000 >> Stuck ?? >> >> No backtrace here though. That's all I got from the serial console. >> >> The only issues I had with the UP guests so far was this: >> >> + taskset -c 6 sudo -u contain6 env -i qemu-kvm -localtime -kernel >> virtio-kernel -initrd virtio-initrd -nographic -append 'quiet >> clocksource=acpi_pm cifsuser=contain6 cifspass=contain6 >> root=cifs://contain6:contain6@172.16.6.1/contain6 >> realroot=//172.16.6.1/users/contain6 >> ip=172.16.6.2:172.16.6.1::255.255.255.0::eth0:none console=ttyS0 >> dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:6 - >> net >> tap,ifname=tap6,script=/bin/true -m 2000 -nographic /dev/null >> qemu: loading initrd (0x1daf359 bytes) at 0x000000007b240000 >> ..MP-BIOS bug: 8254 timer not connected to IO-APIC >> Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with >> apic=debug and send a report. Then try booting with the 'noapic' >> option. >> >> which can be annoying at times too. 
Can't we just detect that it's >> the >> detection and give the guest its interrupts? Or should the PIT >> reinjection thing help here? > > There are a number of problems that can result in this error, and the > problems are possibly different between the in-kernel PIT and > userspace > PIT emulation (note it also happens with in-kernel PIT, just much more > rarely now). You can use the no_timer_check kernel option to bypass > it. Ok :-). Thanks. The logic in the kernel for this is really stupid (it bases timing on clock speed). What about disabling the check if we detect KVM? > Regarding the corruption problem, I have a few questions: > > - It is SMP specific (ie both kernel/userspace irqchip fail). > - which means UP guests are stable with both kernel/user > irqchip. I have not been able to reproduce any of my issues with UP. I have to admit, though, that I only tried UP with the in-kernel irqchip. > The "Stuck ??" messages seem to be coming from smpboot.c. So for some > reason vcpu's are being reset. Don't seem to be a triple fault because > in that case all vcpu's would be reset (so yes, the vcpu was really on > BIOS code). Hm. I know that OS X turns off CPUs it doesn't need as an alternative to deep sleep. Does Linux do that too? > Suggest the following: > - Confirm the problem happens with root on ext3 filesystem (can't you > mount the CIFS and copy the data over to a local guest disk to > simulate similar load?). I had Stuck ?? messages without networking, but if it helps I can try that too. The project we're using this for does its work over CIFS, which is why I built the test case around it. > - Check that the kernel text is not corrupted. Save the "good" kernel > text with QEMU's "pmemsave" or "memsave" (you can see start/end in > the symbols _text/_etext, /proc/kallsyms) after booting. After you > see the crash, save the "bad" kernel text, compare. This can give > additional clues (or not). Good idea - I'll try.
> Also, you mentioned "other reports" previously, can you point to them, > please? Yes, will do later. I gotta run now! Thanks for the reply - it's good to know this isn't getting ignored :-). Alex
* Re: KVM guest crashes 2009-01-24 7:42 ` Alexander Graf @ 2009-01-24 13:06 ` Marcelo Tosatti 2009-01-24 14:30 ` Alexander Graf 0 siblings, 1 reply; 18+ messages in thread From: Marcelo Tosatti @ 2009-01-24 13:06 UTC (permalink / raw) To: Alexander Graf; +Cc: Avi Kivity, kvm@vger.kernel.org, Joerg Roedel, Sheng Yang On Sat, Jan 24, 2009 at 08:42:06AM +0100, Alexander Graf wrote: >> rarely now). You can use the no_timer_check kernel option to bypass >> it. > > Ok :-). Thanks. The logic in the kernel for this is really stupid > (basing timing on clock speed). What about disabling the check if we > detect KVM? Yes, this is an option. We've talked about it before, but no patch was merged. The RHEL5.3 kernel skips those checks when it detects the VMware or KVM hypervisors. We should understand what is happening in order to fix the fullvirt/old guest case. For the in-kernel PIT, I believe there is a bug somewhere, either in the PIT itself or in its interaction with the IOAPIC (failure to inject interrupts for some reason). I started debugging it by constantly rebooting an SMP guest, but my testbox died. I hope to get back to it soon. >> Regarding the corruption problem, I have a few questions: >> >> - It is SMP specific (ie both kernel/userspace irqchip fail). >> - which means UP guests are stable with both kernel/user >> irqchip. > > I have not been able to reproduce any of my issues with UP. I have to > admit that I only tried UP with in-kernel irqchip. OK. >> The "Stuck ??" messages seem to be coming from smpboot.c. So for some >> reason vcpu's are being reset. Don't seem to be a triple fault because >> in that case all vcpu's would be reset (so yes, the vcpu was really on >> BIOS code). > > Hm. I know that OSX turns off CPUs it doesn't need as an alternative to > deep-sleep. Does Linux do that too? Not that I know of, unless you offline CPUs manually, which does not seem to be the case.
>> Suggest the following: >> - Confirm the problem happens with root on ext3 filesystem (can't you >> mount the CIFS and copy the data over to a local guest disk to >> simulate similar load?). > > I had Stuck ?? messages without networking, but if it helps I can try > that too. In the project we're using this for we do things over cifs, so > that's why I built the test case around it. OK. I'm just trying to reduce the number of variables involved. I'll set up a machine to run a similar load next week. >> - Check that the kernel text is not corrupted. Save the "good" kernel >> text with QEMU's "pmemsave" or "memsave" (you can see start/end in >> the symbols _text/_etext, /proc/kallsyms) after booting. After you >> see the crash, save the "bad" kernel text, compare. This can give >> additional clues (or not). > > Good idea - I'll try. > >> Also, you mentioned "other reports" previously, can you point to them, >> please? > > Yes, will do later. I gotta run now! Thanks for the reply - it's good to > know this isn't getting ignored :-). Have a good weekend.
* Re: KVM guest crashes 2009-01-24 13:06 ` Marcelo Tosatti @ 2009-01-24 14:30 ` Alexander Graf 0 siblings, 0 replies; 18+ messages in thread From: Alexander Graf @ 2009-01-24 14:30 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: Avi Kivity, kvm@vger.kernel.org, Joerg Roedel, Sheng Yang On 24.01.2009, at 14:06, Marcelo Tosatti wrote: > On Sat, Jan 24, 2009 at 08:42:06AM +0100, Alexander Graf wrote: >>> rarely now). You can use the no_timer_check kernel option to bypass >>> it. >> >> Ok :-). Thanks. The logic in the kernel for this is really stupid >> (basing timing on clock speed). What about disabling the check if we >> detect KVM? > > Yes, this is an option. We've talked about it before, but no patch was > merged. The RHEL5.3 kernel skips those checks when it detects VMWare > or KVM hypervisors. That sounds clever. But I doubt I'll get anything as intrusive into the SLES11 kernel at this point in time :-(. > We should understand what is happening to fix the fullvirt/old guest > case. For the in-kernel PIT, I believe there is a bug somewhere, > either > in PIT itself or in the interaction with IOAPIC (failure to inject > interrupts for some reason). I started debugging it by constantly > reboot'ing an SMP guest but my testbox died. Hope to get back to it > soon. Hm. If I ever get tracing working again, I can try to create one too :-). >>> The "Stuck ??" messages seem to be coming from smpboot.c. So for >>> some >>> reason vcpu's are being reset. Don't seem to be a triple fault >>> because >>> in that case all vcpu's would be reset (so yes, the vcpu was >>> really on >>> BIOS code). >> >> Hm. I know that OSX turns off CPUs it doesn't need as an >> alternative to >> deep-sleep. Does Linux do that too? > > Not that I know of, unless you offline CPU's manually, which does not > seem to be the case. Nope, I don't hotplug anything (though the acpihp module is loaded). 
>>> Suggest the following: >>> - Confirm the problem happens with root on ext3 filesystem (can't >>> you >>> mount the CIFS and copy the data over to a local guest disk to >>> simulate similar load?). >> >> I had Stuck ?? messages without networking, but if it helps I can try >> that too. In the project we're using this for we do things over >> cifs, so >> that's why I built the test case around it. > > OK. Just trying to decrease the variables involved. I'll setup a > machine > to run a similar load next week. Sounds good :-). I put all the files I tested with online; the link is in the first mail of this thread, so feel free to use them as a starting point. For non-network testing I simply used -net none instead, but still had the initrd boot and kill the machine. >>> Also, you mentioned "other reports" previously, can you point to >>> them, >>> please? >> >> Yes, will do later. I gotta run now! Thanks for the reply - it's >> good to >> know this isn't getting ignored :-). > > Have a good weekend. Same to you. I was running off to a first-aid course, though, not the weekend :-). By "other reports" I mainly meant the thread "Guest Hang Bugs", although with 2.6.25 guests I got "BUG: soft lockup - CPU#x stuck for ns!" messages instead of "Stuck ??", FWIW. I originally created the whole test case to debug this exact bug, which we encountered as well: http://article.gmane.org/gmane.comp.emulators.kvm.devel/21828/ Alex
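When the test runs in an endless loop across several guests, triaging the captured serial-console logs by failure type helps compare configurations (SMP vs. UP, in-kernel vs. userspace irqchip, with or without networking). This is a minimal sketch, assuming each guest's console output is captured to its own text file; the signature strings are copied from messages quoted in this thread, while the category names are hypothetical labels.

```python
"""Triage sketch for serial-console logs from the endless-loop stress test.

Assumption: each guest's console output was captured to its own text file.
Signatures come from messages quoted in this thread; the category names
are hypothetical labels.
"""
import re
import sys
from collections import Counter
from typing import Iterable

SIGNATURES = {
    "stuck_cpu": re.compile(r"Stuck \?\?"),
    "soft_lockup": re.compile(r"BUG: soft lockup - CPU#\d+ stuck"),
    "oops": re.compile(r"BUG: unable to handle kernel"),
    "timer_check_panic": re.compile(r"IO-APIC \+ timer doesn't work"),
}


def classify(lines: Iterable[str]) -> Counter:
    """Count how many lines match each known failure signature."""
    counts: Counter = Counter()
    for line in lines:
        # Drop the trailing CR/LF; console logs often carry stray ^M.
        text = line.rstrip("\r\n")
        for name, pattern in SIGNATURES.items():
            if pattern.search(text):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            print(path, dict(classify(f)))
```

Matching on substrings rather than whole lines keeps it tolerant of the prefixes and CR debris that show up in the logs above; "BUG: unable to handle kernel" deliberately covers both the NULL-pointer and the paging-request oopses.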
* Re: KVM guest crashes 2009-01-23 22:36 ` Marcelo Tosatti 2009-01-24 7:42 ` Alexander Graf @ 2009-01-26 15:53 ` Alexander Graf 2009-01-26 16:21 ` Marcelo Tosatti 1 sibling, 1 reply; 18+ messages in thread From: Alexander Graf @ 2009-01-26 15:53 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: Avi Kivity, kvm@vger.kernel.org, Joerg Roedel, Sheng Yang Marcelo Tosatti wrote: > Hi Alexander, > > On Thu, Jan 22, 2009 at 09:29:46PM +0100, Alexander Graf wrote: > > >> Following the discussion on IRC, I tried -no-kvm-irqchip and found some >> virtual machines broken after >1 day of stress testing again: >> >> + sudo -u contain2 env -i qemu-kvm -localtime -kernel virtio-kernel >> -initrd virtio-initrd -nographic -append 'quiet clocksource=acpi_pm >> cifsuser=contain2 cifspass=contain2 root=cifs://contain2:contain2@172.1 >> 6.2.1/contain2 realroot=//172.16.2.1/users/contain2 >> ip=172.16.2.2:172.16.2.1::255.255.255.0::eth0:none console=ttyS0 >> dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:2 -net >> tap,ifname=tap2,sc >> ript=/bin/true -m 2000 -nographic -smp 4 -no-kvm-irqchip /dev/null >> qemu: loading initrd (0x1daf359 bytes) at 0x000000007b240000 >> Stuck ?? >> Stuck ?? 
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 >> IP: [<ffffffff802b539a>] kfree+0x18b/0x26e >> PGD 0 >> Oops: 0000 [1] SMP >> last sysfs file: >> CPU 2 >> Modules linked in: >> Supported: Yes >> Pid: 0, comm: swapper Tainted: G S 2.6.27.7-9-default #1 >> RIP: 0010:[<ffffffff802b539a>] [<ffffffff802b539a>] kfree+0x18b/0x26e >> RSP: 0018:ffff88007a493e90 EFLAGS: 00010046 >> RAX: 0000000000000002 RBX: ffff8800010397f0 RCX: ffff88007a480778 >> RDX: ffffe20000000000 RSI: ffff8800010397f0 RDI: ffff88007a5ae140 >> RBP: 0000000000000000 R08: ffff8800010395d0 R09: ffff88007a493eb8 >> R10: ffffffff80a59980 R11: ffffffff8021c5d9 R12: 0000000000000001 >> R13: ffff88007ac04080 R14: 0000000010200042 R15: ffff88007a5ae140 >> FS: 0000000000000000(0000) GS:ffff88007a461f40(0000) knlGS:0000000000000000 >> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b >> CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0 >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 >> Process swapper (pid: 0, threadinfo ffff88007a48a000, task ffff88007a488280) >> Stack: ffffffff8023df9c ffffffff8073a108 0000000000000286 ffffffff8024a1eb >> ffffffff80259d80 ffff8800010397f0 0000000000000000 0000000000000001 >> 000000000000000a 0000000010200042 0000000000000010 ffffffff802831d0 >> Call Trace: >> [<ffffffff802831d0>] __rcu_process_callbacks+0x189/0x203 >> [<ffffffff80283271>] rcu_process_callbacks+0x27/0x47 >> [<ffffffff802464ed>] __do_softirq+0x84/0x115 >> [<ffffffff8020dc9c>] call_softirq+0x1c/0x28 >> [<ffffffff8020f067>] do_softirq+0x3c/0x81 >> [<ffffffff80246204>] irq_exit+0x3f/0x83 >> [<ffffffff8021ce5f>] smp_apic_timer_interrupt+0x95/0xae >> [<ffffffff8020d4a3>] apic_timer_interrupt+0x83/0x90 >> [<ffffffff80221f1d>] native_safe_halt+0x2/0x3 >> [<ffffffff80213465>] default_idle+0x38/0x54 >> [<ffffffff8020b34a>] cpu_idle+0xa9/0xf1 >> >> >> Code: 01 00 00 00 e8 4c fa ff ff 48 
83 3d a0 19 44 00 00 49 8b 44 dd 08 >> 48 8d 78 40 75 04 0f 0b eb fe e8 e5 cc f6 ff 90 e9 c7 00 00 00 <8b> 55 >> 00 3b 55 04 73 0f 89 d0 4c 89 7c c5 18 8d 42 01 e9 ad 00 >> RIP [<ffffffff802b539a>] kfree+0x18b/0x26e >> RSP <ffff88007a493e90> >> CR2: 0000000000000000 >> ---[ end trace 4eaa2a86a8e2da22 ]--- >> >> >> Also after two days of permanent stress testing I also got the Intel >> machine w/ current git down: >> >> + sudo -u contain1 env -i /usr/local/bin/qemu-system-x86_64 -localtime >> -kernel virtio-kernel -initrd virtio-initrd -nographic -append 'quiet >> clocksource=acpi_pm cifsuser=contain1 cifspass=contain1 >> root=cifs://contain1:contain1@172.16.1.1/contain1 >> realroot=//172.16.1.1/users/contain1 >> ip=172.16.1.2:172.16.1.1::255.255.255.0::eth0:none console=ttyS0 >> dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:1 -net >> tap,ifname=tap1,script=/bin/true -m 2000 -nographic -smp 8 /dev/null >> qemu: loading initrd (0x1daf359 bytes) at 0x000000007b240000 >> Stuck ?? >> >> No backtrace here though. That's all I got from the serial console. >> >> The only issues I had with the UP guests so far was this: >> >> + taskset -c 6 sudo -u contain6 env -i qemu-kvm -localtime -kernel >> virtio-kernel -initrd virtio-initrd -nographic -append 'quiet >> clocksource=acpi_pm cifsuser=contain6 cifspass=contain6 >> root=cifs://contain6:contain6@172.16.6.1/contain6 >> realroot=//172.16.6.1/users/contain6 >> ip=172.16.6.2:172.16.6.1::255.255.255.0::eth0:none console=ttyS0 >> dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:6 -net >> tap,ifname=tap6,script=/bin/true -m 2000 -nographic /dev/null >> qemu: loading initrd (0x1daf359 bytes) at 0x000000007b240000 >> ..MP-BIOS bug: 8254 timer not connected to IO-APIC >> Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with >> apic=debug and send a report. Then try booting with the 'noapic' option. >> >> which can be annoying at times too. 
Can't we just detect that it's the >> detection and give the guest its interrupts? Or should the PIT >> reinjection thing help here? >> > > There are a number of problems that can result in this error, and the > problems are possibly different between the in-kernel PIT and userspace > PIT emulation (note it also happens with in-kernel PIT, just much more > rarely now). You can use the no_timer_check kernel option to bypass it. > Hm - that option disables the whole check, making it always fail. I haven't seen any way to actually disable the check, telling Linux things are OK :-(. Alex
* Re: KVM guest crashes 2009-01-26 15:53 ` Alexander Graf @ 2009-01-26 16:21 ` Marcelo Tosatti 2009-01-26 16:33 ` Alexander Graf 0 siblings, 1 reply; 18+ messages in thread From: Marcelo Tosatti @ 2009-01-26 16:21 UTC (permalink / raw) To: Alexander Graf; +Cc: Avi Kivity, kvm@vger.kernel.org, Joerg Roedel, Sheng Yang On Mon, Jan 26, 2009 at 04:53:21PM +0100, Alexander Graf wrote: > > There are a number of problems that can result in this error, and the > > problems are possibly different between the in-kernel PIT and userspace > > PIT emulation (note it also happens with in-kernel PIT, just much more > > rarely now). You can use the no_timer_check kernel option to bypass it. > > > > Hm - that option disables the whole check, making it always fail. I > haven't seen any way to actually disable the check, telling Linux things > are OK :-(. Hmm, the option makes timer_irq_works() always return true. It works for me with the in-kernel PIT. What do you see with "apic=debug no_timer_check"?
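To make the disagreement concrete: Alex reads no_timer_check as forcing the check to fail, while Marcelo says it makes timer_irq_works() report success. The toy model below is not the kernel's code. It assumes the check busy-waits for roughly ten timer periods with interrupts enabled and passes only if enough ticks were delivered in that window; `hz=250` and `min_ticks=5` are illustrative assumptions, and the short-circuit on `no_timer_check` follows Marcelo's description.

```python
"""Toy model of the boot-time timer sanity check (not the kernel's code).

Assumptions, labelled as such: the check waits about ten timer periods
with interrupts enabled and passes only if at least `min_ticks` ticks
arrived in that window; hz=250 and min_ticks=5 are illustrative values.
no_timer_check short-circuits to success, as described in this thread.
"""
from typing import Sequence


def ticks_in_window(tick_times_ms: Sequence[float], window_ms: float) -> int:
    """Count timer interrupts delivered while the busy-wait runs."""
    return sum(1 for t in tick_times_ms if 0.0 <= t < window_ms)


def timer_check_passes(
    tick_times_ms: Sequence[float],
    hz: int = 250,
    min_ticks: int = 5,
    no_timer_check: bool = False,
) -> bool:
    """Return True if the modelled check would accept the timer."""
    if no_timer_check:
        return True  # the measurement is skipped and success is reported
    window_ms = 10 * 1000 / hz  # roughly ten timer periods
    return ticks_in_window(tick_times_ms, window_ms) >= min_ticks
```

In this model a guest whose timer interrupts arrive late or get lost during the window fails the check, even though the timer works, while `no_timer_check=True` never looks at the ticks at all.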
* Re: KVM guest crashes 2009-01-26 16:21 ` Marcelo Tosatti @ 2009-01-26 16:33 ` Alexander Graf 0 siblings, 0 replies; 18+ messages in thread From: Alexander Graf @ 2009-01-26 16:33 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: Avi Kivity, kvm@vger.kernel.org, Joerg Roedel, Sheng Yang Marcelo Tosatti wrote: > On Mon, Jan 26, 2009 at 04:53:21PM +0100, Alexander Graf wrote: > >>> There are a number of problems that can result in this error, and the >>> problems are possibly different between the in-kernel PIT and userspace >>> PIT emulation (note it also happens with in-kernel PIT, just much more >>> rarely now). You can use the no_timer_check kernel option to bypass it. >>> >>> >> Hm - that option disables the whole check, making it always fail. I >> haven't seen any way to actually disable the check, telling Linux things >> are OK :-(. >> > > Hum, the option makes timer_irq_works always return true. Works for me > with in-kernel PIT. > > What you see with "apic=debug no_timer_check" ? > It does work with "noapic" for me, but that means I'm using the old PIC (which isn't necessarily bad, right?). So at least I can work around the issue for us now, but it still needs to be fixed. With "apic=debug no_apic_timer", 2.6.27 prints: Setting APIC routing to flat ..TIMER: vector=0x30 apic1=0 pin1=0 apic2=-1 pin2=-1 ..MP-BIOS bug: 8254 timer not connected to IO-APIC ...trying to set up timer (IRQ0) through the 8259A ... ..... (found apic 0 pin 0) ... ....... works. while 2.6.25 prints: ..MP-BIOS bug: 8254 timer not connected to IO-APIC Kernel panic - not syncing: IO-APIC + timer doesn't work! Try using the 'noapic' kernel parameter
Thread overview: 18+ messages 2009-01-20 15:49 KVM guest crashes Alexander Graf 2009-01-20 20:07 ` Avi Kivity 2009-01-20 20:20 ` Alexander Graf 2009-01-21 8:14 ` Alexander Graf 2009-01-21 9:05 ` Avi Kivity 2009-01-21 9:36 ` Avi Kivity 2009-01-21 10:44 ` Alexander Graf 2009-01-22 20:29 ` Alexander Graf 2009-01-22 20:36 ` Alexander Graf 2009-01-22 20:55 ` Alexander Graf 2009-01-23 16:36 ` Alexander Graf 2009-01-23 22:36 ` Marcelo Tosatti 2009-01-24 7:42 ` Alexander Graf 2009-01-24 13:06 ` Marcelo Tosatti 2009-01-24 14:30 ` Alexander Graf 2009-01-26 15:53 ` Alexander Graf 2009-01-26 16:21 ` Marcelo Tosatti 2009-01-26 16:33 ` Alexander Graf