From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alexander Graf
Subject: Re: KVM guest crashes
Date: Mon, 26 Jan 2009 16:53:21 +0100
Message-ID: <497DDC71.3000608@suse.de>
References: <4975F26D.707@suse.de> <49762F13.5040507@redhat.com>
 <4976D954.9070901@suse.de> <4976E54C.4080407@redhat.com>
 <4976EC92.4010109@redhat.com> <4978D73A.6080500@suse.de>
 <20090123223644.GA4031@amt.cnet>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: Avi Kivity , "kvm@vger.kernel.org" , Joerg Roedel , Sheng Yang
To: Marcelo Tosatti
Return-path:
Received: from ns2.suse.de ([195.135.220.15]:47874 "EHLO mx2.suse.de"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1751212AbZAZPxY (ORCPT ); Mon, 26 Jan 2009 10:53:24 -0500
In-Reply-To: <20090123223644.GA4031@amt.cnet>
Sender: kvm-owner@vger.kernel.org
List-ID:

Marcelo Tosatti wrote:
> Hi Alexander,
>
> On Thu, Jan 22, 2009 at 09:29:46PM +0100, Alexander Graf wrote:
>
>
>> Following the discussion on IRC, I tried -no-kvm-irqchip and found some
>> virtual machines broken after >1 day of stress testing again:
>>
>> + sudo -u contain2 env -i qemu-kvm -localtime -kernel virtio-kernel
>> -initrd virtio-initrd -nographic -append 'quiet clocksource=acpi_pm
>> cifsuser=contain2 cifspass=contain2 root=cifs://contain2:contain2@172.1
>> 6.2.1/contain2 realroot=//172.16.2.1/users/contain2
>> ip=172.16.2.2:172.16.2.1::255.255.255.0::eth0:none console=ttyS0
>> dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:2 -net
>> tap,ifname=tap2,sc
>> ript=/bin/true -m 2000 -nographic -smp 4 -no-kvm-irqchip /dev/null
>> qemu: loading initrd (0x1daf359 bytes) at 0x000000007b240000
>> Stuck ??
>> Stuck ??
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
>> IP: [] kfree+0x18b/0x26e
>> PGD 0
>> Oops: 0000 [1] SMP
>> last sysfs file:
>> CPU 2
>> Modules linked in:
>> Supported: Yes
>> Pid: 0, comm: swapper Tainted: G S 2.6.27.7-9-default #1
>> RIP: 0010:[] [] kfree+0x18b/0x26e
>> RSP: 0018:ffff88007a493e90 EFLAGS: 00010046
>> RAX: 0000000000000002 RBX: ffff8800010397f0 RCX: ffff88007a480778
>> RDX: ffffe20000000000 RSI: ffff8800010397f0 RDI: ffff88007a5ae140
>> RBP: 0000000000000000 R08: ffff8800010395d0 R09: ffff88007a493eb8
>> R10: ffffffff80a59980 R11: ffffffff8021c5d9 R12: 0000000000000001
>> R13: ffff88007ac04080 R14: 0000000010200042 R15: ffff88007a5ae140
>> FS: 0000000000000000(0000) GS:ffff88007a461f40(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>> CR2: 0000000000000000 CR3: 0000000000201000 CR4: 00000000000006e0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> Process swapper (pid: 0, threadinfo ffff88007a48a000, task ffff88007a488280)
>> Stack: ffffffff8023df9c ffffffff8073a108 0000000000000286 ffffffff8024a1eb
>> ffffffff80259d80 ffff8800010397f0 0000000000000000 0000000000000001
>> 000000000000000a 0000000010200042 0000000000000010 ffffffff802831d0
>> Call Trace:
>> [] __rcu_process_callbacks+0x189/0x203
>> [] rcu_process_callbacks+0x27/0x47
>> [] __do_softirq+0x84/0x115
>> [] call_softirq+0x1c/0x28
>> [] do_softirq+0x3c/0x81
>> [] irq_exit+0x3f/0x83
>> [] smp_apic_timer_interrupt+0x95/0xae
>> [] apic_timer_interrupt+0x83/0x90
>> [] native_safe_halt+0x2/0x3
>> [] default_idle+0x38/0x54
>> [] cpu_idle+0xa9/0xf1
>>
>>
>> Code: 01 00 00 00 e8 4c fa ff ff 48 83 3d a0 19 44 00 00 49 8b 44 dd 08
>> 48 8d 78 40 75 04 0f 0b eb fe e8 e5 cc f6 ff 90 e9 c7 00 00 00 <8b> 55
>> 00 3b 55 04 73 0f 89 d0 4c 89 7c c5 18 8d 42 01 e9 ad 00
>> RIP [] kfree+0x18b/0x26e
>> RSP
>> CR2: 0000000000000000
>> ---[ end trace 4eaa2a86a8e2da22 ]---
>>
>>
>> Also after two days of permanent stress testing I also got the Intel
>> machine w/ current git down:
>>
>> + sudo -u contain1 env -i /usr/local/bin/qemu-system-x86_64 -localtime
>> -kernel virtio-kernel -initrd virtio-initrd -nographic -append 'quiet
>> clocksource=acpi_pm cifsuser=contain1 cifspass=contain1
>> root=cifs://contain1:contain1@172.16.1.1/contain1
>> realroot=//172.16.1.1/users/contain1
>> ip=172.16.1.2:172.16.1.1::255.255.255.0::eth0:none console=ttyS0
>> dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:1 -net
>> tap,ifname=tap1,script=/bin/true -m 2000 -nographic -smp 8 /dev/null
>> qemu: loading initrd (0x1daf359 bytes) at 0x000000007b240000
>> Stuck ??
>>
>> No backtrace here though. That's all I got from the serial console.
>>
>> The only issues I had with the UP guests so far was this:
>>
>> + taskset -c 6 sudo -u contain6 env -i qemu-kvm -localtime -kernel
>> virtio-kernel -initrd virtio-initrd -nographic -append 'quiet
>> clocksource=acpi_pm cifsuser=contain6 cifspass=contain6
>> root=cifs://contain6:contain6@172.16.6.1/contain6
>> realroot=//172.16.6.1/users/contain6
>> ip=172.16.6.2:172.16.6.1::255.255.255.0::eth0:none console=ttyS0
>> dhcp=off builder=1' -net nic,model=virtio,macaddr=52:54:00:12:34:6 -net
>> tap,ifname=tap6,script=/bin/true -m 2000 -nographic /dev/null
>> qemu: loading initrd (0x1daf359 bytes) at 0x000000007b240000
>> ..MP-BIOS bug: 8254 timer not connected to IO-APIC
>> Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with
>> apic=debug and send a report. Then try booting with the 'noapic' option.
>>
>> which can be annoying at times too. Can't we just detect that it's the
>> detection and give the guest its interrupts? Or should the PIT
>> reinjection thing help here?
>>
>
> There are a number of problems that can result in this error, and the
> problems are possibly different between the in-kernel PIT and userspace
> PIT emulation (note it also happens with in-kernel PIT, just much more
> rarely now). You can use the no_timer_check kernel option to bypass it.
>

Hm - that option disables the whole check, making it always fail. I
haven't seen any way to actually disable the check, telling Linux things
are OK :-(.

Alex