From mboxrd@z Thu Jan 1 00:00:00 1970
From: James Stevens
Subject: stability issue with KVM using SMP
Date: Tue, 16 Sep 2008 09:25:29 +0100
Message-ID: <48CF6D79.6090607@communitydns.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
To: kvm@vger.kernel.org
Return-path:
Received: from mk-outboundfilter-6-a-2.mail.uk.tiscali.com ([212.74.114.16]:62742 "EHLO mk-outboundfilter-6-a-2.mail.uk.tiscali.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752040AbYIPIZj (ORCPT ); Tue, 16 Sep 2008 04:25:39 -0400
Received: from [192.168.250.154] (Fujitsu.james [192.168.250.154]) by home.jrcs.co.uk (8.13.6/8.12.9) with ESMTP id m8G8PTiE007904 for ; Tue, 16 Sep 2008 09:25:30 +0100
Sender: kvm-owner@vger.kernel.org
List-ID:

Summary - I'm getting stability issues running SMP. The guest will
typically run for 6 to 12 hours before causing a problem.

The first time it happened, the guest process that was running SMP went
Zombie, but in a spin loop clocking up a huge amount of CPU time, and I
had to reboot the host to clear it. Needless to say, I had no access to
the guest's qemu console (Ctrl-Alt-2), and a "kill -9 " from the host
had no effect.

However, the kernel had been compiled with a 64Gb memory model and I've
had problems with that in the past - in fact that's the first time I've
seen a kernel compiled with the 64Gb memory model actually boot under
KVM.

So I replaced the kernel, and the next time the guest simply locked up -
I couldn't access the guest o/s at all (no "ping", no linux console
response etc - in some cases there was a picture on the console [login
prompt] but it didn't respond to any key presses), but the qemu console
still worked. I tried a "system_powerdown" and that failed, so I did a
"system_reset" and that rebooted the guest.

I have now switched off SMP and the guest is working fine (just SLOW).
I've had quite a few non-SMP linux guests running on this host for some
time, using various kernels, and not seen a problem. It's only since I
tried to introduce SMP that it's all gone pear-shaped.

> # what cpu model (examples: Intel Core Duo, Intel Core 2 Duo, AMD
> Opteron 2210). See /proc/cpuinfo if you're not sure.

dual "Quad-Core AMD Opteron(tm) Processor 2352" with 32Gb RAM

> # what kvm version you are using. If you're using git directly,
> provide the output of 'git describe'.

"kvm-72"

> # the host kernel version

"2.6.26.2" running on Slackware 12

> # what host kernel arch you are using (i386 or x86_64)

# CONFIG_64BIT is not set
CONFIG_X86_32=y
# CONFIG_X86_64 is not set
CONFIG_X86=y

Should I be using 64-bit?

> # what guest you are using, including OS type (Linux, Windows,
> Solaris, etc.), bitness (32 or 64), kernel version

Slackware 11, but with a new standard (no patches) kernel - 2.6.26.2

I also have an older Slackware 7 based guest with a 2.4.29 SMP kernel
that sees the "lock-up, with qemu console working" problem, but not the
Zombie issue. The 2.4.29 kernel uses a 1Gb memory model.

> # the qemu command line you are using to start the guest

/usr/local/bin/qemu-system-x86_64 -hda /opt/kvm/machine_14/vdisk1.img \
    -m 1024 -vnc :14 -k en-gb -smp 4 \
    -net nic,model=e1000,macaddr=52:54:00:14:00:00 -net tap \
    -net nic,vlan=2,model=e1000,macaddr=52:54:00:14:00:01 \
    -net tap,vlan=2,ifname=tap55 \

> # whether the problem goes away if using the -no-kvm-irqchip or
> -no-kvm-pit switch.

Not tried

> # whether the problem also appears with the -no-kvm switch.

I don't use this switch.

James