From: Philipp Hahn
Subject: [BUG] VM stuck in interrupt-loop after suspend to/resumed from file, or no interrupts at all
Date: Wed, 12 Jan 2011 15:51:13 +0100
Message-ID: <201101121551.19083.hahn@univention.de>
To: kvm@vger.kernel.org

Hello,

libvirt implements a managed save, which suspends a VM to a file from which it can be resumed later. This uses QEMU/KVM's "migrate exec:" feature.

This doesn't work reliably for me: in many cases the resumed VM seems to be stuck. Its VNC console is restored, but no key presses or network packets are accepted. This happens with both Windows (XP, 7, 2008) and Linux 2.6.32 guests.
Using the debugging cycle described below in more detail, I was able to track the problem down to interrupt handling: either the Linux guest kernel constantly receives an interrupt for the 8139cp network adapter, or it receives no interrupts at all (neither network nor keyboard nor timer). Only sending an NMI works, which shows that at least the Linux kernel is still alive.

If I add the -no-kvm-irqchip option, it seems to work; I was not able to reproduce a hang.

  * What cpu model (examples: Intel Core Duo, Intel Core 2 Duo, AMD Opteron 2210). See /proc/cpuinfo if you're not sure.
    Intel(R) Core(TM)2 Duo CPU L9400 @ 1.86GHz
    AMD Athlon(tm) II X2 250 Processor
  * What kvm version you are using. If you're using git directly, provide the output of 'git describe'.
    qemu-kvm_0.12.4+dfsg-1~bpo50+1.2.201007160916
    qemu-kvm_0.13.0+dfsg-2 ±
  * The host kernel version
    linux-2.6.32.23
    linux-2.6.37
  * What host kernel arch you are using (i386 or x86_64)
    both i386 (Intel) and x86_64 (AMD)
  * What guest you are using, including OS type (Linux, Windows, Solaris, etc.), bitness (32 or 64), kernel version
    Linux 2.6.32 i686
    Windows 2003 R2 i686
    Windows 7_Ultimate amd64
    Windows XP_Professional_SP3 i686
  * The qemu command line you are using to start the guest
    see below
  * Whether the problem goes away if using the -no-kvm-irqchip or -no-kvm-pit switch.
    Yes, with -no-kvm-irqchip I was not able to reproduce the problem.
  * Whether the problem also appears with the -no-kvm switch.
    Did not test.

Not knowing much about how kvm/qemu works internally, I would guess that the state of the PIC is not stored or restored correctly, since I either get an interrupt storm or no interrupts at all.

Is there anything more I can do to diagnose the problem?
There are similar reports, which might be related:

Basically I was doing the following cycle:

DEV=tap0
sudo /usr/sbin/openvpn --mktun --dev "$DEV" --user "$USER"
sudo /etc/kvm/kvm-ifup "$DEV"
while true; do
/usr/bin/kvm \
    -d int \
    -gdb tcp::1234 \
    -M pc-0.12 \
    -enable-kvm \
    -m 512 \
    -smp 1,sockets=1,cores=1,threads=1 \
    -name ucs-fv-qcow \
    -uuid 7a373b2a-89c1-6dfb-5fa5-c438230ebde1 \
    -nodefaults \
    -chardev stdio,id=monitor \
    -mon chardev=monitor,mode=readline \
    -rtc base=utc \
    -boot cd \
    -drive file=/var/lib/libvirt/images/ucs-fv-qcow.qcow2,if=none,id=drive-ide0-0-0,boot=on,format=qcow2 \
    -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
    -drive file=/var/lib/libvirt/images/ucs_2.4-0-100829-dvd-i386.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw \
    -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 \
    -device rtl8139,vlan=0,id=net0,mac=52:54:00:68:f3:25,bus=pci.0,addr=0x3 \
    -net tap,vlan=0,name=hostnet0,ifname="$DEV",script=no \
    -usb \
    -sdl \
    -k de \
    -vga cirrus \
    -incoming exec:"dd if=/var/lib/libvirt/qemu/save/ucs-fv-qcow.save bs=4K skip=0 status=noxfer" \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 <<__QEMU__
# set_link hostnet0 down
migrate_set_speed 4095M
migrate "exec:dd of=/var/lib/libvirt/qemu/save/ucs-fv-qcow.save bs=1M"
quit
__QEMU__
done

In a second console I attached gdb remotely to investigate the VM:

gdb --eval-command="target remote :1234" --eval-command="display/i \$pc" --eval-command="break *0xe0cf2102"

(the address is that of cp_interrupt())

To resolve the addresses, I used a clone of the instance and looked up the symbols manually in /proc/kallsyms.

More info (in German) is in our bug tracker at .
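The manual lookup in /proc/kallsyms can also be scripted. This is a minimal sketch, not the procedure actually used above: it assumes a copy of /proc/kallsyms from the cloned guest has been saved to a local file, and resolve_addr is a hypothetical helper name. It sorts the table, keeps the last symbol whose start address is at or below the given address, and prints it in gdb-style symbol+offset form.

```shell
#!/bin/sh
# Hypothetical helper (not part of kvm/qemu): map a guest kernel address
# to "symbol+0xoffset" using a saved copy of the guest's /proc/kallsyms.
# Usage: resolve_addr KALLSYMS_FILE ADDR   (ADDR in hex, 0x prefix optional)
resolve_addr() {
    kallsyms=$1
    # Accept "E0CF2102" or "0xe0cf2102": lowercase, then drop any 0x prefix
    addr=$(printf '%s' "$2" | tr 'A-Z' 'a-z')
    addr=${addr#0x}
    # kallsyms lines look like "e0cf2100 t some_symbol [module]"; module
    # symbols are not in address order, so sort the whole table first.
    result=$(LC_ALL=C sort "$kallsyms" | LC_ALL=C awk -v a="$addr" '
        # Left-pad the query with zeros to the width of the table addresses
        NR == 1 { while (length(a) < length($1)) a = "0" a }
        # Appending "" forces a string comparison; fixed-width lowercase
        # hex strings then order the same way as the numbers they encode
        ($1 "") <= (a "") { start = $1; sym = $3 }
        END { if (sym != "") print start, sym }')
    # No symbol at or below the address: report failure
    [ -n "$result" ] || return 1
    set -- $result
    # Offset of the address into the enclosing symbol
    printf '%s+0x%x\n' "$2" $((0x$addr - 0x$1))
}
```

For example, resolve_addr kallsyms.txt e0cf2102 would print the enclosing symbol and the offset into it. Note that the script only finds the nearest preceding symbol; it cannot tell whether the address actually lies past the end of that symbol.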
BYtE
Philipp
-- 
Philipp Hahn           Open Source Software Engineer      hahn@univention.de
Univention GmbH        Linux for Your Business            fon: +49 421 22 232- 0
Mary-Somerville-Str.1  28359 Bremen                       fax: +49 421 22 232-99
                                                          http://www.univention.de/