From: Mikhail Sennikovskii
Subject: Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration
Date: Tue, 27 Jan 2015 14:55:28 +0100
Message-ID: <54C798D0.6010600@profitbricks.com>
References: <20150118030317.23598.27686.malonedeb@chaenomeles.canonical.com> <54C254AF.7010101@profitbricks.com>
In-Reply-To: <54C254AF.7010101@profitbricks.com>
To: kvm@vger.kernel.org

Hi all,

I posted the mail below to the qemu-devel mailing list but got no
response there, so I am re-posting it here. I also think this could be
a KVM-specific issue.

Some additional notes:

I can reproduce the issue on my Debian 7 system with the
3.16.0-0.bpo.4-amd64 kernel as well.
I typically use a max_downtime of 1 second instead of the default 30 ms.
I also noticed that the issue happens much more rarely if I increase
the migration bandwidth, i.e.
like

diff --git a/migration.c b/migration.c
index 26f4b65..d2e3b39 100644
--- a/migration.c
+++ b/migration.c
@@ -36,7 +36,7 @@ enum {
     MIG_STATE_COMPLETED,
 };

-#define MAX_THROTTLE  (32 << 20)      /* Migration speed throttling */
+#define MAX_THROTTLE  (90 << 20)      /* Migration speed throttling */

As I said below, I would be glad to provide any additional
information.

Thanks,
Mikhail

On 23.01.2015 15:03, Mikhail Sennikovskii wrote:
> Hi all,
>
> I'm running a slightly modified migration-over-TCP test in virt-test,
> which migrates one "smp=2" VM to another on the same host over TCP
> and puts a dummy CPU load on the GUEST during migration. After a
> series of runs I always get a CLOCK_WATCHDOG_TIMEOUT BSOD inside the
> guest, which happens when
> "
> An expected clock interrupt was not received on a secondary processor
> in an MP system within the allocated interval. This indicates that the
> specified processor is hung and not processing interrupts.
> "
>
> This seems to happen with every qemu version I've tested (1.2 and
> above, including upstream). I tested with the 3.13.0-44-generic
> kernel on my Ubuntu 14.04.1 LTS SMP4 host, as well as with the
> 3.12.26-1 kernel on a Debian 6 SMP6 host.
>
> One thing I noticed is that putting a dummy CPU load on the HOST
> (e.g. running multiple instances of the "while true; do false; done"
> script) in parallel with the migration makes the issue quite easy
> to reproduce.
>
>
> Looking inside the Windows crash dump, the second CPU is just running
> at IRQL 0 and is apparently not hung, since Windows is able to save
> its state in the crash dump correctly, which requires running some
> code on it.
> So this appears to be a timing issue (e.g. the host scheduler does
> not schedule the thread executing the secondary CPU's code in time).
>
> Could you give me some insight on this, i.e. is there a way to
> customize QEMU/KVM to avoid such an issue?
>
> If you think this might be a qemu/kvm issue, I can provide any
> info you need, such as Windows crash dumps or the test case to
> reproduce this.
>
>
> qemu is started as:
>
> from-VM:
>
> qemu-system-x86_64 \
>     -S \
>     -name 'virt-tests-vm1' \
>     -sandbox off \
>     -M pc-1.0 \
>     -nodefaults \
>     -vga std \
>     -chardev socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112624-aFZmIkNT,server,nowait \
>     -mon chardev=qmp_id_qmp1,mode=control \
>     -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112624-aFZmIkNT,server,nowait \
>     -device isa-serial,chardev=serial_id_serial0 \
>     -chardev socket,id=seabioslog_id_20150123-112624-aFZmIkNT,path=/tmp/seabios-20150123-112624-aFZmIkNT,server,nowait \
>     -device isa-debugcon,chardev=seabioslog_id_20150123-112624-aFZmIkNT,iobase=0x402 \
>     -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
>     -drive id=drive_image1,if=none,file=/path/to/image.qcow2 \
>     -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
>     -device virtio-net-pci,mac=9a:74:75:76:77:78,id=idFdaC4M,vectors=4,netdev=idKFZNXH,bus=pci.0,addr=05 \
>     -netdev user,id=idKFZNXH,hostfwd=tcp::5000-:22,hostfwd=tcp::5001-:10023 \
>     -m 2G \
>     -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 \
>     -cpu phenom \
>     -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
>     -vnc :0 \
>     -rtc base=localtime,clock=host,driftfix=none \
>     -boot order=cdn,once=c,menu=off \
>     -enable-kvm
>
> to-VM:
>
> qemu-system-x86_64 \
>     -S \
>     -name 'virt-tests-vm1' \
>     -sandbox off \
>     -M pc-1.0 \
>     -nodefaults \
>     -vga std \
>     -chardev socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112750-VehjvEqK,server,nowait \
>     -mon chardev=qmp_id_qmp1,mode=control \
>     -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112750-VehjvEqK,server,nowait \
>     -device isa-serial,chardev=serial_id_serial0 \
>     -chardev socket,id=seabioslog_id_20150123-112750-VehjvEqK,path=/tmp/seabios-20150123-112750-VehjvEqK,server,nowait \
>     -device isa-debugcon,chardev=seabioslog_id_20150123-112750-VehjvEqK,iobase=0x402 \
>     -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
>     -drive id=drive_image1,if=none,file=/path/to/image.qcow2 \
>     -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
>     -device virtio-net-pci,mac=9a:74:75:76:77:78,id=idI46M9C,vectors=4,netdev=idl9vRQt,bus=pci.0,addr=05 \
>     -netdev user,id=idl9vRQt,hostfwd=tcp::5002-:22,hostfwd=tcp::5003-:10023 \
>     -m 2G \
>     -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 \
>     -cpu phenom \
>     -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
>     -vnc :1 \
>     -rtc base=localtime,clock=host,driftfix=none \
>     -boot order=cdn,once=c,menu=off \
>     -enable-kvm \
>     -incoming tcp:0:5200
>
>
> Thanks,
> Mikhail
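
For anyone trying to reproduce this without rebuilding QEMU: the
bandwidth and downtime settings mentioned at the top of this mail can
probably also be applied at run time through the QMP monitor socket of
the source VM, instead of patching MAX_THROTTLE. Below is a minimal
Python sketch that only builds the QMP messages. It assumes the QEMU
build in use still provides the migrate_set_speed (bytes per second)
and migrate_set_downtime (seconds) QMP commands; newer QEMU versions
may only offer migrate-set-parameters. The helper names qmp_command and
migration_tuning_commands are made up for illustration.

```python
import json

MIB = 1 << 20  # same unit as the "<< 20" shift in MAX_THROTTLE


def qmp_command(name, **arguments):
    """Serialize one QMP command as a single JSON line."""
    msg = {"execute": name}
    if arguments:
        msg["arguments"] = arguments
    return json.dumps(msg)


def migration_tuning_commands(speed_mib_per_s, downtime_s):
    """Build the QMP messages for a bandwidth limit and a max downtime.

    Assumption: migrate_set_speed takes bytes/second and
    migrate_set_downtime takes seconds, as in QEMU of this period.
    """
    return [
        # QMP requires capability negotiation before any other command.
        qmp_command("qmp_capabilities"),
        qmp_command("migrate_set_speed", value=speed_mib_per_s * MIB),
        qmp_command("migrate_set_downtime", value=downtime_s),
    ]


if __name__ == "__main__":
    # 90 MiB/s and 1 second, matching the values used in this report.
    for line in migration_tuning_commands(90, 1.0):
        print(line)
```

The printed lines would be written, one per line, to the QMP socket of
the from-VM (the -chardev socket referenced by -mon mode=control in the
command line above) before issuing the migrate command.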