* Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration
From: Mikhail Sennikovskii @ 2015-01-27 13:55 UTC
To: kvm

Hi all,

I've posted the mail below to the qemu-dev mailing list, but I've got no
response there.
That's why I decided to re-post it here as well, and besides that I think
this could be a kvm-specific issue as well.

Some additional things to note:
I can reproduce the issue on my Debian 7 with the 3.16.0-0.bpo.4-amd64
kernel as well.
I would typically use a max_downtime adjusted to 1 second instead of the
default 30 ms.
I also noticed that the issue happens much more rarely if I increase the
migration bandwidth, e.g. like this:

diff --git a/migration.c b/migration.c
index 26f4b65..d2e3b39 100644
--- a/migration.c
+++ b/migration.c
@@ -36,7 +36,7 @@ enum {
     MIG_STATE_COMPLETED,
 };

-#define MAX_THROTTLE  (32 << 20)      /* Migration speed throttling */
+#define MAX_THROTTLE  (90 << 20)      /* Migration speed throttling */

Like I said below, I would be glad to provide you with any additional
information.

Thanks,
Mikhail

On 23.01.2015 15:03, Mikhail Sennikovskii wrote:
> Hi all,
>
> I'm running a slightly modified migration-over-TCP test in virt-test,
> which does a migration from one "smp=2" VM to another on the same host
> over TCP, and puts some dummy CPU load inside the GUEST during migration.
> After a series of runs I always get a CLOCK_WATCHDOG_TIMEOUT BSOD inside
> the guest, which happens when
> "
> An expected clock interrupt was not received on a secondary processor
> in an MP system within the allocated interval. This indicates that the
> specified processor is hung and not processing interrupts.
> "
>
> This seems to happen with any qemu version I've tested (1.2 and above,
> including upstream), and I was testing it with the 3.13.0-44-generic
> kernel on my Ubuntu 14.04.1 LTS with an SMP4 host, as well as on the
> 3.12.26-1 kernel with Debian 6 with an SMP6 host.
>
> One thing I noticed is that putting a dummy CPU load on the HOST (like
> running multiple instances of the "while true; do false; done" script)
> in parallel with doing the migration makes the issue quite easily
> reproducible.
>
> Looking inside the Windows crash dump, the second CPU is just running
> at IRQL 0, and it is apparently not hung, as Windows is able to save its
> state in the crash dump correctly, which implies running some code on it.
> So this apparently seems to be some timing issue (like the host scheduler
> not scheduling the thread that executes the secondary CPU's code in time).
>
> Could you give me some insight on this, i.e. is there a way to
> customize QEMU/KVM to avoid such an issue?
>
> If you think this might be a qemu/kvm issue, I can provide you with any
> info, like Windows crash dumps, or the test case to reproduce this.
>
> qemu is started as:
>
> from-VM:
>
> qemu-system-x86_64 \
>     -S \
>     -name 'virt-tests-vm1' \
>     -sandbox off \
>     -M pc-1.0 \
>     -nodefaults \
>     -vga std \
>     -chardev socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112624-aFZmIkNT,server,nowait \
>     -mon chardev=qmp_id_qmp1,mode=control \
>     -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112624-aFZmIkNT,server,nowait \
>     -device isa-serial,chardev=serial_id_serial0 \
>     -chardev socket,id=seabioslog_id_20150123-112624-aFZmIkNT,path=/tmp/seabios-20150123-112624-aFZmIkNT,server,nowait \
>     -device isa-debugcon,chardev=seabioslog_id_20150123-112624-aFZmIkNT,iobase=0x402 \
>     -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
>     -drive id=drive_image1,if=none,file=/path/to/image.qcow2 \
>     -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
>     -device virtio-net-pci,mac=9a:74:75:76:77:78,id=idFdaC4M,vectors=4,netdev=idKFZNXH,bus=pci.0,addr=05 \
>     -netdev user,id=idKFZNXH,hostfwd=tcp::5000-:22,hostfwd=tcp::5001-:10023 \
>     -m 2G \
>     -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 \
>     -cpu phenom \
>     -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
>     -vnc :0 \
>     -rtc base=localtime,clock=host,driftfix=none \
>     -boot order=cdn,once=c,menu=off \
>     -enable-kvm
>
> to-VM:
>
> qemu-system-x86_64 \
>     -S \
>     -name 'virt-tests-vm1' \
>     -sandbox off \
>     -M pc-1.0 \
>     -nodefaults \
>     -vga std \
>     -chardev socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112750-VehjvEqK,server,nowait \
>     -mon chardev=qmp_id_qmp1,mode=control \
>     -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112750-VehjvEqK,server,nowait \
>     -device isa-serial,chardev=serial_id_serial0 \
>     -chardev socket,id=seabioslog_id_20150123-112750-VehjvEqK,path=/tmp/seabios-20150123-112750-VehjvEqK,server,nowait \
>     -device isa-debugcon,chardev=seabioslog_id_20150123-112750-VehjvEqK,iobase=0x402 \
>     -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
>     -drive id=drive_image1,if=none,file=/path/to/image.qcow2 \
>     -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
>     -device virtio-net-pci,mac=9a:74:75:76:77:78,id=idI46M9C,vectors=4,netdev=idl9vRQt,bus=pci.0,addr=05 \
>     -netdev user,id=idl9vRQt,hostfwd=tcp::5002-:22,hostfwd=tcp::5003-:10023 \
>     -m 2G \
>     -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 \
>     -cpu phenom \
>     -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
>     -vnc :1 \
>     -rtc base=localtime,clock=host,driftfix=none \
>     -boot order=cdn,once=c,menu=off \
>     -enable-kvm \
>     -incoming tcp:0:5200
>
>
> Thanks,
> Mikhail
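
The report above changes the migration bandwidth by patching MAX_THROTTLE and mentions a max_downtime of 1 second. As a rough sketch, the same two limits can probably be set at run time over the QMP socket instead of rebuilding qemu. This assumes a qemu of this era that still provides the migrate_set_speed and migrate_set_downtime commands (later releases moved to migrate-set-parameters); the helper names mib_to_bytes, qmp_set_speed and qmp_set_downtime are made up for illustration.

```shell
#!/bin/sh
# Sketch only: build QMP commands that adjust migration limits at run time.
# Assumption: the target qemu still accepts migrate_set_speed and
# migrate_set_downtime; the helper names below are hypothetical.

# Convert MiB/s to bytes/s, mirroring the (N << 20) arithmetic of MAX_THROTTLE.
mib_to_bytes() {
    echo $(( $1 << 20 ))
}

# Print the QMP command that sets the migration bandwidth limit (bytes/s).
qmp_set_speed() {
    printf '{"execute":"migrate_set_speed","arguments":{"value":%s}}\n' \
        "$(mib_to_bytes "$1")"
}

# Print the QMP command that sets the maximum tolerated downtime (seconds).
qmp_set_downtime() {
    printf '{"execute":"migrate_set_downtime","arguments":{"value":%s}}\n' "$1"
}

# Possible usage against the from-VM monitor socket (socat is an assumption,
# any client that can talk to a UNIX socket would do):
#
# { echo '{"execute":"qmp_capabilities"}'
#   qmp_set_speed 90
#   qmp_set_downtime 1
#   sleep 1
# } | socat - UNIX-CONNECT:/tmp/monitor-qmp1-20150123-112624-aFZmIkNT
```

Note that this only makes the crash rarer in the same way the patch does; it does not address the watchdog itself.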

* Re: Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration
From: Jidong Xiao @ 2015-01-27 19:09 UTC
To: Mikhail Sennikovskii; +Cc: KVM

On Tue, Jan 27, 2015 at 5:55 AM, Mikhail Sennikovskii
<mikhail.sennikovskii@profitbricks.com> wrote:
> [...]
> Like I said below, I would be glad to provide you with any additional
> information.
>
> Thanks,
> Mikhail
>

Hi, Mikhail,

So if you choose to use one vcpu, instead of smp, this issue would not
happen, right?

-Jidong

* Re: Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration
From: Zhang Haoyu @ 2015-01-28 6:42 UTC
To: Jidong Xiao, Mikhail Sennikovskii; +Cc: KVM

On 2015-01-28 03:10:23, Jidong Xiao wrote:
> [...]
> Hi, Mikhail,
>
> So if you choose to use one vcpu, instead of smp, this issue would not
> happen, right?
>
I think you can try cpu feature hv_relaxed, like
-cpu Haswell,hv_relaxed
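
hv_relaxed is the Hyper-V "relaxed timing" enlightenment, which tells a Windows guest that it runs on a hypervisor so that it disables its watchdog timeouts. The command lines in the original report use -cpu phenom rather than Haswell, so applied to that setup the suggestion would presumably look like the sketch below. The helper add_cpu_flag is hypothetical and only appends the property to the existing model string.

```shell
#!/bin/sh
# Sketch only: append a CPU property to an existing -cpu model string.
# add_cpu_flag is a made-up helper, not part of qemu or virt-test.

# add_cpu_flag MODEL FLAG: print MODEL with FLAG appended unless already present.
add_cpu_flag() {
    case ",$1," in
        *",$2,"*) echo "$1" ;;
        *)        echo "$1,$2" ;;
    esac
}

# The reporter's CPU model with the suggested property added.
CPU_OPTS=$(add_cpu_flag phenom hv_relaxed)

# Print the resulting option. Everything else on the command line stays as it
# was, and the same -cpu value should be used on both from-VM and to-VM.
printf '%s\n' "-cpu $CPU_OPTS"
```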

* Re: Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration
From: Mikhail Sennikovskii @ 2015-01-29 7:56 UTC
To: Zhang Haoyu, Jidong Xiao; +Cc: KVM

Hi Zhang,

Thanks a lot for the suggestion, it indeed worked for me!
I.e. after adding hv_relaxed to the list of CPU properties I can no
longer reproduce the BSOD on migration with any kernel version that I
have used so far.

Thanks for your help,
Mikhail

On 28.01.2015 07:42, Zhang Haoyu wrote:
> [...]
> I think you can try cpu feature hv_relaxed, like
> -cpu Haswell,hv_relaxed
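
For anyone who wants to re-check this result, the host load from the original report (multiple instances of "while true; do false; done" running in parallel with the migration) could be started and stopped roughly as follows. This is only a sketch; start_load, stop_load and LOAD_PIDS are names invented here, and the number of loops to run is left to the tester.

```shell
#!/bin/sh
# Sketch only: start and stop N busy loops on the host while migrating.
# start_load, stop_load and LOAD_PIDS are hypothetical names.

LOAD_PIDS=""

# start_load N: spawn N background busy loops and remember their PIDs.
start_load() {
    n=$1
    while [ "$n" -gt 0 ]; do
        ( while true; do false; done ) &
        LOAD_PIDS="$LOAD_PIDS $!"
        n=$((n - 1))
    done
}

# stop_load: terminate all busy loops started by start_load and reap them.
stop_load() {
    for pid in $LOAD_PIDS; do
        kill "$pid" 2>/dev/null || true
        wait "$pid" 2>/dev/null || true
    done
    LOAD_PIDS=""
}

# Possible usage:
#   start_load 4      # e.g. one loop per host CPU
#   ... run the migration test ...
#   stop_load
```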

* Re: Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration
From: Mikhail Sennikovskii @ 2015-01-29 7:57 UTC
To: Jidong Xiao; +Cc: KVM

Hi Jidong,

Right, this issue is SMP-specific.

Mikhail

On 27.01.2015 20:09, Jidong Xiao wrote:
> [...]
> Hi, Mikhail,
>
> So if you choose to use one vcpu, instead of smp, this issue would not
> happen, right?
>
> -Jidong

Thread overview: 5+ messages
[not found] <20150118030317.23598.27686.malonedeb@chaenomeles.canonical.com>
[not found] ` <54C254AF.7010101@profitbricks.com>
2015-01-27 13:55 ` Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration Mikhail Sennikovskii
2015-01-27 19:09 ` Jidong Xiao
2015-01-28 6:42 ` Zhang Haoyu
2015-01-29 7:56 ` Mikhail Sennikovskii
2015-01-29 7:57 ` Mikhail Sennikovskii