Windows 2008 Guest BSODS with CLOCK_WATCHDOG

public inbox for kvm@vger.kernel.org

* Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration
       [not found] ` <54C254AF.7010101@profitbricks.com>
@ 2015-01-27 13:55   ` Mikhail Sennikovskii
  2015-01-27 19:09     ` Jidong Xiao
  0 siblings, 1 reply; 5+ messages in thread
From: Mikhail Sennikovskii @ 2015-01-27 13:55 UTC
  To: kvm

Hi all,

I've posted the below mail to the qemu-dev mailing list, but I've got no
response there.
That's why I decided to re-post it here as well, and besides that I 
think this could be a kvm-specific issue as well.

Some additional thing to note:
I can reproduce the issue on my Debian 7 with 3.16.0-0.bpo.4-amd64 
kernel as well.
I would typically use a max_downtime adjusted to 1 second instead of 
default 30 ms.
I also noticed that the issue happens much more rarely if I increase
the migration bandwidth, i.e. like

diff --git a/migration.c b/migration.c
index 26f4b65..d2e3b39 100644
--- a/migration.c
+++ b/migration.c
@@ -36,7 +36,7 @@ enum {
      MIG_STATE_COMPLETED,
  };

-#define MAX_THROTTLE  (32 << 20)      /* Migration speed throttling */
+#define MAX_THROTTLE  (90 << 20)      /* Migration speed throttling */
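
The two settings mentioned above (migration bandwidth and maximum downtime) can presumably also be changed at run time through the QMP monitor, without patching MAX_THROTTLE. The sketch below is an illustration under assumptions: it relies on the QMP commands migrate_set_speed (bytes per second) and migrate_set_downtime (seconds) as found in QEMU releases of that period (newer releases moved these settings to migrate-set-parameters), and the helper names are hypothetical. The helpers only print the JSON; sending it to the QMP socket is left out.

```shell
#!/bin/sh
# Hypothetical helpers; they only print QMP JSON, they do not talk to QEMU.

# MAX_THROTTLE is written as (N << 20), i.e. N MiB per second, in bytes.
throttle_bytes() {
    echo $(( $1 << 20 ))
}

# QMP command that sets the migration bandwidth limit in bytes per second.
qmp_set_speed() {
    printf '{"execute":"migrate_set_speed","arguments":{"value":%s}}\n' \
        "$(throttle_bytes "$1")"
}

# QMP command that sets the maximum tolerated downtime in seconds.
qmp_set_downtime() {
    printf '{"execute":"migrate_set_downtime","arguments":{"value":%s}}\n' "$1"
}

# Example with the values from this mail (90 MiB/s, 1 second downtime):
# qmp_set_speed 90
# qmp_set_downtime 1
```

With these helpers, the patched value (90 << 20) corresponds to 94371840 bytes per second and the stock value (32 << 20) to 33554432.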

Like I said below, I would be glad to provide you with any additional 
information.

Thanks,
Mikhail

On 23.01.2015 15:03, Mikhail Sennikovskii wrote:
> Hi all,
>
> I'm running a slightly modified migration-over-TCP test in virt-test,
> which migrates one "smp=2" VM to another on the same host over TCP
> and generates a dummy CPU load inside the GUEST during the migration.
> After a series of runs I always get a CLOCK_WATCHDOG_TIMEOUT BSOD
> inside the guest, which happens when
> "
> An expected clock interrupt was not received on a secondary processor
> in an MP system within the allocated interval. This indicates that the
> specified processor is hung and not processing interrupts.
> "
>
> This seems to happen with every qemu version I've tested (1.2 and above,
> including upstream). I was testing with the 3.13.0-44-generic kernel on
> my Ubuntu 14.04.1 LTS SMP4 host, as well as with the 3.12.26-1 kernel on
> a Debian 6 SMP6 host.
>
> One thing I noticed is that generating a dummy CPU load on the HOST
> (e.g. running multiple instances of the "while true; do false; done"
> script) in parallel with the migration makes the issue quite easy to
> reproduce.
>
> Looking inside the Windows crash dump, the second CPU is just running
> at IRQL 0 and is apparently not hung, as Windows is able to save its
> state in the crash dump correctly, which requires running some code on it.
> So this appears to be a timing issue (e.g. the host scheduler does not
> schedule the thread executing the secondary CPU's code in time).
>
> Could you give me some insight on this, i.e. is there a way to
> configure QEMU/KVM to avoid this issue?
>
> If you think this might be a qemu/kvm issue, I can provide any
> info, such as Windows crash dumps or the test case to reproduce this.
>
>
> qemu is started as:
>
> from-VM:
>
> qemu-system-x86_64 \
>     -S  \
>     -name 'virt-tests-vm1'  \
>     -sandbox off  \
>     -M pc-1.0  \
>     -nodefaults  \
>     -vga std  \
>     -chardev socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112624-aFZmIkNT,server,nowait \
>     -mon chardev=qmp_id_qmp1,mode=control  \
>     -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112624-aFZmIkNT,server,nowait \
>     -device isa-serial,chardev=serial_id_serial0  \
>     -chardev socket,id=seabioslog_id_20150123-112624-aFZmIkNT,path=/tmp/seabios-20150123-112624-aFZmIkNT,server,nowait \
>     -device isa-debugcon,chardev=seabioslog_id_20150123-112624-aFZmIkNT,iobase=0x402 \
>     -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
>     -drive id=drive_image1,if=none,file=/path/to/image.qcow2 \
>     -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
>     -device virtio-net-pci,mac=9a:74:75:76:77:78,id=idFdaC4M,vectors=4,netdev=idKFZNXH,bus=pci.0,addr=05 \
>     -netdev user,id=idKFZNXH,hostfwd=tcp::5000-:22,hostfwd=tcp::5001-:10023  \
>     -m 2G  \
>     -smp 2,maxcpus=2,cores=1,threads=1,sockets=2  \
>     -cpu phenom \
>     -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
>     -vnc :0  \
>     -rtc base=localtime,clock=host,driftfix=none  \
>     -boot order=cdn,once=c,menu=off \
>     -enable-kvm
>
> to-VM:
>
> qemu-system-x86_64 \
>     -S  \
>     -name 'virt-tests-vm1'  \
>     -sandbox off  \
>     -M pc-1.0  \
>     -nodefaults  \
>     -vga std  \
>     -chardev socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112750-VehjvEqK,server,nowait \
>     -mon chardev=qmp_id_qmp1,mode=control  \
>     -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112750-VehjvEqK,server,nowait \
>     -device isa-serial,chardev=serial_id_serial0  \
>     -chardev socket,id=seabioslog_id_20150123-112750-VehjvEqK,path=/tmp/seabios-20150123-112750-VehjvEqK,server,nowait \
>     -device isa-debugcon,chardev=seabioslog_id_20150123-112750-VehjvEqK,iobase=0x402 \
>     -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
>     -drive id=drive_image1,if=none,file=/path/to/image.qcow2 \
>     -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
>     -device virtio-net-pci,mac=9a:74:75:76:77:78,id=idI46M9C,vectors=4,netdev=idl9vRQt,bus=pci.0,addr=05 \
>     -netdev user,id=idl9vRQt,hostfwd=tcp::5002-:22,hostfwd=tcp::5003-:10023  \
>     -m 2G  \
>     -smp 2,maxcpus=2,cores=1,threads=1,sockets=2  \
>     -cpu phenom \
>     -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
>     -vnc :1  \
>     -rtc base=localtime,clock=host,driftfix=none  \
>     -boot order=cdn,once=c,menu=off \
>     -enable-kvm \
>     -incoming tcp:0:5200
>
>
> Thanks,
> Mikhail
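
The host-side load generator described in the quoted report (several instances of the "while true; do false; done" loop running in parallel with the migration) could be sketched as follows. This is an illustration only: the helper names start_host_load and stop_host_load are hypothetical and not part of virt-test, and the number of loops to start is left to the caller.

```shell
#!/bin/sh
# Hypothetical helpers (not part of virt-test): start and stop N busy loops.
LOAD_PIDS=""
LOAD_COUNT=0

start_host_load() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        # Same busy loop as in the report, detached from the caller's output.
        while true; do false; done >/dev/null 2>&1 &
        LOAD_PIDS="$LOAD_PIDS $!"
        i=$((i + 1))
    done
    LOAD_COUNT=$n
}

stop_host_load() {
    for pid in $LOAD_PIDS; do
        # Ignore errors in case a loop has already exited.
        kill "$pid" 2>/dev/null || true
        wait "$pid" 2>/dev/null || true
    done
    LOAD_PIDS=""
    LOAD_COUNT=0
}

# Example: load the host with four loops around a migration run.
# start_host_load 4
# ... run the migration test ...
# stop_host_load
```

Starting at least as many loops as the host has CPUs is one plausible choice, since the report suspects that the vCPU thread is not scheduled in time.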



* Re: Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration
  2015-01-27 13:55   ` Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration Mikhail Sennikovskii
@ 2015-01-27 19:09     ` Jidong Xiao
  2015-01-28  6:42       ` Zhang Haoyu
  2015-01-29  7:57       ` Mikhail Sennikovskii
  0 siblings, 2 replies; 5+ messages in thread
From: Jidong Xiao @ 2015-01-27 19:09 UTC
  To: Mikhail Sennikovskii; +Cc: KVM

On Tue, Jan 27, 2015 at 5:55 AM, Mikhail Sennikovskii
<mikhail.sennikovskii@profitbricks.com> wrote:
> Hi all,
>
> I've posted the below mail to the qemu-dev mailing list, but I've got no
> response there.
> That's why I decided to re-post it here as well, and besides that I think
> this could be a kvm-specific issue as well.
>
> Some additional thing to note:
> I can reproduce the issue on my Debian 7 with 3.16.0-0.bpo.4-amd64 kernel as
> well.
> I would typically use a max_downtime adjusted to 1 second instead of default
> 30 ms.
> I also noticed that the issue happens much more rarely if I increase the
> migration bandwidth, i.e. like
>
> diff --git a/migration.c b/migration.c
> index 26f4b65..d2e3b39 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -36,7 +36,7 @@ enum {
>      MIG_STATE_COMPLETED,
>  };
>
> -#define MAX_THROTTLE  (32 << 20)      /* Migration speed throttling */
> +#define MAX_THROTTLE  (90 << 20)      /* Migration speed throttling */
>
> Like I said below, I would be glad to provide you with any additional
> information.
>
> Thanks,
> Mikhail
>
Hi, Mikhail,

So if you choose to use one vcpu, instead of smp, this issue would not
happen, right?

-Jidong


* Re: Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration
  2015-01-27 19:09     ` Jidong Xiao
@ 2015-01-28  6:42       ` Zhang Haoyu
  2015-01-29  7:56         ` Mikhail Sennikovskii
  2015-01-29  7:57       ` Mikhail Sennikovskii
  1 sibling, 1 reply; 5+ messages in thread
From: Zhang Haoyu @ 2015-01-28  6:42 UTC
  To: Jidong Xiao, Mikhail Sennikovskii; +Cc: KVM


On 2015-01-28 03:10:23, Jidong Xiao wrote:
> On Tue, Jan 27, 2015 at 5:55 AM, Mikhail Sennikovskii
> <mikhail.sennikovskii@profitbricks.com> wrote:
> > Hi all,
> >
> > I've posted the below mail to the qemu-dev mailing list, but I've got no
> > response there.
> > That's why I decided to re-post it here as well, and besides that I think
> > this could be a kvm-specific issue as well.
> >
> > Some additional thing to note:
> > I can reproduce the issue on my Debian 7 with 3.16.0-0.bpo.4-amd64 kernel as
> > well.
> > I would typically use a max_downtime adjusted to 1 second instead of default
> > 30 ms.
> > I also noticed that the issue happens much more rarely if I increase the
> > migration bandwidth, i.e. like
> >
> > diff --git a/migration.c b/migration.c
> > index 26f4b65..d2e3b39 100644
> > --- a/migration.c
> > +++ b/migration.c
> > @@ -36,7 +36,7 @@ enum {
> >      MIG_STATE_COMPLETED,
> >  };
> >
> > -#define MAX_THROTTLE  (32 << 20)      /* Migration speed throttling */
> > +#define MAX_THROTTLE  (90 << 20)      /* Migration speed throttling */
> >
> > Like I said below, I would be glad to provide you with any additional
> > information.
> >
> > Thanks,
> > Mikhail
> >
> Hi, Mikhail,
>
> So if you choose to use one vcpu, instead of smp, this issue would not
> happen, right?
> 
I think you can try cpu feature hv_relaxed, like
-cpu Haswell,hv_relaxed
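
Applied to the command lines from the original report, which use -cpu phenom, this would amount to appending the flag to the existing CPU model. hv_relaxed enables the Hyper-V "relaxed timing" enlightenment, which tells a Windows guest that it is running under a hypervisor so that it disables its watchdog timeouts. A minimal sketch, assuming the flag is simply added as a comma-separated CPU property; the helper name with_hv_relaxed is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: append hv_relaxed to a -cpu model string unless present.
with_hv_relaxed() {
    case ",$1," in
        *,hv_relaxed,*) echo "$1" ;;
        *) echo "$1,hv_relaxed" ;;
    esac
}

# Example: build the -cpu argument for the guest from the report.
# qemu-system-x86_64 ... -cpu "$(with_hv_relaxed phenom)" ...
```

Both the from-VM and the to-VM would need the same -cpu setting, since the guest-visible CPU has to match across the migration.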


* Re: Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration
  2015-01-28  6:42       ` Zhang Haoyu
@ 2015-01-29  7:56         ` Mikhail Sennikovskii
  0 siblings, 0 replies; 5+ messages in thread
From: Mikhail Sennikovskii @ 2015-01-29  7:56 UTC
  To: Zhang Haoyu, Jidong Xiao; +Cc: KVM

Hi Zhang,

Thanks a lot for the suggestion, it indeed worked for me!
I.e. after adding the hv_relaxed to the list of CPU properties I can no 
longer reproduce the BSOD on migration with any kernel version that I 
used so far.

Thanks for your help,
Mikhail


* Re: Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration
  2015-01-27 19:09     ` Jidong Xiao
  2015-01-28  6:42       ` Zhang Haoyu
@ 2015-01-29  7:57       ` Mikhail Sennikovskii
  1 sibling, 0 replies; 5+ messages in thread
From: Mikhail Sennikovskii @ 2015-01-29  7:57 UTC
  To: Jidong Xiao; +Cc: KVM

Hi Jidong,

right, this issue is SMP-specific.

Mikhail

