* [Qemu-devel] Unresponsive linux guest once migrated
@ 2014-03-27 22:52 Chris Dunlop
2014-03-27 23:29 ` Marcin Gibuła
0 siblings, 1 reply; 11+ messages in thread
From: Chris Dunlop @ 2014-03-27 22:52 UTC (permalink / raw)
To: qemu-devel
Hi,
I have a problem where I migrate a linux guest VM, and on the
receiving side the guest goes to 100% cpu as seen by the host, and
the guest itself is unresponsive, e.g. not responding to ping etc.
The only way out I've found is to destroy the guest.
This seems to only happen if the guest has been idle for an extended
period (e.g. overnight). I've migrated the guest 100 times in a row
without any problems when the guest has been used "a little" (e.g.
logging in and looking around, it's not doing anything normally).
I've not had similar problems migrating Windows guests.
guest - debian wheezy, kernel 3.2.0-4-amd64
host - debian wheezy, kernel 3.10.33 x86_64 (self-compiled)
qemu - qemu_1.7.0+dfsg-2~bpo70+2 + rbd (self-compiled)
All guests use ceph rbd for backing store.
qemu-system-x86_64 -enable-kvm -name test -S -machine pc-1.0,accel=kvm,usb=off -m 1024 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid 620dd8e0-f24c-485d-a134-ba5961ce6531 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/test.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=rbd:pool/test:id=test:key=xxxxxxxxxxx=:auth_supported=cephx\;none,if=none,id=drive-virtio-disk0,format=raw -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=25 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:29:10:16,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -incoming tcp:[::]:49152 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
ps tells me the qemu-system-x86_64 process has 17 threads, and it's
the second-last of these that's consuming the CPU. Strace on that
thread doesn't tell me much:
rt_sigtimedwait([BUS USR1], 0x7f5761957b30, {0, 0}, 8) = -1 EAGAIN (Resource temporarily unavailable)
rt_sigpending([]) = 0
ioctl(16, KVM_RUN <unfinished ...>
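(For anyone reproducing this, a minimal sketch of how the busy thread can be
picked out on the host. Here `$$`, our own shell, stands in for the qemu PID so
the commands run anywhere; the strace/gdb follow-ups are left as comments since
they need a live guest.)

```shell
# Per-thread CPU usage of a process; substitute the qemu-system-x86_64 PID
# for $$ (our own shell, used here only as a runnable stand-in).
pid=$$
ps -L -o tid,pcpu,comm -p "$pid"

# With the hot TID identified, attach to just that thread:
#   strace -p <tid>
#   gdb -p <tid> -batch -ex bt
```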
Using 'echo l > /proc/sysrq-trigger' a few times shows me the CPU
running that thread is always at vmx_vcpu_run+0x5eb, e.g.:
[571745.343753] NMI backtrace for cpu 2
[571745.343779] CPU: 2 PID: 31618 Comm: qemu-system-x86 Tainted: G O 3.10.33-otn-00017-g510ea14 #2
[571745.343827] Hardware name: Supermicro X8DTH-i/6/iF/6F/X8DTH, BIOS 2.0c 07/19/11
[571745.343871] task: ffff880002f99380 ti: ffff8801acaf0000 task.ti: ffff8801acaf0000
[571745.343915] RIP: 0010:[<ffffffffa104130b>] [<ffffffffa104130b>] vmx_vcpu_run+0x5eb/0x670 [kvm_intel]
[571745.343978] RSP: 0018:ffff8801acaf3cc8 EFLAGS: 00000082
[571745.344004] RAX: 0000000080000202 RBX: 0000000001443980 RCX: ffff8801fd698000
[571745.344046] RDX: 0000000000000200 RSI: 00000000693e2680 RDI: ffff8801fd698000
[571745.344089] RBP: ffff8801acaf3d38 R08: 00000000693e9b40 R09: 0000000000000000
[571745.344131] R10: 0000000000000f08 R11: 0000000000000000 R12: 0000000000000000
[571745.344174] R13: 0000000000000001 R14: 0000000000000014 R15: ffffffffffffffff
[571745.344217] FS: 00007f5609fec700(0000) GS:ffff88081fc80000(0000) knlGS:fffff801388f8000
[571745.344261] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[571745.344288] CR2: 0000000001449b8a CR3: 00000006eaa6c000 CR4: 00000000000027e0
[571745.344330] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[571745.344373] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[571745.344415] Stack:
[571745.344435] ffff8801acaf3d38 ffffffffa1042576 0000000000000000 ffff8801fd698000
[571745.344487] 0000000200000000 ffff8801fd698000 ffff8801acaf3d18 ffff8805cbfbc040
[571745.344539] 0000000000000002 ffff8805cbfbc040 0000000000000001 0000000000000000
[571745.344590] Call Trace:
[571745.344615] [<ffffffffa1042576>] ? vmx_handle_exit+0xf6/0x8d0 [kvm_intel]
[571745.344661] [<ffffffffa0459341>] kvm_arch_vcpu_ioctl_run+0x9a1/0x1100 [kvm]
[571745.344699] [<ffffffffa04543d7>] ? kvm_arch_vcpu_load+0x57/0x1e0 [kvm]
[571745.344734] [<ffffffffa0444d24>] kvm_vcpu_ioctl+0x2b4/0x580 [kvm]
[571745.344767] [<ffffffffa04468ef>] ? kvm_vm_ioctl+0x57f/0x5f0 [kvm]
[571745.344797] [<ffffffff81147090>] do_vfs_ioctl+0x90/0x520
[571745.344825] [<ffffffff8106fd98>] ? __enqueue_entity+0x78/0x80
[571745.344853] [<ffffffff81083b38>] ? SyS_futex+0x98/0x1a0
[571745.344887] [<ffffffffa044e1b4>] ? kvm_on_user_return+0x64/0x70 [kvm]
[571745.344916] [<ffffffff81147570>] SyS_ioctl+0x50/0x90
[571745.344944] [<ffffffff813bf782>] system_call_fastpath+0x16/0x1b
[571745.344971] Code: 82 1c 02 00 00 a8 10 0f 84 8b fa ff ff e9 66 ff ff ff 66 0f 1f 44 00 00 85 c0 0f 89 51 fd ff ff 48 8b 7d a8 e8 87 9f 40 ff cd 02 <48> 8b 7d a8 e8 9c 9f 40 ff e9 38 fd ff ff 48 89 f9 48 c1 e9 0d
What can I do to help track this down?
Cheers,
Chris
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [Qemu-devel] Unresponsive linux guest once migrated
2014-03-27 22:52 [Qemu-devel] Unresponsive linux guest once migrated Chris Dunlop
@ 2014-03-27 23:29 ` Marcin Gibuła
2014-03-27 23:59 ` Chris Dunlop
0 siblings, 1 reply; 11+ messages in thread
From: Marcin Gibuła @ 2014-03-27 23:29 UTC (permalink / raw)
To: qemu-devel, chris
On 2014-03-27 23:52, Chris Dunlop wrote:
> Hi,
>
> I have a problem where I migrate a linux guest VM, and on the
> receiving side the guest goes to 100% cpu as seen by the host, and
> the guest itself is unresponsive, e.g. not responding to ping etc.
> The only way out I've found is to destroy the guest.
>
> This seems to only happen if the guest has been idle for an extended
> period (e.g. overnight). I've migrated the guest 100 times in a row
> without any problems when the guest has been used "a little" (e.g.
> logging in and looking around, it's not doing anything normally).
Hi,
I've seen a very similar problem on our installation. Have you tried
running with kvm-clock explicitly disabled (either via no-kvmclock in the
guest kernel or with -kvm-clock in qemu)?
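(To spell out that suggestion: the sysfs path below is standard on Linux
guests, and the feature can also be masked inside qemu's `-cpu` flag list,
as seen in the command lines later in this thread; treat the exact flag
spelling as something to check against your qemu build.)

```shell
# Inside the guest: check whether kvm-clock is the active clocksource.
cat /sys/devices/system/clocksource/clocksource0/current_clocksource \
    2>/dev/null || echo "clocksource sysfs node not present"

# To disable kvm-clock, either boot the guest kernel with `no-kvmclock`
# on its command line, or mask the feature from qemu's CPU flags, e.g.:
#   -cpu qemu64,-kvmclock
```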
--
mg
* Re: [Qemu-devel] Unresponsive linux guest once migrated
2014-03-27 23:29 ` Marcin Gibuła
@ 2014-03-27 23:59 ` Chris Dunlop
2014-03-31 8:39 ` Marcin Gibuła
0 siblings, 1 reply; 11+ messages in thread
From: Chris Dunlop @ 2014-03-27 23:59 UTC (permalink / raw)
To: Marcin Gibuła; +Cc: qemu-devel
On Fri, Mar 28, 2014 at 12:29:18AM +0100, Marcin Gibuła wrote:
> On 2014-03-27 23:52, Chris Dunlop wrote:
>> Hi,
>>
>> I have a problem where I migrate a linux guest VM, and on the
>> receiving side the guest goes to 100% cpu as seen by the host, and
>> the guest itself is unresponsive, e.g. not responding to ping etc.
>> The only way out I've found is to destroy the guest.
>>
>> This seems to only happen if the guest has been idle for an extended
>> period (e.g. overnight). I've migrated the guest 100 times in a row
>> without any problems when the guest has been used "a little" (e.g.
>> logging in and looking around, it's not doing anything normally).
>
> Hi,
>
> I've seen a very similar problem on our installation. Have you tried
> running with kvm-clock explicitly disabled (either via no-kvmclock in
> the guest kernel or with -kvm-clock in qemu)?
No, I haven't tried it yet (I've confirmed kvm-clock is currently
being used). I'll have a look at it.
Did it help your issue?
Thanks,
Chris
* Re: [Qemu-devel] Unresponsive linux guest once migrated
2014-03-27 23:59 ` Chris Dunlop
@ 2014-03-31 8:39 ` Marcin Gibuła
2014-04-02 5:41 ` Chris Dunlop
0 siblings, 1 reply; 11+ messages in thread
From: Marcin Gibuła @ 2014-03-31 8:39 UTC (permalink / raw)
To: Chris Dunlop; +Cc: qemu-devel
>> I've seen a very similar problem on our installation. Have you tried
>> running with kvm-clock explicitly disabled (either via no-kvmclock in
>> the guest kernel or with -kvm-clock in qemu)?
>
> No, I haven't tried it yet (I've confirmed kvm-clock is currently
> being used). I'll have a look at it.
>
> Did it help your issue?
My results were inconclusive, but there was a guy two months ago who had
the same problem, and disabling kvm-clock resolved it for him.
I wonder if it'll help you as well.
--
mg
* Re: [Qemu-devel] Unresponsive linux guest once migrated
2014-03-31 8:39 ` Marcin Gibuła
@ 2014-04-02 5:41 ` Chris Dunlop
2014-04-02 8:45 ` Marcin Gibuła
0 siblings, 1 reply; 11+ messages in thread
From: Chris Dunlop @ 2014-04-02 5:41 UTC (permalink / raw)
To: Marcin Gibuła; +Cc: qemu-devel
On Mon, Mar 31, 2014 at 10:39:47AM +0200, Marcin Gibuła wrote:
>>> I've seen a very similar problem on our installation. Have you tried
>>> running with kvm-clock explicitly disabled (either via no-kvmclock in
>>> the guest kernel or with -kvm-clock in qemu)?
>>
>> No, I haven't tried it yet (I've confirmed kvm-clock is currently
>> being used). I'll have a look at it.
>>
>> Did it help your issue?
>
> My results were inconclusive, but there was a guy two months ago who
> had the same problem, and disabling kvm-clock resolved it for him.
>
> I wonder if it'll help you as well.
It's looking good so far, after a few migrations (it takes a while to test
because I'm waiting at least 5 hours between migrations). I'll be happier
once I've done a couple of weeks of this without any failures!
Thanks for your suggestion.
Chris
* Re: [Qemu-devel] Unresponsive linux guest once migrated
2014-04-02 5:41 ` Chris Dunlop
@ 2014-04-02 8:45 ` Marcin Gibuła
2014-04-02 9:04 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 11+ messages in thread
From: Marcin Gibuła @ 2014-04-02 8:45 UTC (permalink / raw)
To: qemu-devel
> It's looking good so far, after a few migrations (it takes a while to test
> because I'm waiting at least 5 hours between migrations). I'll be happier
> once I've done a couple of weeks of this without any failures!
Does anyone have any hints on how to debug this thing? :(
I've tried to put a hung guest under gdb and found it looping deep
inside the kernel's time management functions. The fact that disabling
kvmclock helps suggests the hang is somehow related to kvmclock
corruption during migration. It happens on both old and new versions of
guest kernels.
Any hints from developers are welcome :)
--
mg
* Re: [Qemu-devel] Unresponsive linux guest once migrated
2014-04-02 8:45 ` Marcin Gibuła
@ 2014-04-02 9:04 ` Dr. David Alan Gilbert
2014-04-02 9:30 ` Marcin Gibuła
0 siblings, 1 reply; 11+ messages in thread
From: Dr. David Alan Gilbert @ 2014-04-02 9:04 UTC (permalink / raw)
To: Marcin Gibuła; +Cc: qemu-devel
* Marcin Gibuła (m.gibula@beyond.pl) wrote:
> >It's looking good so far, after a few migrations (it takes a while to test
> >because I'm waiting at least 5 hours between migrations). I'll be happier
> >once I've done a couple of weeks of this without any failures!
>
> Does anyone have any hints on how to debug this thing? :(
>
> I've tried to put a hung guest under gdb and found it looping deep
> inside the kernel's time management functions. The fact that disabling
> kvmclock helps suggests the hang is somehow related to kvmclock
> corruption during migration. It happens on both old and new versions
> of guest kernels.
>
> Any hints from developers are welcome:)
Can you give:
1) A backtrace from the guest
thread apply all bt full
in gdb
2) What's the earliest/newest qemu versions you've seen this on?
3) What guest OS are you running?
4) What host OS are you running?
5) What CPU are you running on?
6) What does your qemu command line look like?
7) How exactly are you migrating?
8) You talk about having to wait a few hours to trigger it - do
you have a more exact description of a test?
9) Is there any output from qemu stderr/stdout in your qemu logs?
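(A rough host-side collection sketch for items 1, 4/5 and 6 of that list. The
PID is a hypothetical placeholder, and the gdb and /proc lines are commented
out because they need a live hung guest.)

```shell
QEMU_PID=12345   # hypothetical: PID of the hung qemu-system-x86_64 process
# 1) backtrace of every qemu thread:
#      gdb -p "$QEMU_PID" -batch -ex 'thread apply all bt full'
# 6) exact qemu command line of that process:
#      tr '\0' ' ' < "/proc/$QEMU_PID/cmdline"; echo
# 4)/5) host kernel version and architecture:
uname -r -m
```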
Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Qemu-devel] Unresponsive linux guest once migrated
2014-04-02 9:04 ` Dr. David Alan Gilbert
@ 2014-04-02 9:30 ` Marcin Gibuła
2014-04-02 9:39 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 11+ messages in thread
From: Marcin Gibuła @ 2014-04-02 9:30 UTC (permalink / raw)
To: Dr. David Alan Gilbert; +Cc: qemu-devel
> Can you give:
> 1) A backtrace from the guest
> thread apply all bt full
> in gdb
You mean from gdb attached to the hung guest? I'll try to get it. From
what I remember it looks rather "normal" - busy executing guest code.
> 2) What's the earliest/newest qemu versions you've seen this on?
1.4 - 1.6
Don't know about earlier versions because I didn't use migration on
them. Haven't tried 1.7 yet (I know about XBZRLE fixes, but it happened
without it as well...).
> 3) What guest OS are you running?
All flavors of CentOS, Ubuntu, Red Hat, etc. Also Windows, but I've never
seen a crash with Windows so far.
It seems the few people who also have this issue report success with
kvmclock disabled (either in qemu or on the kernel command line).
> 4) What host OS are you running?
Distro is Gentoo based (with no crazy compiler options). I've been using
kernel 3.4 - 3.10.
> 5) What CPU are you running on?
AMD Opteron(tm) Processor 6164 HE
> 6) What does your qemu command line look like?
Example VM:
/usr/bin/qemu-system-x86_64 -machine accel=kvm -name
3b5e37ea-04be-4a6b-8d63-f1a5853f2138 -S -machine
pc-i440fx-1.5,accel=kvm,usb=off -cpu
qemu64,+misalignsse,+abm,+lahf_lm,+rdtscp,+popcnt,+x2apic,-svm,+kvmclock
-m 1024 -realtime mlock=on -smp 2,sockets=4,cores=12,threads=1 -uuid
3b5e37ea-04be-4a6b-8d63-f1a5853f2138 -smbios type=0,vendor=HAL 9000
-smbios type=1,manufacturer=cloud -no-user-config -nodefaults -chardev
socket,id=charmonitor,path=/var/lib/libvirt/qemu/3b5e37ea-04be-4a6b-8d63-f1a5853f2138.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc
base=utc,clock=vm,driftfix=slew -no-hpet -no-kvm-pit-reinjection
-no-shutdown -boot menu=off -device
piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive
file=/dev/stor1c/2e7fd7aa-8588-47ed-a091-af2b81c9e935,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native,bps_rd=57671680,bps_wr=57671680,iops_rd=275,iops_wr=275
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
-drive if=none,id=drive-ide0-0-0,readonly=on,format=raw -device
ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1
-netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=27 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:11:11:11:11,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device
isa-serial,chardev=charserial0,id=serial0 -chardev
socket,id=charchannel0,path=/var/lib/libvirt/qemu/f16x86_64.agent,server,nowait
-device
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.1
-device usb-tablet,id=input0 -vnc 0.0.0.0:4,password -vga cirrus -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -sandbox on
I've tried playing with a different CPU model (Opteron_G3) and flags; it
didn't make any difference.
> 7) How exactly are you migrating?
Via libvirt live migration. Seen it with and without XBZRLE enabled.
> 8) You talk about having to wait a few hours to trigger it - do
> you have a more exact description of a test?
Yes, that's where it gets weird. I've never seen this on a fresh VM. It
needs to be idle for a couple of hours at least, and even then it doesn't
always hang.
> 9) Is there any output from qemu stderr/stdout in your qemu logs?
Nothing unusual. From QEMU's point of view the guest is up and running.
Only its OS is hung (but not panicked; there is no backtrace, oops or BUG
on its screen).
--
mg
* Re: [Qemu-devel] Unresponsive linux guest once migrated
2014-04-02 9:30 ` Marcin Gibuła
@ 2014-04-02 9:39 ` Dr. David Alan Gilbert
2014-04-02 10:18 ` Marcin Gibuła
2014-04-02 17:05 ` Marcin Gibuła
0 siblings, 2 replies; 11+ messages in thread
From: Dr. David Alan Gilbert @ 2014-04-02 9:39 UTC (permalink / raw)
To: Marcin Gibuła; +Cc: qemu-devel
* Marcin Gibuła (m.gibula@beyond.pl) wrote:
> >Can you give:
> > 1) A backtrace from the guest
> > thread apply all bt full
> > in gdb
>
> You mean from gdb attached to the hung guest? I'll try to get it. From
> what I remember it looks rather "normal" - busy executing guest
> code.
yes; if you can send it a sysrq to trigger a backtrace it might also
be worth a try - I'm just trying to find out what the guest is really doing
when it's apparently 'hung'.
> > 2) What's the earliest/newest qemu versions you've seen this on?
>
> 1.4 - 1.6
> Don't know about earlier versions because I didn't use migration on
> them. Haven't tried 1.7 yet (I know about XBZRLE fixes, but it
> happened without it as well...).
If you were going to try one thing, I'd prefer you to try the head
of git, i.e. the very latest.
> > 3) What guest OS are you running?
>
> All flavors of CentOS, Ubuntu, Red Hat, etc. Also Windows, but I've
> never seen a crash with Windows so far.
>
> It seems the few people who also have this issue report success with
> kvmclock disabled (either in qemu or on the kernel command line).
OK.
> > 4) What host OS are you running?
>
> Distro is Gentoo based (with no crazy compiler options). I've been
> using kernel 3.4 - 3.10.
>
> > 5) What CPU are you running on?
>
> AMD Opteron(tm) Processor 6164 HE
>
> > 6) What does your qemu command line look like?
>
> Example VM:
> /usr/bin/qemu-system-x86_64 -machine accel=kvm -name
> 3b5e37ea-04be-4a6b-8d63-f1a5853f2138 -S -machine
> pc-i440fx-1.5,accel=kvm,usb=off -cpu qemu64,+misalignsse,+abm,+lahf_lm,+rdtscp,+popcnt,+x2apic,-svm,+kvmclock
> -m 1024 -realtime mlock=on -smp 2,sockets=4,cores=12,threads=1 -uuid
> 3b5e37ea-04be-4a6b-8d63-f1a5853f2138 -smbios type=0,vendor=HAL 9000
> -smbios type=1,manufacturer=cloud -no-user-config -nodefaults
> -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/3b5e37ea-04be-4a6b-8d63-f1a5853f2138.monitor,server,nowait
> -mon chardev=charmonitor,id=monitor,mode=control -rtc
> base=utc,clock=vm,driftfix=slew -no-hpet -no-kvm-pit-reinjection
> -no-shutdown -boot menu=off -device
> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device
> virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive file=/dev/stor1c/2e7fd7aa-8588-47ed-a091-af2b81c9e935,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native,bps_rd=57671680,bps_wr=57671680,iops_rd=275,iops_wr=275
> -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2
> -drive if=none,id=drive-ide0-0-0,readonly=on,format=raw -device
> ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1
> -netdev tap,fd=22,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:11:11:11:11,bus=pci.0,addr=0x3
> -chardev pty,id=charserial0 -device
> isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/f16x86_64.agent,server,nowait
> -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.1
> -device usb-tablet,id=input0 -vnc 0.0.0.0:4,password -vga cirrus
> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -sandbox
> on
>
> I've tried playing with different CPU model (Opteron_G3) and flags,
> it didn't make any difference.
>
> > 7) How exactly are you migrating?
>
> Via libvirt live migration. Seen it with and without XBZRLE enabled.
>
> > 8) You talk about having to wait a few hours to trigger it - do
> > you have a more exact description of a test?
>
> Yes, that's where it gets weird. I've never seen this on a fresh VM.
> It needs to be idle for a couple of hours at least, and even then it
> doesn't always hang.
So your OS is just sitting at a text console, running nothing special?
When you reboot after the migration, what's the last thing you see
in the guest's logs? Is there anything from after the migration?
> > 9) Is there any output from qemu stderr/stdout in your qemu logs?
>
> Nothing unusual. From QEMU's point of view the guest is up and running.
> Only its OS is hung (but not panicked; there is no backtrace,
> oops or BUG on its screen).
Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Qemu-devel] Unresponsive linux guest once migrated
2014-04-02 9:39 ` Dr. David Alan Gilbert
@ 2014-04-02 10:18 ` Marcin Gibuła
2014-04-02 17:05 ` Marcin Gibuła
1 sibling, 0 replies; 11+ messages in thread
From: Marcin Gibuła @ 2014-04-02 10:18 UTC (permalink / raw)
To: qemu-devel
On 02.04.2014 11:39, Dr. David Alan Gilbert wrote:
> * Marcin Gibuła (m.gibula@beyond.pl) wrote:
>>> Can you give:
>>> 1) A backtrace from the guest
>>> thread apply all bt full
>>> in gdb
>>
>> You mean from gdb attached to the hung guest? I'll try to get it. From
>> what I remember it looks rather "normal" - busy executing guest
>> code.
>
> yes; if you can send it a sysrq to trigger a backtrace it might also
> be worth a try - I'm just trying to find out what the guest is really
> doing when it's apparently 'hung'.
IIRC the VM doesn't respond to the sysrq key sequence. Actually, it
doesn't respond to anything but NMI. I tried inject-nmi, and the VM's
kernel responded with the timestamped message "Uhhuh. NMI received. Dazed
and confused, but trying to continue". That timestamp never changes; it's
like time is frozen on the VM.
I'll try to find my notes from this gdb session.
--
mg
* Re: [Qemu-devel] Unresponsive linux guest once migrated
2014-04-02 9:39 ` Dr. David Alan Gilbert
2014-04-02 10:18 ` Marcin Gibuła
@ 2014-04-02 17:05 ` Marcin Gibuła
1 sibling, 0 replies; 11+ messages in thread
From: Marcin Gibuła @ 2014-04-02 17:05 UTC (permalink / raw)
To: dgilbert; +Cc: qemu-devel
>> Yes, that's where it gets weird. I've never seen this on a fresh VM.
>> It needs to be idle for a couple of hours at least, and even then it
>> doesn't always hang.
>
> So your OS is just sitting at a text console, running nothing special?
> When you reboot after the migration, what's the last thing you see
> in the guest's logs? Is there anything from after the migration?
Yes, it's completely idle. After reboot there is nothing in the logs.
I've dumped the memory of one of the hung test VMs and found the kernel
message buffer. The last entries were:
init: failsafe main process (659) killed by TERM signal
init: plymouth-upstart-bridge main process (651) killed by TERM signal
<migration goes here, guest hangs>
Clocksource tsc unstable (delta = 470666274 ns)
<inject-nmi to test>
Uhhuh. NMI received for unknown reason 30 on CPU 0.
Do you have a strange power saving mode enabled?I:
Dazed and confused, but trying to continue
Uhhuh. NMI received for unknown reason 20 on CPU 0.
Do you have a strange power saving mode enabled?I:
Dazed and confused, but trying to continue
<0>Dazed and confused, but trying to continue
I've tried to disassemble where the VM kernel (3.8.something from Ubuntu)
is spinning (using the qemu monitor, register info and symbols from the
guest kernel), and it was a loop inside the __run_timers function from
kernel/timer.c:
while (time_after_eq(jiffies, base->timer_jiffies)) {
...
}
However, my disassembly and qemu debugging skills are limited. Would it
help if I dumped the memory of a broken VM and sent it to you somehow?
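(For what it's worth, in 3.x kernels that loop is the timer wheel's catch-up
path: each pass handles one tick's worth of timers and advances
base->timer_jiffies by one, so if migration leaves timer_jiffies far behind
jiffies, the CPU grinds through one iteration per missed tick. A
back-of-the-envelope sketch, with HZ and the idle gap as assumptions:)

```shell
# Iterations __run_timers must perform to catch up after a gap, one per tick.
gap_seconds=$((5 * 3600))   # assumed: the ~5-hour idle period from this thread
hz=250                      # assumed: a common distro CONFIG_HZ value
echo "catch-up iterations: $((gap_seconds * hz))"
# -> catch-up iterations: 4500000
# A genuinely corrupted 64-bit jiffies value would be astronomically larger,
# which would look exactly like a permanent 100% cpu hang.
```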
--
mg