qemu-devel.nongnu.org archive mirror
* [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)
@ 2011-10-24 10:00 Chris Webb
  2011-10-24 10:42 ` Kevin Wolf
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Webb @ 2011-10-24 10:00 UTC (permalink / raw)
  To: qemu-devel, kvm

I have a qemu-kvm guest (apparently an Ubuntu 11.04 x86-64 install) which has
stopped and refuses to continue:

  (qemu) info status
  VM status: paused
  (qemu) cont
  (qemu) info status
  VM status: paused

The host is running linux 2.6.39.2 with qemu-kvm 0.14.1 on a 24-core Opteron
6176 box, and has nine other 2GB production guests on it running absolutely
fine.

It's been a while since I've seen one of these. When I last saw a cluster of
them, they were emulation failures (big real mode instructions, maybe?). I
also remember a message about abnormal exit in the dmesg previously, but I
don't have that here. This time, there is no host kernel output at all, just
the paused guest.

I have qemu monitor access and can even strace the relevant qemu process if
necessary: is it possible to use this to diagnose what's caused this guest
to stop, e.g. the unsupported instruction if it's an emulation failure?

Cheers,

Chris.


* Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)
  2011-10-24 10:00 [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?) Chris Webb
@ 2011-10-24 10:42 ` Kevin Wolf
  2011-10-24 10:58   ` Chris Webb
  0 siblings, 1 reply; 7+ messages in thread
From: Kevin Wolf @ 2011-10-24 10:42 UTC (permalink / raw)
  To: Chris Webb; +Cc: qemu-devel, kvm

On 24.10.2011 12:00, Chris Webb wrote:
> I have a qemu-kvm guest (apparently an Ubuntu 11.04 x86-64 install) which has
> stopped and refuses to continue:
> 
>   (qemu) info status
>   VM status: paused
>   (qemu) cont
>   (qemu) info status
>   VM status: paused
> 
> The host is running linux 2.6.39.2 with qemu-kvm 0.14.1 on a 24-core Opteron
> 6176 box, and has nine other 2GB production guests on it running absolutely
> fine.
> 
> It's been a while since I've seen one of these. When I last saw a cluster of
> them, they were emulation failures (big real mode instructions, maybe?). I
> also remember a message about abnormal exit in the dmesg previously, but I
> don't have that here. This time, there is no host kernel output at all, just
> the paused guest.
> 
> I have qemu monitor access and can even strace the relevant qemu process if
> necessary: is it possible to use this to diagnose what's caused this guest
> to stop, e.g. the unsupported instruction if it's an emulation failure?

Another common cause for stopped VMs is I/O errors, for example writes
to a sparse image when the disk is full.

Kevin


* Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)
  2011-10-24 10:42 ` Kevin Wolf
@ 2011-10-24 10:58   ` Chris Webb
  2011-10-24 11:18     ` Kevin Wolf
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Webb @ 2011-10-24 10:58 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, kvm

Kevin Wolf <kwolf@redhat.com> writes:

> On 24.10.2011 12:00, Chris Webb wrote:
> > I have qemu monitor access and can even strace the relevant qemu process if
> > necessary: is it possible to use this to diagnose what's caused this guest
> > to stop, e.g. the unsupported instruction if it's an emulation failure?
> 
> Another common cause for stopped VMs is I/O errors, for example writes
> to a sparse image when the disk is full.

This guest is backed by LVM LVs so I don't think they can return ENOSPC, but I
could imagine read errors, so I've just done a trivial test to make sure I can
read them end-to-end:

  0015# dd if=/dev/mapper/guest\:e549f8e1-4c0e-4dea-826a-e4b877282c07\:ide\:0\:0 of=/dev/null bs=1M
  3136+0 records in
  3136+0 records out
  3288334336 bytes (3.3 GB) copied, 20.898 s, 157 MB/s

  0015# dd if=/dev/mapper/guest\:e549f8e1-4c0e-4dea-826a-e4b877282c07\:ide\:0\:1 of=/dev/null bs=1M
  276+0 records in
  276+0 records out
  289406976 bytes (289 MB) copied, 1.85218 s, 156 MB/s

Is there any way to ask qemu why a guest has stopped, so I can distinguish IO
problems from emulation problems from anything else?

Cheers,

Chris.


* Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)
  2011-10-24 10:58   ` Chris Webb
@ 2011-10-24 11:18     ` Kevin Wolf
  2011-10-24 11:29       ` Chris Webb
  0 siblings, 1 reply; 7+ messages in thread
From: Kevin Wolf @ 2011-10-24 11:18 UTC (permalink / raw)
  To: Chris Webb; +Cc: qemu-devel, kvm

On 24.10.2011 12:58, Chris Webb wrote:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
>> On 24.10.2011 12:00, Chris Webb wrote:
>>> I have qemu monitor access and can even strace the relevant qemu process if
>>> necessary: is it possible to use this to diagnose what's caused this guest
>>> to stop, e.g. the unsupported instruction if it's an emulation failure?
>>
>> Another common cause for stopped VMs is I/O errors, for example writes
>> to a sparse image when the disk is full.
> 
> This guest is backed by LVM LVs so I don't think they can return ENOSPC, but I
> could imagine read errors, so I've just done a trivial test to make sure I can
> read them end-to-end:
> 
>   0015# dd if=/dev/mapper/guest\:e549f8e1-4c0e-4dea-826a-e4b877282c07\:ide\:0\:0 of=/dev/null bs=1M
>   3136+0 records in
>   3136+0 records out
>   3288334336 bytes (3.3 GB) copied, 20.898 s, 157 MB/s
> 
>   0015# dd if=/dev/mapper/guest\:e549f8e1-4c0e-4dea-826a-e4b877282c07\:ide\:0\:1 of=/dev/null bs=1M
>   276+0 records in
>   276+0 records out
>   289406976 bytes (289 MB) copied, 1.85218 s, 156 MB/s
> 
> Is there any way to ask qemu why a guest has stopped, so I can distinguish IO
> problems from emulation problems from anything else?

In qemu 1.0 we'll have an extended 'info status' that includes the stop
reason, but 0.14 doesn't have this yet (was committed to git master only
recently).

If you attach a QMP monitor (see QMP/README, don't forget to send the
capabilities command, it's part of creating the connection) you will
receive messages for I/O errors, though.
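
A minimal sketch of that handshake in Python, not a tested tool: it assumes
the monitor is exposed on a UNIX socket (for example with something like
-qmp unix:/tmp/qmp.sock,server,nowait, where the path is made up), that each
QMP message arrives on its own line, and that the event of interest is
BLOCK_IO_ERROR. The helper name wait_for_io_error is purely illustrative.

```python
import json
import socket

def wait_for_io_error(sock):
    """Negotiate QMP capabilities on `sock`, then block until a
    BLOCK_IO_ERROR event arrives and return its "data" dictionary."""
    reader = sock.makefile("r", encoding="utf-8")

    # The server speaks first: a greeting object with a "QMP" key.
    greeting = json.loads(reader.readline())
    if "QMP" not in greeting:
        raise RuntimeError("unexpected greeting: %r" % (greeting,))

    # Capabilities negotiation is part of creating the connection.
    sock.sendall(json.dumps({"execute": "qmp_capabilities"}).encode() + b"\n")

    # Skip command replies and unrelated events until an I/O error shows up.
    for line in reader:
        if not line.strip():
            continue
        message = json.loads(line)
        if message.get("event") == "BLOCK_IO_ERROR":
            return message.get("data", {})
    raise RuntimeError("monitor closed before any I/O error was reported")

# Hypothetical usage (socket path is an assumption):
#   sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
#   sock.connect("/tmp/qmp.sock")
#   print(wait_for_io_error(sock))
```

This only reports errors that happen while the client is connected, so it has
to be attached before the guest stops.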

Kevin


* Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)
  2011-10-24 11:18     ` Kevin Wolf
@ 2011-10-24 11:29       ` Chris Webb
  2011-10-24 11:36         ` Kevin Wolf
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Webb @ 2011-10-24 11:29 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, kvm

Kevin Wolf <kwolf@redhat.com> writes:

> In qemu 1.0 we'll have an extended 'info status' that includes the stop
> reason, but 0.14 doesn't have this yet (was committed to git master only
> recently).

Right, okay. I might take a look at cherry-picking and back-porting that to
our version of qemu-kvm if it's not too entangled with other changes. It
would be very useful in these situations.

> If you attach a QMP monitor (see QMP/README, don't forget to send the
> capabilities command, it's part of creating the connection) you will
> receive messages for I/O errors, though.

Thanks. I don't think I can do this with an already-running qemu-kvm that's
in a stopped state, can I? I'd need to start a new qemu-kvm invocation and
wait to try to catch the problem again?

Cheers,

Chris.


* Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)
  2011-10-24 11:29       ` Chris Webb
@ 2011-10-24 11:36         ` Kevin Wolf
  2011-10-24 12:05           ` Chris Webb
  0 siblings, 1 reply; 7+ messages in thread
From: Kevin Wolf @ 2011-10-24 11:36 UTC (permalink / raw)
  To: Chris Webb; +Cc: qemu-devel, kvm

On 24.10.2011 13:29, Chris Webb wrote:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
>> In qemu 1.0 we'll have an extended 'info status' that includes the stop
>> reason, but 0.14 doesn't have this yet (was committed to git master only
>> recently).
> 
> Right, okay. I might take a look at cherry-picking and back-porting that to
> our version of qemu-kvm if it's not too entangled with other changes. It
> would be very useful in these situations.

I'm afraid that it depends on many other changes, but you can try.

> 
>> If you attach a QMP monitor (see QMP/README, don't forget to send the
>> capabilities command, it's part of creating the connection) you will
>> receive messages for I/O errors, though.
> 
> Thanks. I don't think I can do this with an already-running qemu-kvm that's
> in a stopped state, can I? I'd need to start a new qemu-kvm invocation and
> wait to try to catch the problem again?

Good point... The only other thing that I can think of would be
attaching gdb and setting a breakpoint in vm_stop() or something.

Kevin


* Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)
  2011-10-24 11:36         ` Kevin Wolf
@ 2011-10-24 12:05           ` Chris Webb
  0 siblings, 0 replies; 7+ messages in thread
From: Chris Webb @ 2011-10-24 12:05 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, kvm

Kevin Wolf <kwolf@redhat.com> writes:

> Good point... The only other thing that I can think of would be
> attaching gdb and setting a breakpoint in vm_stop() or something.

Perfect, that seems to have identified what's going on very nicely:

(gdb) break vm_stop
Breakpoint 1 at 0x407d10: file /home/root/packages/qemu-kvm/src-UMBurO/cpus.c, line 318.
(gdb) fg
Continuing.

Breakpoint 1, vm_stop (reason=0)
    at /home/root/packages/qemu-kvm/src-UMBurO/cpus.c:318
318     /home/root/packages/qemu-kvm/src-UMBurO/cpus.c: No such file or directory.
        in /home/root/packages/qemu-kvm/src-UMBurO/cpus.c
(gdb) bt
#0  vm_stop (reason=0) at /home/root/packages/qemu-kvm/src-UMBurO/cpus.c:318
#1  0x000000000058585f in ide_handle_rw_error (s=0x20330d8, error=28, op=8)
    at /home/root/packages/qemu-kvm/src-UMBurO/hw/ide/core.c:468
#2  0x0000000000588376 in ide_dma_cb (opaque=0x20330d8, 
    ret=<value optimized out>)
    at /home/root/packages/qemu-kvm/src-UMBurO/hw/ide/core.c:494
#3  0x0000000000590092 in dma_bdrv_cb (opaque=0x2043a10, ret=-28)
    at /home/root/packages/qemu-kvm/src-UMBurO/dma-helpers.c:94
#4  0x000000000044d64a in qcow2_aio_write_cb (opaque=0x2034900, ret=-28)
    at block/qcow2.c:714
#5  0x000000000043df6d in posix_aio_process_queue (
    opaque=<value optimized out>) at posix-aio-compat.c:462
#6  0x000000000043e07d in posix_aio_read (opaque=0x17c8110)
    at posix-aio-compat.c:503
#7  0x0000000000415fca in main_loop_wait (nonblocking=<value optimized out>)
    at /home/root/packages/qemu-kvm/src-UMBurO/vl.c:1383
#8  0x000000000042ca37 in kvm_main_loop ()
    at /home/root/packages/qemu-kvm/src-UMBurO/qemu-kvm.c:1589
#9  0x00000000004170a3 in main (argc=32, argv=<value optimized out>, 
    envp=<value optimized out>)
    at /home/root/packages/qemu-kvm/src-UMBurO/vl.c:1429
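
The error=28 in frame #1 (ret=-28 in the frames below it) is an errno value.
A quick way to decode it, sketched in Python on the assumption that the host
uses Linux errno numbering; describe_errno is just an illustrative helper
name:

```python
import errno
import os

def describe_errno(code):
    """Map a (possibly negated) errno value to its symbolic name and message."""
    code = abs(code)  # qemu passes the error down the callback chain as -errno
    return errno.errorcode.get(code, "unknown"), os.strerror(code)

# On Linux, 28 is ENOSPC ("No space left on device").
print(describe_errno(-28))
```

So the write failed for lack of space, presumably because the qcow2 layer
tried to grow the image past the end of the LV.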

I see what's happened here: we're not explicitly setting format=raw when we
start that guest and someone's uploaded a qcow2 image directly to a block
device. Ouch. Sorry for the noise!
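
To catch this sort of thing earlier, one could look for the qcow magic at the
start of each LV. A minimal sketch, assuming the qcow/qcow2 header layout (a
4-byte magic "QFI\xfb" followed by a big-endian 32-bit version number);
probe_qcow is a hypothetical helper, and the path passed in would be whichever
device backs the guest:

```python
import struct

QCOW_MAGIC = b"QFI\xfb"

def probe_qcow(path):
    """Return the qcow version number if `path` starts with a qcow header,
    otherwise None."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != QCOW_MAGIC:
        return None
    # The version field follows the magic, stored big-endian.
    (version,) = struct.unpack(">I", header[4:8])
    return version
```

A non-None result on an LV that is supposed to hold a raw disk suggests an
image was uploaded in the wrong format.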

Best wishes,

Chris.
