* [Qemu-devel] KVM guest gets aborted if blockcommit is called
@ 2015-08-25  6:02 Christian Rößner
  2015-08-26  8:08 ` Christian Rößner
  0 siblings, 1 reply; 9+ messages in thread
From: Christian Rößner @ 2015-08-25  6:02 UTC (permalink / raw)
  To: qemu-devel

Hello,

I originally sent this mail to the qemu-discuss mailing list, but I am no longer sure that was the right list, so I am pasting it here in the hope that someone can respond :-)

I have reproducible problems with this code in qemu-coroutine.c:


void qemu_coroutine_enter(Coroutine *co, void *opaque)
{
    Coroutine *self = qemu_coroutine_self();
    CoroutineAction ret;

    trace_qemu_coroutine_enter(self, co, opaque);

    if (co->caller) {
        fprintf(stderr, "Co-routine re-entered recursively\n");
        abort();   /* <-- triggers in 4 or 5 out of 10 tests that use the blockcommit feature */
    }

Unfortunately, as a "normal" system administrator I do not understand the error message. I have no idea what causes it, how to prevent it, or whether it is simply a bug ;-)

Original mail to qemu-discuss:
-------------------------------------------------------------------------

I have now spent five full days debugging a major problem with backing up VMs. I run an HP ProLiant SE316M1-R2 server (aka DL160G6) with two Xeon L5520 CPUs and 48GB of triple-channel RAM. The server handles monitoring and Qemu/libvirt. It runs Gentoo Linux (hardened; Grsecurity-patched kernel, PaX, no RBAC) and hosts 7 guests.

All guests use raw images as disks (I also tested QED and QCOW2). The guests run Gentoo or Ubuntu, and all of them have qemu-guest-agent running.

app-emulation/libvirt-1.2.18-r1::gentoo was built with the following:
USE="caps fuse iscsi libvirtd lvm lxc macvtap nfs nls parted pcap qemu sasl systemd udev vepa -apparmor -audit -avahi -firewalld -glusterfs -numa -openvz -phyp -policykit -rbd (-selinux) -uml -virt-network -virtualbox (-wireshark-plugins) -xen"

app-emulation/qemu-2.4.0::gentoo was built with the following:
USE="aio caps curl fdt filecaps jpeg ncurses nls pin-upstream-blobs png python sasl seccomp spice ssh threads tls uuid vhost-net vnc xattr -accessibility -alsa -bluetooth -debug -glusterfs -gtk -gtk2 -infiniband -iscsi -lzo -nfs -numa -opengl -pulseaudio -rbd -sdl -sdl2 (-selinux) -smartcard -snappy -static -static-softmmu -static-user -systemtap -tci -test -usb -usbredir -vde -virtfs -vte -xen -xfs" PYTHON_TARGETS="python2_7" QEMU_SOFTMMU_TARGETS="i386 x86_64 -aarch64 (-alpha) (-arm) -cris -lm32 (-m68k) -microblaze -microblazeel (-mips) -mips64 -mips64el -mipsel -moxie -or32 (-ppc) (-ppc64) -ppcemb -s390x -sh4 -sh4eb (-sparc) -sparc64 -unicore32 -xtensa -xtensaeb" QEMU_USER_TARGETS="i386 x86_64 -aarch64 (-alpha) (-arm) -armeb -cris (-m68k) -microblaze -microblazeel (-mips) -mips64 -mips64el -mipsel -mipsn32 -mipsn32el -or32 (-ppc) (-ppc64) -ppc64abi32 -s390x -sh4 -sh4eb (-sparc) -sparc32plus -sparc64 -unicore32"

I wrote a bash script that backs up all guests. It works like this:

1. Create external snapshot
2. Copy/rsync away the image
3. blockcommit snapshot
4. blockjob pivot
5. Copy/rsync away the XML description for the guest
6. Remove Snapshot file
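
The six steps above could look roughly like the following shell sketch. It is not the original script, which is not shown in this thread. The function names run and backup_guest, the DRY_RUN switch, the snapshot name backup-snapshot, the --quiesce option, and the location /etc/libvirt/qemu/<guest>.xml for the XML description are all assumptions made for illustration. The virsh options are taken from the virsh man page, not from anything confirmed here.

```shell
# Hypothetical sketch of the six backup steps; not the original script.
set -eu

# With DRY_RUN=1 the command is only printed, otherwise it is executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# backup_guest DOMAIN DISK IMAGE SNAPSHOT DEST
backup_guest() {
    dom=$1 disk=$2 image=$3 snap=$4 dest=$5

    # 1. External disk-only snapshot; the guest now writes to $snap.
    #    --quiesce is an assumption and needs a working guest agent.
    run virsh snapshot-create-as "$dom" backup-snapshot \
        --disk-only --atomic --no-metadata --quiesce \
        --diskspec "$disk,file=$snap"
    # 2. Copy the base image, which no longer changes.
    run rsync -a --sparse "$image" "$dest/"
    # 3. Merge the snapshot back into the base image.
    run virsh blockcommit "$dom" "$disk" --active --wait --verbose
    # 4. Switch the guest back to the base image.
    run virsh blockjob "$dom" "$disk" --pivot
    # 5. Copy the XML description (the path is an assumption).
    run rsync -a "/etc/libvirt/qemu/$dom.xml" "$dest/"
    # 6. Remove the snapshot file.
    run rm -f "$snap"
}
```

Calling it with DRY_RUN=1, for example `DRY_RUN=1 backup_guest guest1 vda /images/guest1.img /snap/guest1.qcow2 /backup`, only prints the six commands, which makes it easy to check the sequence before running it against a real guest.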

I did some tests running the script from a cron job. Copying the image file takes roughly 15 minutes, so I ran the script on a 30-minute cycle.

4 or 5 cycles work perfectly. Steps (1) and (2) succeed, but when it comes to blockcommit, the guest is sometimes (at random) aborted, and the command cannot continue because the guest is no longer running. After starting the guest again, I found two situations:

1. I can directly call blockjob … --pivot, because the last blockcommit that failed reached 100%, or
2. I have to run a blockjob abort action, then re-sync and pivot on the command line, which may or may not work.

In any case, blockcommit is not stable here. I tested this on qemu-2.3.0 and 2.4.0.

In the logs I only get this:

…
2015-08-24 18:38:13.077+0000: starting up libvirt version: 1.2.18, qemu version: 2.4.0
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-system-x86_64 -name mx.roessner-net.de-TESTING -S -machine pc-i440fx-2.1,accel=kvm,usb=off -cpu qemu64,+kvm_pv_eoi -m 4096 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid d86b82d5-153f-4dd9-aa66-d98c2e65db8c -no-user-config -nodefaults -device sga -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/mx.roessner-net.de-TESTING.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-shutdown -boot order=cd,menu=on,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x8 -drive file=/var/lib/libvirt/images/mx.roessner-net.de-TESTING.img,if=none,id=drive-virtio-disk0,format=raw,cache=writeback -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=34,id=hostnet0,vhost=on,vhostfd=35 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=54:52:00:27:ac:8d,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/mx.roessner-net.de-TESTING.org.qemu.guest_agent.0,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -vnc 127.0.0.1:7 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device i6300esb,id=watchdog0,bus=pci.0,addr=0x7 -watchdog-action reset -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -object rng-random,id=objrng0,filename=/dev/random -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x6 -msg timestamp=on
char device redirected to /dev/pts/8 (label charserial0)
Formatting '/var/backups/snapshots/backup-snapshot-mx.roessner-net.de-TESTING.qcow2', fmt=qcow2 size=107374182400 backing_file='/var/lib/libvirt/images/mx.roessner-net.de-TESTING.img' backing_fmt='raw' encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
Formatting '/var/backups/snapshots/backup-snapshot-mx.roessner-net.de-TESTING.qcow2', fmt=qcow2 size=107374182400 backing_file='/var/lib/libvirt/images/mx.roessner-net.de-TESTING.img' backing_fmt='raw' encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
Co-routine re-entered recursively
2015-08-24 19:43:17.700+0000: shutting down

I tried to find out what the error "Co-routine re-entered recursively" means, but I have no idea. I only know that it is raised in qemu-coroutine.c, line 111. What causes this error? What am I missing?

I tried different Linux kernels: pure vanilla sources with NUMA balancing on and off, and several Grsecurity kernels. The kernel makes no difference, and neither does the Qemu version. When I free up memory, roughly 36GB is available. Storage is also fine: it is a battery-backed (BBU) P410i RAID controller running RAID1+0 on 15k SAS disks. Even though this server is 6 years old, it has enough power, so I don't think it is a resource or hardware problem. Everything else on the server runs perfectly without any issues.

So if you have any idea what could cause these aborts, please let me know :-)

The only thing I found on the web is someone saying that this co-routine code is ugly and probably not thread-safe. I no longer know where I found that message. But could this be a threading problem?

Many, many thanks in advance

Christian


* Re: [Qemu-devel] KVM guest gets aborted if blockcommit is called
  2015-08-25  6:02 [Qemu-devel] KVM guest gets aborted if blockcommit is called Christian Rößner
@ 2015-08-26  8:08 ` Christian Rößner
  2015-08-26 13:25   ` Jeff Cody
  0 siblings, 1 reply; 9+ messages in thread
From: Christian Rößner @ 2015-08-26  8:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-discuss


I caught the co-routine SIGABRT while a blockcommit operation was running.

I recompiled with debugging symbols and attached gdb to the process:

(gdb) bt
#0  0x00007f4b6e6ccb8e in raise () from /lib64/libc.so.6
#1  0x00007f4b6e6ce391 in abort () from /lib64/libc.so.6
#2  0x0000555a316a8c39 in qemu_coroutine_enter (co=0x555a34651a50, opaque=0x0)
    at /var/tmp/portage/app-emulation/qemu-2.4.0/work/qemu-2.4.0/qemu-coroutine.c:111
#3  0x0000555a316a8eda in qemu_co_queue_run_restart (co=co@entry=0x555a33d271b0)
    at /var/tmp/portage/app-emulation/qemu-2.4.0/work/qemu-2.4.0/qemu-coroutine-lock.c:59
#4  0x0000555a316a8b53 in qemu_coroutine_enter (co=0x555a33d271b0, opaque=<optimized out>)
    at /var/tmp/portage/app-emulation/qemu-2.4.0/work/qemu-2.4.0/qemu-coroutine.c:118
#5  0x0000555a316e3adf in bdrv_co_aio_rw_vector (bs=bs@entry=0x555a336a6be0,
    sector_num=sector_num@entry=113551488, qiov=qiov@entry=0x555a3367d2c8,
    nb_sectors=nb_sectors@entry=15360, flags=flags@entry=(unknown: 0),
    cb=cb@entry=0x555a316e1fe0 <mirror_read_complete>, opaque=0x555a3367d2c0, is_write=is_write@entry=false)
    at /var/tmp/portage/app-emulation/qemu-2.4.0/work/qemu-2.4.0/block/io.c:2142
#6  0x0000555a316e4b1e in bdrv_aio_readv (bs=bs@entry=0x555a336a6be0,
    sector_num=sector_num@entry=113551488, qiov=qiov@entry=0x555a3367d2c8,
    nb_sectors=nb_sectors@entry=15360, cb=cb@entry=0x555a316e1fe0 <mirror_read_complete>,
    opaque=opaque@entry=0x555a3367d2c0)
    at /var/tmp/portage/app-emulation/qemu-2.4.0/work/qemu-2.4.0/block/io.c:1744
#7  0x0000555a316e2ccf in mirror_iteration (s=0x555a34a0c250)
    at /var/tmp/portage/app-emulation/qemu-2.4.0/work/qemu-2.4.0/block/mirror.c:302
#8  mirror_run (opaque=0x555a34a0c250)
    at /var/tmp/portage/app-emulation/qemu-2.4.0/work/qemu-2.4.0/block/mirror.c:512
#9  0x0000555a316a9a5a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>)
    at /var/tmp/portage/app-emulation/qemu-2.4.0/work/qemu-2.4.0/coroutine-ucontext.c:80
#10 0x00007f4b6e6df4a0 in ?? () from /lib64/libc.so.6
#11 0x00007ffe67b71840 in ?? ()
#12 0x0000000000000000 in ?? ()
(gdb)

Please, could someone reply to me :-)

Thanks

Christian


* Re: [Qemu-devel] KVM guest gets aborted if blockcommit is called
  2015-08-26  8:08 ` Christian Rößner
@ 2015-08-26 13:25   ` Jeff Cody
  2015-08-26 14:53     ` Christian Rößner
  2015-08-27  9:26     ` Christian Rößner
  0 siblings, 2 replies; 9+ messages in thread
From: Jeff Cody @ 2015-08-26 13:25 UTC (permalink / raw)
  To: Christian Rößner; +Cc: qemu-devel, qemu-discuss


Hi Christian,

I think you may be running into a bug that is fixed by a recent patch
(after v2.4.0):

commit e424aff5f307227b1c2512bbb8ece891bb895cef
Author: Kevin Wolf <kwolf@redhat.com>
Date:   Thu Aug 13 10:41:50 2015 +0200

    mirror: Fix coroutine reentrance


Could you retry with qemu.git/master, and see if that fixes the issue
you are seeing?
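
One way to confirm that a given source checkout actually contains the commit named above is to ask git whether it is an ancestor of HEAD. The following is a minimal sketch. The helper name tree_has_commit and the checkout path ./qemu are hypothetical, and how the checkout is obtained is left open because this message does not give a repository URL.

```shell
# Commit named in this message.
FIX_COMMIT=e424aff5f307227b1c2512bbb8ece891bb895cef

# tree_has_commit CHECKOUT_DIR [COMMIT]
# Succeeds if COMMIT (default: FIX_COMMIT) is an ancestor of HEAD
# in the given checkout, fails otherwise.
tree_has_commit() {
    git -C "$1" merge-base --is-ancestor "${2:-$FIX_COMMIT}" HEAD
}

# Example (the checkout path ./qemu is an assumption):
#   tree_has_commit ./qemu && echo "fix included"
```

If the check fails on a checkout of the v2.4.0 release but succeeds on master, that matches the description of the patch as landing after v2.4.0.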


Thanks,
Jeff


* Re: [Qemu-devel] KVM guest gets aborted if blockcommit is called
  2015-08-26 13:25   ` Jeff Cody
@ 2015-08-26 14:53     ` Christian Rößner
  2015-08-27  9:26     ` Christian Rößner
  1 sibling, 0 replies; 9+ messages in thread
From: Christian Rößner @ 2015-08-26 14:53 UTC (permalink / raw)
  To: Jeff Cody; +Cc: qemu-devel, qemu-discuss


I just compiled the master branch and started an endless loop with my backup script. I will give it 24 hours of testing and report back afterwards. Many thanks for your feedback! :-)

Christian



* Re: [Qemu-devel] KVM guest gets aborted if blockcommit is called
  2015-08-26 13:25   ` Jeff Cody
  2015-08-26 14:53     ` Christian Rößner
@ 2015-08-27  9:26     ` Christian Rößner
  2015-08-27 12:34       ` Jeff Cody
  1 sibling, 1 reply; 9+ messages in thread
From: Christian Rößner @ 2015-08-27  9:26 UTC (permalink / raw)
  To: Jeff Cody; +Cc: qemu-devel, qemu-discuss


So far, everything looks perfect. No issues; the backup is running smoothly.

Thanks very much. If nothing changes by tonight, I am going to close the bug report.

Christian


* Re: [Qemu-devel] KVM guest gets aborted if blockcommit is called
  2015-08-27  9:26     ` Christian Rößner
@ 2015-08-27 12:34       ` Jeff Cody
  2015-08-27 20:01         ` Christian Rößner
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff Cody @ 2015-08-27 12:34 UTC (permalink / raw)
  To: Christian Rößner; +Cc: qemu-devel, qemu-discuss


Christian,

Great to hear, thanks for the follow-up.

-Jeff


* Re: [Qemu-devel] KVM guest gets aborted if blockcommit is called
  2015-08-27 12:34       ` Jeff Cody
@ 2015-08-27 20:01         ` Christian Rößner
  2015-08-27 20:15           ` Eric Blake
  0 siblings, 1 reply; 9+ messages in thread
From: Christian Rößner @ 2015-08-27 20:01 UTC (permalink / raw)
  To: Jeff Cody; +Cc: qemu-devel, qemu-discuss

[-- Attachment #1: Type: text/plain, Size: 4667 bytes --]


> Am 27.08.2015 um 14:34 schrieb Jeff Cody <jcody@redhat.com>:
> 
> On Thu, Aug 27, 2015 at 11:26:13AM +0200, Christian Rößner wrote:
>> 
>>> Am 26.08.2015 um 15:25 schrieb Jeff Cody <jcody@redhat.com>:
>>> 
>>> On Wed, Aug 26, 2015 at 10:08:26AM +0200, Christian Rößner wrote:
>>>> 
>>>>> Am 25.08.2015 um 08:02 schrieb Christian Rößner <c@roessner.co>:
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> I wrote this mail to the qemu-discuss mailing list, but today I am unsure, if I chose the right list. So I copy and paste this mail here in hope someone can respond :-)
>>>>> 
>>>>> I have reproducable problems with some code in qemu-coroutine.c:
>>>>> 
>>>>> 
>>>>> void qemu_coroutine_enter(Coroutine *co, void *opaque)
>>>>> {
>>>>>  Coroutine *self = qemu_coroutine_self();
>>>>>  CoroutineAction ret;
>>>>> 
>>>>>  trace_qemu_coroutine_enter(self, co, opaque);
>>>>> 
>>>>>  if (co->caller) {
>>>>>      fprintf(stderr, "Co-routine re-entered recursively\n");
>>>>>      abort();   <————————— This one triggers 4 or 5 out of ten tests to use the blockcommit feature
>>>>>  }
>>>> 
>>>> Caught Co-routine SIGABRT while a blockcommit operation was running.
>>>> 
>>>> Recompiled with debugging symbols and I connected gdb to the process:
>>>> 
>>>> (gdb) bt
>>>> #0  0x00007f4b6e6ccb8e in raise () from /lib64/libc.so.6
>>>> #1  0x00007f4b6e6ce391 in abort () from /lib64/libc.so.6
>>>> #2  0x0000555a316a8c39 in qemu_coroutine_enter (co=0x555a34651a50, opaque=0x0)
>>>>   at /var/tmp/portage/app-emulation/qemu-2.4.0/work/qemu-2.4.0/qemu-coroutine.c:111
>>>> #3  0x0000555a316a8eda in qemu_co_queue_run_restart (co=co@entry=0x555a33d271b0)
>>>>   at /var/tmp/portage/app-emulation/qemu-2.4.0/work/qemu-2.4.0/qemu-coroutine-lock.c:59
>>>> #4  0x0000555a316a8b53 in qemu_coroutine_enter (co=0x555a33d271b0, opaque=<optimized out>)
>>>>   at /var/tmp/portage/app-emulation/qemu-2.4.0/work/qemu-2.4.0/qemu-coroutine.c:118
>>>> #5  0x0000555a316e3adf in bdrv_co_aio_rw_vector (bs=bs@entry=0x555a336a6be0,
>>>>   sector_num=sector_num@entry=113551488, qiov=qiov@entry=0x555a3367d2c8,
>>>>   nb_sectors=nb_sectors@entry=15360, flags=flags@entry=(unknown: 0),
>>>>   cb=cb@entry=0x555a316e1fe0 <mirror_read_complete>, opaque=0x555a3367d2c0, is_write=is_write@entry=false)
>>>>   at /var/tmp/portage/app-emulation/qemu-2.4.0/work/qemu-2.4.0/block/io.c:2142
>>>> #6  0x0000555a316e4b1e in bdrv_aio_readv (bs=bs@entry=0x555a336a6be0,
>>>>   sector_num=sector_num@entry=113551488, qiov=qiov@entry=0x555a3367d2c8,
>>>>   nb_sectors=nb_sectors@entry=15360, cb=cb@entry=0x555a316e1fe0 <mirror_read_complete>,
>>>>   opaque=opaque@entry=0x555a3367d2c0)
>>>>   at /var/tmp/portage/app-emulation/qemu-2.4.0/work/qemu-2.4.0/block/io.c:1744
>>>> #7  0x0000555a316e2ccf in mirror_iteration (s=0x555a34a0c250)
>>>>   at /var/tmp/portage/app-emulation/qemu-2.4.0/work/qemu-2.4.0/block/mirror.c:302
>>>> #8  mirror_run (opaque=0x555a34a0c250)
>>>>   at /var/tmp/portage/app-emulation/qemu-2.4.0/work/qemu-2.4.0/block/mirror.c:512
>>>> #9  0x0000555a316a9a5a in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>)
>>>>   at /var/tmp/portage/app-emulation/qemu-2.4.0/work/qemu-2.4.0/coroutine-ucontext.c:80
>>>> #10 0x00007f4b6e6df4a0 in ?? () from /lib64/libc.so.6
>>>> #11 0x00007ffe67b71840 in ?? ()
>>>> #12 0x0000000000000000 in ?? ()
>>>> (gdb)
>>>> 
>>>> Please, could someone reply to me :-)
>>>> 
>>>> Thanks
>>>> 
>>>> Christian
>>> 
>>> Hi Christian,
>>> 
>>> I think you may be running into a bug that is fixed by a recent patch
>>> (after v2.4.0):
>>> 
>>> commit e424aff5f307227b1c2512bbb8ece891bb895cef
>>> Author: Kevin Wolf <kwolf@redhat.com>
>>> Date:   Thu Aug 13 10:41:50 2015 +0200
>>> 
>>>   mirror: Fix coroutine reentrance
>>> 
>>> 
>>> Could you retry with qemu.git/master, and see if that fixes the issue
>>> you are seeing?
>> 
>> So far, everything looks perfect. No issues. The backup is running smoothly.
>> 
>> Thanks very much. If nothing changes until tonight, I am going to close the bug report.
>> 
> 
> Christian,
> 
> Great to hear, thanks for the follow-up.

Just a final result:

Since I use libvirt with qemu, I ran the blockcommit feature through libvirt. When blockcommit is run directly with --wait --active --pivot, the pivot sometimes fails because the block commit job is not yet ready to pivot. I do not know whether this is related to libvirt or to qemu. When I split it into a blockcommit followed by a separate blockjob, everything works like a charm. So I have a working solution now. Thanks very much for your help and feedback.
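
A minimal sketch of the two-step approach, assuming the virsh blockcommit and blockjob subcommands accept the flags shown in the installed libvirt version. The helper name commit_then_pivot, the VIRSH override variable, and the domain and disk arguments are placeholders invented for illustration and do not come from this thread:

```shell
#!/bin/sh
# Sketch only: run the active commit and the pivot as two separate calls.
# VIRSH is a hypothetical override so the commands can be dry-run with a stub.
VIRSH="${VIRSH:-virsh}"

commit_then_pivot() {
    domain="$1"   # libvirt domain name (placeholder)
    disk="$2"     # disk target or path, e.g. vda (placeholder)

    # Step 1: start the active commit and wait for it, without --pivot.
    "$VIRSH" blockcommit "$domain" "$disk" --active --wait || return 1

    # Step 2: pivot in a separate call once step 1 has returned.
    "$VIRSH" blockjob "$domain" "$disk" --pivot
}
```

It would be called as, for example, commit_then_pivot myguest vda. Because the pivot is only issued after the first call returns successfully, the two operations can no longer race inside a single command.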

Christian



* Re: [Qemu-devel] KVM guest gets aborted if blockcommit is called
  2015-08-27 20:01         ` Christian Rößner
@ 2015-08-27 20:15           ` Eric Blake
  2015-08-27 20:22             ` Christian Rößner
  0 siblings, 1 reply; 9+ messages in thread
From: Eric Blake @ 2015-08-27 20:15 UTC (permalink / raw)
  To: Christian Rößner, Jeff Cody; +Cc: qemu-devel, qemu-discuss


On 08/27/2015 02:01 PM, Christian Rößner wrote:

>>>>
>>>> I think you may be running into a bug that is fixed by a recent patch
>>>> (after v2.4.0):
>>>>
>>>> commit e424aff5f307227b1c2512bbb8ece891bb895cef
>>>> Author: Kevin Wolf <kwolf@redhat.com>
>>>> Date:   Thu Aug 13 10:41:50 2015 +0200
>>>>
>>>>   mirror: Fix coroutine reentrance
>>>>
>>>>
>>>> Could you retry with qemu.git/master, and see if that fixes the issue
>>>> you are seeing?
>>>
>>> So far, everything looks perfect. No issues. The backup is running smoothly.
>>>
>>> Thanks very much. If nothing changes until tonight, I am going to close the bug report.
>>>
>>
>> Christian,
>>
>> Great to hear, thanks for the follow-up.
> 
> Just a final result:
> 
> Since I use libvirt with qemu, I ran the blockcommit feature through libvirt. When blockcommit is run directly with --wait --active --pivot, the pivot sometimes fails because the block commit job is not yet ready to pivot. I do not know whether this is related to libvirt or to qemu. When I split it into a blockcommit followed by a separate blockjob, everything works like a charm. So I have a working solution now. Thanks very much for your help and feedback.

Which version of libvirt? There was a bug up until 1.2.18 where older
libvirt got thrown off by newer qemu reporting a job status of 0
progress that was nonetheless equal to the block job size. Libvirt
interpreted that as the job being complete, and the command then
failed. Splitting the job into distinct parts was indeed the right
workaround, as it prevented hitting the window. See libvirt commit
eae5924.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



* Re: [Qemu-devel] KVM guest gets aborted if blockcommit is called
  2015-08-27 20:15           ` Eric Blake
@ 2015-08-27 20:22             ` Christian Rößner
  0 siblings, 0 replies; 9+ messages in thread
From: Christian Rößner @ 2015-08-27 20:22 UTC (permalink / raw)
  To: Eric Blake; +Cc: Jeff Cody, qemu-devel, qemu-discuss



> On 27.08.2015 at 22:15, Eric Blake <eblake@redhat.com> wrote:
> 
> On 08/27/2015 02:01 PM, Christian Rößner wrote:
> 
>>>>> 
>>>>> I think you may be running into a bug that is fixed by a recent patch
>>>>> (after v2.4.0):
>>>>> 
>>>>> commit e424aff5f307227b1c2512bbb8ece891bb895cef
>>>>> Author: Kevin Wolf <kwolf@redhat.com>
>>>>> Date:   Thu Aug 13 10:41:50 2015 +0200
>>>>> 
>>>>>  mirror: Fix coroutine reentrance
>>>>> 
>>>>> 
>>>>> Could you retry with qemu.git/master, and see if that fixes the issue
>>>>> you are seeing?
>>>> 
>>>> So far, everything looks perfect. No issues. The backup is running smoothly.
>>>> 
>>>> Thanks very much. If nothing changes until tonight, I am going to close the bug report.
>>>> 
>>> 
>>> Christian,
>>> 
>>> Great to hear, thanks for the follow-up.
>> 
>> Just a final result:
>> 
>> Since I use libvirt with qemu, I ran the blockcommit feature through libvirt. When blockcommit is run directly with --wait --active --pivot, the pivot sometimes fails because the block commit job is not yet ready to pivot. I do not know whether this is related to libvirt or to qemu. When I split it into a blockcommit followed by a separate blockjob, everything works like a charm. So I have a working solution now. Thanks very much for your help and feedback.
> 
> Which version of libvirt? There was a bug up until 1.2.18 where older
> libvirt got thrown off by newer qemu reporting a job status of 0
> progress that was nonetheless equal to the block job size. Libvirt
> interpreted that as the job being complete, and the command then
> failed. Splitting the job into distinct parts was indeed the right
> workaround, as it prevented hitting the window. See libvirt commit
> eae5924.

As far as I can see, I am using libvirt 1.2.18 with the qemu master branch (as of yesterday, ~3:30pm).

Christian


Thread overview: 9+ messages
2015-08-25  6:02 [Qemu-devel] KVM guest gets aborted if blockcommit is called Christian Rößner
2015-08-26  8:08 ` Christian Rößner
2015-08-26 13:25   ` Jeff Cody
2015-08-26 14:53     ` Christian Rößner
2015-08-27  9:26     ` Christian Rößner
2015-08-27 12:34       ` Jeff Cody
2015-08-27 20:01         ` Christian Rößner
2015-08-27 20:15           ` Eric Blake
2015-08-27 20:22             ` Christian Rößner
