From: Fam Zheng <famz@redhat.com>
To: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org,
	stefanha@redhat.com, jcody@redhat.com, quintela@redhat.com
Subject: Re: [Qemu-devel] coroutines: block: Co-routine re-entered recursively when migrating disk with iothreads
Date: Tue, 24 May 2016 10:12:44 +0800
Message-ID: <20160524021244.GD14601@ad.usersys.redhat.com>
In-Reply-To: <574351CE.8000605@linux.vnet.ibm.com>

On Mon, 05/23 14:54, Jason J. Herne wrote:
> Using libvirt to migrate a guest and one guest disk that is using iothreads
> causes Qemu to crash with the message:
> Co-routine re-entered recursively
> 
> I've looked into this one a bit but I have not seen anything that
> immediately stands out.
> Here is what I have found:
> 
> In qemu_coroutine_enter:
>     if (co->caller) {
>         fprintf(stderr, "Co-routine re-entered recursively\n");
>         abort();
>     }
> 
> The value of co->caller is actually changing between the time "if
> (co->caller)" is evaluated and the time I print some debug statements
> directly under the existing fprintf. I confirmed this by saving the value in
> a local variable and printing both the new local variable and co->caller
> immediately after the existing fprintf. This would certainly indicate some
> kind of concurrency issue. However, it does not necessarily point to the
> reason we ended up inside this if statement because co->caller was not NULL
> before it was trashed. Perhaps it was trashed more than once then? I figured
> maybe the problem was with coroutine pools so I disabled them
> (--disable-coroutine-pool) and still hit the bug.

Which coroutine backend are you using?
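
To make the check-then-print experiment described above easier to reason
about, here is a minimal sketch of the same idea. It is not QEMU code:
the Coroutine struct is a hypothetical stand-in with only the caller
field, and coroutine_enter_checked() and coroutine_leave() are names
made up for illustration. The sketch returns -1 where the real code
calls abort(), so it can be exercised in a small test program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, stripped-down stand-in for QEMU's Coroutine; only the
 * field discussed in this thread is modelled. */
typedef struct Coroutine Coroutine;
struct Coroutine {
    Coroutine *caller;   /* non-NULL while the coroutine is entered */
};

/* Returns 0 on success, -1 if the coroutine was already entered.
 * The real check aborts; returning an error keeps the sketch testable. */
static int coroutine_enter_checked(Coroutine *self, Coroutine *co)
{
    /* Snapshot co->caller into a local so the value that was tested is
     * the one that gets reported. This is a debugging aid only; it is
     * not a synchronized read and does not fix any race. */
    Coroutine *seen = co->caller;

    if (seen) {
        /* Print both values: a difference hints that another thread
         * modified co->caller after the check. */
        fprintf(stderr,
                "Co-routine re-entered recursively (seen=%p, now=%p)\n",
                (void *)seen, (void *)co->caller);
        return -1;
    }

    assert(self != NULL);
    co->caller = self;   /* mark the coroutine as entered */
    return 0;
}

/* Counterpart for when the coroutine yields or terminates. */
static void coroutine_leave(Coroutine *co)
{
    co->caller = NULL;
}
```

If seen and now differ in the output, something else wrote co->caller
between the check and the print, which matches the observation quoted
above.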

> 
> The backtrace is not always identical. Here is one instance:
> (gdb) bt
> #0  0x000003ffa78be2c0 in raise () from /lib64/libc.so.6
> #1  0x000003ffa78bfc26 in abort () from /lib64/libc.so.6
> #2  0x0000000080427d80 in qemu_coroutine_enter (co=0xa2cf2b40, opaque=0x0)
> at /root/kvmdev/qemu/util/qemu-coroutine.c:112
> #3  0x000000008032246e in nbd_restart_write (opaque=0xa2d0cd40) at
> /root/kvmdev/qemu/block/nbd-client.c:114
> #4  0x00000000802b3a1c in aio_dispatch (ctx=0xa2c907a0) at
> /root/kvmdev/qemu/aio-posix.c:341
> #5  0x00000000802b4332 in aio_poll (ctx=0xa2c907a0, blocking=true) at
> /root/kvmdev/qemu/aio-posix.c:479
> #6  0x0000000080155aba in iothread_run (opaque=0xa2c90260) at
> /root/kvmdev/qemu/iothread.c:46
> #7  0x000003ffa7a87c2c in start_thread () from /lib64/libpthread.so.0
> #8  0x000003ffa798ec9a in thread_start () from /lib64/libc.so.6

It may be worth looking at the backtraces of all threads (in gdb:
"thread apply all bt"), especially the monitor thread (main thread).
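
If two threads turn out to be entering the same coroutine, a plain
"if (co->caller)" test cannot reliably tell which one got there first.
As a purely hypothetical debugging aid (nothing like this exists in
QEMU as far as this thread shows), the entry could be claimed with a
C11 compare-and-swap so that exactly one caller wins and the loser
learns who holds it. EntryGuard, entry_guard_try_enter() and
entry_guard_leave() are invented names, and the integer ids are
assumed to be assigned per thread by the person debugging.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical debug guard; 0 means "nobody has entered". */
typedef struct {
    atomic_int owner;
} EntryGuard;

/* Try to claim the guard for thread id `who` (must be non-zero).
 * On failure, the id of the current holder is stored in *holder
 * when holder is not NULL. */
static bool entry_guard_try_enter(EntryGuard *g, int who, int *holder)
{
    int expected = 0;

    assert(who != 0);
    /* Atomically: if owner == 0, set owner = who and succeed.
     * Otherwise `expected` is overwritten with the current owner. */
    if (atomic_compare_exchange_strong(&g->owner, &expected, who)) {
        return true;
    }
    if (holder) {
        *holder = expected;
    }
    return false;
}

/* Release the guard when the coroutine yields or terminates. */
static void entry_guard_leave(EntryGuard *g)
{
    atomic_store(&g->owner, 0);
}
```

For example, tagging the main thread as 1 and the iothread as 2 would
let a failed entry_guard_try_enter() report which of the two already
holds the coroutine, instead of a pointer that may have changed by the
time it is printed.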

> 
> I've also noticed that co->entry sometimes (maybe always?) points to
> mirror_run. Though, given that co->caller changes unexpectedly I don't know
> if we can trust co->entry.
> 
> I do not see the bug when I perform the same migration without migrating the
> disk.
> I also do not see the bug when I remove the iothread from the guest.
> 
> I tested this scenario as far back as tag v2.4.0 and hit the bug every time.
> I was unable to test v2.3.0 due to unresolved guest hangs. I did, however,
> manage to get as far as this commit:
> 
> commit ca96ac44dcd290566090b2435bc828fded356ad9
> Author: Stefan Hajnoczi <stefanha@redhat.com>
> Date:   Tue Jul 28 18:34:09 2015 +0200
> AioContext: force event loop iteration using BH
> 
> This commit fixes a hang that my test scenario experiences. I was able to
> test even further back by cherry-picking ca96ac44 on top of the earlier
> commits but at this point I cannot be sure if the bug was introduced by
> ca96ac44 so I stopped.
> 
> I am willing to run tests or collect any info needed. I'll keep
> investigating but I won't turn down any help :).
> 
> Qemu command line as taken from Libvirt log:
> qemu-system-s390x
>     -name kvm1 -S -machine s390-ccw-virtio-2.6,accel=kvm,usb=off
>     -m 6144 -realtime mlock=off
>     -smp 1,sockets=1,cores=1,threads=1
>     -object iothread,id=iothread1
>     -uuid 3796d9f0-8555-4a1e-9d5c-fac56b8cbf56
>     -nographic -no-user-config -nodefaults
>     -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-kvm1/monitor.sock,server,nowait
>     -mon chardev=charmonitor,id=monitor,mode=control
>     -rtc base=utc -no-shutdown
>     -boot strict=on -kernel /data/vms/kvm1/kvm1-image
>     -initrd /data/vms/kvm1/kvm1-initrd -append 'hvc_iucv=8 TERM=dumb'
>     -drive file=/dev/disk/by-path/ccw-0.0.c22b,format=raw,if=none,id=drive-virtio-disk0,cache=none
>     -device virtio-blk-ccw,scsi=off,devno=fe.0.0000,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>     -drive file=/data/vms/kvm1/kvm1.qcow,format=qcow2,if=none,id=drive-virtio-disk1,cache=none
>     -device virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0008,drive=drive-virtio-disk1,id=virtio-disk1
>     -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=27
>     -device virtio-net-ccw,netdev=hostnet0,id=net0,mac=52:54:00:c9:86:2b,devno=fe.0.0001
>     -chardev pty,id=charconsole0
>     -device sclpconsole,chardev=charconsole0,id=console0
>     -device virtio-balloon-ccw,id=balloon0,devno=fe.0.0002 -msg timestamp=on
> 
> Libvirt migration command:
> virsh migrate --live --persistent --copy-storage-all --migrate-disks vdb
> kvm1 qemu+ssh://dev1/system
> 
> -- 
> -- Jason J. Herne (jjherne@linux.vnet.ibm.com)
> 

