From: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>
Date: Tue, 24 May 2016 11:05:12 -0400
Subject: Re: [Qemu-devel] coroutines: block: Co-routine re-entered recursively when migrating disk with iothreads
To: Fam Zheng
Cc: jcody@redhat.com, stefanha@redhat.com, qemu-devel@nongnu.org, qemu-block@nongnu.org, quintela@redhat.com

On 05/23/2016 10:12 PM, Fam Zheng wrote:
> On Mon, 05/23 14:54, Jason J. Herne wrote:
>> Using libvirt to migrate a guest and one guest disk that is using
>> iothreads causes QEMU to crash with the message:
>>
>>     Co-routine re-entered recursively
>>
>> I've looked into this one a bit, but I have not seen anything that
>> immediately stands out. Here is what I have found.
>>
>> In qemu_coroutine_enter:
>>
>>     if (co->caller) {
>>         fprintf(stderr, "Co-routine re-entered recursively\n");
>>         abort();
>>     }
>>
>> The value of co->caller is actually changing between the time "if
>> (co->caller)" is evaluated and the time I print some debug statements
>> directly under the existing fprintf. I confirmed this by saving the
>> value in a local variable and printing both the new local variable and
>> co->caller immediately after the existing fprintf (sketched below).
>> This certainly indicates some kind of concurrency issue. However, it
>> does not necessarily point to the reason we ended up inside this if
>> statement, because co->caller was not NULL before it was trashed.
>> Perhaps it was trashed more than once, then? I figured the problem
>> might be with the coroutine pool, so I disabled it
>> (--disable-coroutine-pool) and still hit the bug.
>
> Which coroutine backend are you using?

ucontext, normally. I've also reproduced the problem with sigaltstack.
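A rough sketch of that instrumentation, for reference (saved_caller is a
hypothetical debug-only local; the surrounding code is the 2.6-era
qemu_coroutine_enter() from util/qemu-coroutine.c):

    /* qemu_coroutine_enter() in util/qemu-coroutine.c, instrumented.
     * saved_caller exists only for this experiment. */
    Coroutine *saved_caller = co->caller;   /* snapshot at check time */
    if (co->caller) {
        fprintf(stderr, "Co-routine re-entered recursively\n");
        /* The two values below can disagree: co->caller keeps changing
         * underneath us even inside this error path. */
        fprintf(stderr, "saved: %p  live: %p\n",
                (void *)saved_caller, (void *)co->caller);
        abort();
    }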
>> The backtrace is not always identical. Here is one instance:
>>
>> (gdb) bt
>> #0  0x000003ffa78be2c0 in raise () from /lib64/libc.so.6
>> #1  0x000003ffa78bfc26 in abort () from /lib64/libc.so.6
>> #2  0x0000000080427d80 in qemu_coroutine_enter (co=0xa2cf2b40, opaque=0x0)
>>     at /root/kvmdev/qemu/util/qemu-coroutine.c:112
>> #3  0x000000008032246e in nbd_restart_write (opaque=0xa2d0cd40)
>>     at /root/kvmdev/qemu/block/nbd-client.c:114
>> #4  0x00000000802b3a1c in aio_dispatch (ctx=0xa2c907a0)
>>     at /root/kvmdev/qemu/aio-posix.c:341
>> #5  0x00000000802b4332 in aio_poll (ctx=0xa2c907a0, blocking=true)
>>     at /root/kvmdev/qemu/aio-posix.c:479
>> #6  0x0000000080155aba in iothread_run (opaque=0xa2c90260)
>>     at /root/kvmdev/qemu/iothread.c:46
>> #7  0x000003ffa7a87c2c in start_thread () from /lib64/libpthread.so.0
>> #8  0x000003ffa798ec9a in thread_start () from /lib64/libc.so.6
>
> It may be worth looking at the backtraces of all threads, especially the
> monitor thread (main thread).

Here is a complete backtrace. Thread 1 is the main thread; Thread 13 is the
crashing thread.

(gdb) thread apply all bt

Thread 50 (Thread 0x3fdeb1f1910 (LWP 29570)):
#0  0x000003ff99c901fc in do_futex_wait () from /lib64/libpthread.so.0
#1  0x000003ff99c90302 in __new_sem_wait_slow () from /lib64/libpthread.so.0
#2  0x00000000804097a4 in qemu_sem_timedwait (sem=0x3ff8c000ce8, ms=10000)
    at /root/kvmdev/qemu/util/qemu-thread-posix.c:245
#3  0x00000000802a215e in worker_thread (opaque=0x3ff8c000c80)
    at /root/kvmdev/qemu/thread-pool.c:92
#4  0x000003ff99c87c2c in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff99b8ec9a in thread_start () from /lib64/libc.so.6

Thread 49 (Thread 0x3fdb47ff910 (LWP 29569)):
#0  0x000003ff99c901fc in do_futex_wait () from /lib64/libpthread.so.0
#1  0x000003ff99c90302 in __new_sem_wait_slow () from /lib64/libpthread.so.0
#2  0x00000000804097a4 in qemu_sem_timedwait (sem=0x9c54c8d8, ms=10000)
    at /root/kvmdev/qemu/util/qemu-thread-posix.c:245
#3  0x00000000802a215e in worker_thread (opaque=0x9c54c870)
    at /root/kvmdev/qemu/thread-pool.c:92
#4  0x000003ff99c87c2c in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff99b8ec9a in thread_start () from /lib64/libc.so.6

Thread 15 (Thread 0x3ff999ff910 (LWP 29449)):
#0  0x000003ff99b8841e in syscall () from /lib64/libc.so.6
#1  0x00000000804099c6 in futex_wait (ev=0x80ac597c, val=4294967295)
    at /root/kvmdev/qemu/util/qemu-thread-posix.c:292
#2  0x0000000080409c56 in qemu_event_wait (ev=0x80ac597c)
    at /root/kvmdev/qemu/util/qemu-thread-posix.c:399
#3  0x00000000804271ec in call_rcu_thread (opaque=0x0)
    at /root/kvmdev/qemu/util/rcu.c:250
#4  0x000003ff99c87c2c in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff99b8ec9a in thread_start () from /lib64/libc.so.6

Thread 14 (Thread 0x3ff991ff910 (LWP 29451)):
#0  0x000003ff99b8841e in syscall () from /lib64/libc.so.6
#1  0x000003ff9a19e330 in g_cond_wait () from /lib64/libglib-2.0.so.0
#2  0x000000008039d936 in wait_for_trace_records_available ()
    at /root/kvmdev/qemu/trace/simple.c:147
#3  0x000000008039d9c6 in writeout_thread (opaque=0x0)
    at /root/kvmdev/qemu/trace/simple.c:165
#4  0x000003ff9a17c4cc in g_thread_proxy () from /lib64/libglib-2.0.so.0
#5  0x000003ff99c87c2c in start_thread () from /lib64/libpthread.so.0
#6  0x000003ff99b8ec9a in thread_start () from /lib64/libc.so.6

Thread 13 (Thread 0x3ff989ff910 (LWP 29452)):
#0  0x000003ff99abe2c0 in raise () from /lib64/libc.so.6
#1  0x000003ff99abfc26 in abort () from /lib64/libc.so.6
#2  0x0000000080427d80 in qemu_coroutine_enter (co=0x9c5a4120, opaque=0x0)
    at /root/kvmdev/qemu/util/qemu-coroutine.c:112
#3  0x000000008032246e in nbd_restart_write (opaque=0x9c5897b0)
    at /root/kvmdev/qemu/block/nbd-client.c:114
#4  0x00000000802b3a1c in aio_dispatch (ctx=0x9c530770)
    at /root/kvmdev/qemu/aio-posix.c:341
#5  0x00000000802b4332 in aio_poll (ctx=0x9c530770, blocking=true)
    at /root/kvmdev/qemu/aio-posix.c:479
#6  0x0000000080155aba in iothread_run (opaque=0x9c530200)
    at /root/kvmdev/qemu/iothread.c:46
#7  0x000003ff99c87c2c in start_thread () from /lib64/libpthread.so.0
#8  0x000003ff99b8ec9a in thread_start () from /lib64/libc.so.6

Thread 12 (Thread 0x3ff91b5c910 (LWP 29456)):
#0  0x000003ff99c8d68a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000008040932e in qemu_cond_wait (cond=0x9c904690, mutex=0x8065aab0)
    at /root/kvmdev/qemu/util/qemu-thread-posix.c:123
#2  0x000000008005c1d6 in qemu_kvm_wait_io_event (cpu=0x9c8c8e80)
    at /root/kvmdev/qemu/cpus.c:1030
#3  0x000000008005c37a in qemu_kvm_cpu_thread_fn (arg=0x9c8c8e80)
    at /root/kvmdev/qemu/cpus.c:1069
#4  0x000003ff99c87c2c in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff99b8ec9a in thread_start () from /lib64/libc.so.6

Thread 1 (Thread 0x3ff9a6f2a90 (LWP 29433)):
#0  0x000003ff99c8d68a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000000008040932e in qemu_cond_wait (cond=0x9c530800, mutex=0x9c5307d0)
    at /root/kvmdev/qemu/util/qemu-thread-posix.c:123
#2  0x0000000080426a38 in rfifolock_lock (r=0x9c5307d0)
    at /root/kvmdev/qemu/util/rfifolock.c:59
#3  0x00000000802a1f72 in aio_context_acquire (ctx=0x9c530770)
    at /root/kvmdev/qemu/async.c:373
#4  0x00000000802b3f54 in aio_poll (ctx=0x9c530770, blocking=true)
    at /root/kvmdev/qemu/aio-posix.c:415
#5  0x000000008031e7ac in bdrv_flush (bs=0x9c59b5c0)
    at /root/kvmdev/qemu/block/io.c:2470
#6  0x00000000802a8e6e in bdrv_close (bs=0x9c59b5c0)
    at /root/kvmdev/qemu/block.c:2134
#7  0x00000000802a9966 in bdrv_delete (bs=0x9c59b5c0)
    at /root/kvmdev/qemu/block.c:2341
#8  0x00000000802ac7c6 in bdrv_unref (bs=0x9c59b5c0)
    at /root/kvmdev/qemu/block.c:3376
#9  0x0000000080315340 in mirror_exit (job=0x9c956ed0, opaque=0x9c9570d0)
    at /root/kvmdev/qemu/block/mirror.c:494
#10 0x00000000802afb52 in block_job_defer_to_main_loop_bh (opaque=0x9c90dc10)
    at /root/kvmdev/qemu/blockjob.c:476
#11 0x00000000802a10a8 in aio_bh_call (bh=0x9c9090a0)
    at /root/kvmdev/qemu/async.c:66
#12 0x00000000802a1206 in aio_bh_poll (ctx=0x9c51e6c0)
    at /root/kvmdev/qemu/async.c:94
#13 0x00000000802b389e in aio_dispatch (ctx=0x9c51e6c0)
    at /root/kvmdev/qemu/aio-posix.c:308
#14 0x00000000802a1854 in aio_ctx_dispatch (source=0x9c51e6c0, callback=0x0, user_data=0x0)
    at /root/kvmdev/qemu/async.c:233
#15 0x000003ff9a151c0a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#16 0x00000000802b05ce in glib_pollfds_poll () at /root/kvmdev/qemu/main-loop.c:213
#17 0x00000000802b070a in os_host_main_loop_wait (timeout=0)
    at /root/kvmdev/qemu/main-loop.c:258
#18 0x00000000802b0816 in main_loop_wait (nonblocking=0)
    at /root/kvmdev/qemu/main-loop.c:506
#19 0x000000008016d434 in main_loop () at /root/kvmdev/qemu/vl.c:1934
#20 0x00000000801756b8 in main (argc=54, argv=0x3ffdcd7ee58, envp=0x3ffdcd7f010)
    at /root/kvmdev/qemu/vl.c:4656
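One more detail worth noting from this trace: Thread 1 is blocked in
aio_context_acquire() on the same AioContext (ctx=0x9c530770) that Thread 13
is dispatching when it hits the abort. If two threads can end up in
qemu_coroutine_enter() for the same coroutine at the same time, the
check-then-set on co->caller is unsynchronized, which would fit the field
changing underneath my debug prints. The toy program below is my own
standalone illustration of that check-then-set pattern (not QEMU code; all
names are made up):

    /* toctou-demo.c: unsynchronized check-then-set on a "caller" field.
     * Two threads may both pass the NULL check, and a reader can see the
     * field change between the check and a later print.
     * Build: gcc -pthread -o toctou-demo toctou-demo.c
     */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct fake_co {
        struct fake_co *caller;     /* stands in for Coroutine->caller */
    };

    static struct fake_co target;   /* the coroutine being entered */
    static struct fake_co self[2];  /* one fake "current coroutine" per thread */

    static void *enter(void *arg)
    {
        struct fake_co *me = arg;

        /* Mirrors the shape of qemu_coroutine_enter(): check, then set,
         * with nothing stopping another thread from doing the same. */
        if (target.caller) {
            struct fake_co *snap = target.caller;   /* snapshot first */
            fprintf(stderr, "re-entered: snapshot=%p live=%p\n",
                    (void *)snap, (void *)target.caller);
            abort();
        }
        target.caller = me;         /* the racy write */
        return NULL;
    }

    int main(void)
    {
        pthread_t t[2];

        for (int i = 0; i < 2; i++) {
            pthread_create(&t[i], NULL, enter, &self[i]);
        }
        for (int i = 0; i < 2; i++) {
            pthread_join(t[i], NULL);
        }

        /* With unlucky timing both threads pass the check and the last
         * writer wins; run it in a loop to observe either outcome. */
        printf("final caller = %p\n", (void *)target.caller);
        return 0;
    }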
-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)