From: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>
To: Fam Zheng <famz@redhat.com>
Cc: jcody@redhat.com, stefanha@redhat.com, qemu-devel@nongnu.org,
qemu-block@nongnu.org, quintela@redhat.com
Subject: Re: [Qemu-devel] coroutines: block: Co-routine re-entered recursively when migrating disk with iothreads
Date: Tue, 24 May 2016 11:05:12 -0400
Message-ID: <57446DA8.8030709@linux.vnet.ibm.com>
In-Reply-To: <20160524021244.GD14601@ad.usersys.redhat.com>
On 05/23/2016 10:12 PM, Fam Zheng wrote:
> On Mon, 05/23 14:54, Jason J. Herne wrote:
>> Using libvirt to migrate a guest, along with one guest disk that uses
>> iothreads, causes QEMU to crash with the message:
>> Co-routine re-entered recursively
>>
>> I've looked into this one a bit but I have not seen anything that
>> immediately stands out.
>> Here is what I have found:
>>
>> In qemu_coroutine_enter:
>>     if (co->caller) {
>>         fprintf(stderr, "Co-routine re-entered recursively\n");
>>         abort();
>>     }
>>
>> The value of co->caller is actually changing between the time "if
>> (co->caller)" is evaluated and the time I print some debug statements
>> directly under the existing fprintf. I confirmed this by saving the value
>> in a local variable and printing both the local variable and co->caller
>> immediately after the existing fprintf. This certainly indicates some kind
>> of concurrency issue. However, it does not by itself explain how we ended
>> up inside this if statement, because co->caller was already non-NULL
>> before it was trashed. Perhaps it was trashed more than once? I suspected
>> the coroutine pool, so I disabled it (--disable-coroutine-pool) and still
>> hit the bug.
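
For reference, the instrumentation described above amounts to roughly the
following (a minimal sketch around the existing check in
util/qemu-coroutine.c; the local variable and the second fprintf are
debug-only additions for this experiment, not upstream code):

    Coroutine *seen_caller = co->caller;   /* value observed at check time */
    if (co->caller) {
        fprintf(stderr, "Co-routine re-entered recursively\n");
        /* Debug output: the saved value and the current value disagree,
         * so another thread is overwriting co->caller between the check
         * and this print. */
        fprintf(stderr, "caller at check: %p, caller now: %p\n",
                (void *)seen_caller, (void *)co->caller);
        abort();
    }
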
>
> Which coroutine backend are you using?
>
ucontext normally. I've also reproduced the problem with sigaltstack.
>>
>> The backtrace is not always identical. Here is one instance:
>> (gdb) bt
>> #0 0x000003ffa78be2c0 in raise () from /lib64/libc.so.6
>> #1 0x000003ffa78bfc26 in abort () from /lib64/libc.so.6
>> #2 0x0000000080427d80 in qemu_coroutine_enter (co=0xa2cf2b40, opaque=0x0) at /root/kvmdev/qemu/util/qemu-coroutine.c:112
>> #3 0x000000008032246e in nbd_restart_write (opaque=0xa2d0cd40) at /root/kvmdev/qemu/block/nbd-client.c:114
>> #4 0x00000000802b3a1c in aio_dispatch (ctx=0xa2c907a0) at /root/kvmdev/qemu/aio-posix.c:341
>> #5 0x00000000802b4332 in aio_poll (ctx=0xa2c907a0, blocking=true) at /root/kvmdev/qemu/aio-posix.c:479
>> #6 0x0000000080155aba in iothread_run (opaque=0xa2c90260) at /root/kvmdev/qemu/iothread.c:46
>> #7 0x000003ffa7a87c2c in start_thread () from /lib64/libpthread.so.0
>> #8 0x000003ffa798ec9a in thread_start () from /lib64/libc.so.6
>
> It may be worth looking at backtrace of all threads especially the monitor
> thread (main thread).
>
Here is a complete backtrace, from (gdb) thread apply all bt.
Thread 1 is the main thread; thread 13 is the crashing thread.
Thread 50 (Thread 0x3fdeb1f1910 (LWP 29570)):
#0 0x000003ff99c901fc in do_futex_wait () from /lib64/libpthread.so.0
#1 0x000003ff99c90302 in __new_sem_wait_slow () from /lib64/libpthread.so.0
#2 0x00000000804097a4 in qemu_sem_timedwait (sem=0x3ff8c000ce8, ms=10000) at /root/kvmdev/qemu/util/qemu-thread-posix.c:245
#3 0x00000000802a215e in worker_thread (opaque=0x3ff8c000c80) at /root/kvmdev/qemu/thread-pool.c:92
#4 0x000003ff99c87c2c in start_thread () from /lib64/libpthread.so.0
#5 0x000003ff99b8ec9a in thread_start () from /lib64/libc.so.6
Thread 49 (Thread 0x3fdb47ff910 (LWP 29569)):
#0 0x000003ff99c901fc in do_futex_wait () from /lib64/libpthread.so.0
#1 0x000003ff99c90302 in __new_sem_wait_slow () from /lib64/libpthread.so.0
#2 0x00000000804097a4 in qemu_sem_timedwait (sem=0x9c54c8d8, ms=10000) at /root/kvmdev/qemu/util/qemu-thread-posix.c:245
#3 0x00000000802a215e in worker_thread (opaque=0x9c54c870) at /root/kvmdev/qemu/thread-pool.c:92
#4 0x000003ff99c87c2c in start_thread () from /lib64/libpthread.so.0
#5 0x000003ff99b8ec9a in thread_start () from /lib64/libc.so.6
Thread 15 (Thread 0x3ff999ff910 (LWP 29449)):
#0 0x000003ff99b8841e in syscall () from /lib64/libc.so.6
#1 0x00000000804099c6 in futex_wait (ev=0x80ac597c <rcu_call_ready_event>, val=4294967295) at /root/kvmdev/qemu/util/qemu-thread-posix.c:292
#2 0x0000000080409c56 in qemu_event_wait (ev=0x80ac597c <rcu_call_ready_event>) at /root/kvmdev/qemu/util/qemu-thread-posix.c:399
#3 0x00000000804271ec in call_rcu_thread (opaque=0x0) at /root/kvmdev/qemu/util/rcu.c:250
#4 0x000003ff99c87c2c in start_thread () from /lib64/libpthread.so.0
#5 0x000003ff99b8ec9a in thread_start () from /lib64/libc.so.6
Thread 14 (Thread 0x3ff991ff910 (LWP 29451)):
#0 0x000003ff99b8841e in syscall () from /lib64/libc.so.6
#1 0x000003ff9a19e330 in g_cond_wait () from /lib64/libglib-2.0.so.0
#2 0x000000008039d936 in wait_for_trace_records_available () at /root/kvmdev/qemu/trace/simple.c:147
#3 0x000000008039d9c6 in writeout_thread (opaque=0x0) at /root/kvmdev/qemu/trace/simple.c:165
#4 0x000003ff9a17c4cc in g_thread_proxy () from /lib64/libglib-2.0.so.0
#5 0x000003ff99c87c2c in start_thread () from /lib64/libpthread.so.0
#6 0x000003ff99b8ec9a in thread_start () from /lib64/libc.so.6
Thread 13 (Thread 0x3ff989ff910 (LWP 29452)):
#0 0x000003ff99abe2c0 in raise () from /lib64/libc.so.6
#1 0x000003ff99abfc26 in abort () from /lib64/libc.so.6
#2 0x0000000080427d80 in qemu_coroutine_enter (co=0x9c5a4120, opaque=0x0) at /root/kvmdev/qemu/util/qemu-coroutine.c:112
#3 0x000000008032246e in nbd_restart_write (opaque=0x9c5897b0) at /root/kvmdev/qemu/block/nbd-client.c:114
#4 0x00000000802b3a1c in aio_dispatch (ctx=0x9c530770) at /root/kvmdev/qemu/aio-posix.c:341
#5 0x00000000802b4332 in aio_poll (ctx=0x9c530770, blocking=true) at /root/kvmdev/qemu/aio-posix.c:479
#6 0x0000000080155aba in iothread_run (opaque=0x9c530200) at /root/kvmdev/qemu/iothread.c:46
#7 0x000003ff99c87c2c in start_thread () from /lib64/libpthread.so.0
#8 0x000003ff99b8ec9a in thread_start () from /lib64/libc.so.6
Thread 12 (Thread 0x3ff91b5c910 (LWP 29456)):
#0 0x000003ff99c8d68a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x000000008040932e in qemu_cond_wait (cond=0x9c904690, mutex=0x8065aab0 <qemu_global_mutex>) at /root/kvmdev/qemu/util/qemu-thread-posix.c:123
#2 0x000000008005c1d6 in qemu_kvm_wait_io_event (cpu=0x9c8c8e80) at /root/kvmdev/qemu/cpus.c:1030
#3 0x000000008005c37a in qemu_kvm_cpu_thread_fn (arg=0x9c8c8e80) at /root/kvmdev/qemu/cpus.c:1069
#4 0x000003ff99c87c2c in start_thread () from /lib64/libpthread.so.0
#5 0x000003ff99b8ec9a in thread_start () from /lib64/libc.so.6
Thread 1 (Thread 0x3ff9a6f2a90 (LWP 29433)):
#0 0x000003ff99c8d68a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x000000008040932e in qemu_cond_wait (cond=0x9c530800, mutex=0x9c5307d0) at /root/kvmdev/qemu/util/qemu-thread-posix.c:123
#2 0x0000000080426a38 in rfifolock_lock (r=0x9c5307d0) at /root/kvmdev/qemu/util/rfifolock.c:59
#3 0x00000000802a1f72 in aio_context_acquire (ctx=0x9c530770) at /root/kvmdev/qemu/async.c:373
#4 0x00000000802b3f54 in aio_poll (ctx=0x9c530770, blocking=true) at /root/kvmdev/qemu/aio-posix.c:415
#5 0x000000008031e7ac in bdrv_flush (bs=0x9c59b5c0) at /root/kvmdev/qemu/block/io.c:2470
#6 0x00000000802a8e6e in bdrv_close (bs=0x9c59b5c0) at /root/kvmdev/qemu/block.c:2134
#7 0x00000000802a9966 in bdrv_delete (bs=0x9c59b5c0) at /root/kvmdev/qemu/block.c:2341
#8 0x00000000802ac7c6 in bdrv_unref (bs=0x9c59b5c0) at /root/kvmdev/qemu/block.c:3376
#9 0x0000000080315340 in mirror_exit (job=0x9c956ed0, opaque=0x9c9570d0) at /root/kvmdev/qemu/block/mirror.c:494
#10 0x00000000802afb52 in block_job_defer_to_main_loop_bh (opaque=0x9c90dc10) at /root/kvmdev/qemu/blockjob.c:476
#11 0x00000000802a10a8 in aio_bh_call (bh=0x9c9090a0) at /root/kvmdev/qemu/async.c:66
#12 0x00000000802a1206 in aio_bh_poll (ctx=0x9c51e6c0) at /root/kvmdev/qemu/async.c:94
#13 0x00000000802b389e in aio_dispatch (ctx=0x9c51e6c0) at /root/kvmdev/qemu/aio-posix.c:308
#14 0x00000000802a1854 in aio_ctx_dispatch (source=0x9c51e6c0, callback=0x0, user_data=0x0) at /root/kvmdev/qemu/async.c:233
#15 0x000003ff9a151c0a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
#16 0x00000000802b05ce in glib_pollfds_poll () at /root/kvmdev/qemu/main-loop.c:213
#17 0x00000000802b070a in os_host_main_loop_wait (timeout=0) at /root/kvmdev/qemu/main-loop.c:258
#18 0x00000000802b0816 in main_loop_wait (nonblocking=0) at /root/kvmdev/qemu/main-loop.c:506
#19 0x000000008016d434 in main_loop () at /root/kvmdev/qemu/vl.c:1934
#20 0x00000000801756b8 in main (argc=54, argv=0x3ffdcd7ee58, envp=0x3ffdcd7f010) at /root/kvmdev/qemu/vl.c:4656
--
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)