From: "Jason J. Herne" <jjherne@linux.vnet.ibm.com>
To: Fam Zheng <famz@redhat.com>
Cc: jcody@redhat.com, stefanha@redhat.com, qemu-devel@nongnu.org,
	qemu-block@nongnu.org, quintela@redhat.com
Subject: Re: [Qemu-devel] coroutines: block: Co-routine re-entered recursively when migrating disk with iothreads
Date: Tue, 24 May 2016 11:05:12 -0400
Message-ID: <57446DA8.8030709@linux.vnet.ibm.com>
In-Reply-To: <20160524021244.GD14601@ad.usersys.redhat.com>

On 05/23/2016 10:12 PM, Fam Zheng wrote:
> On Mon, 05/23 14:54, Jason J. Herne wrote:
>> Using libvirt to migrate a guest and one guest disk that is using iothreads
>> causes Qemu to crash with the message:
>> Co-routine re-entered recursively
>>
>> I've looked into this one a bit but I have not seen anything that
>> immediately stands out.
>> Here is what I have found:
>>
>> In qemu_coroutine_enter:
>>      if (co->caller) {
>>          fprintf(stderr, "Co-routine re-entered recursively\n");
>>          abort();
>>      }
>>
>> The value of co->caller is actually changing between the time "if
>> (co->caller)" is evaluated and the time I print some debug statements
>> directly under the existing fprintf. I confirmed this by saving the value in
>> a local variable and printing both the new local variable and co->caller
>> immediately after the existing fprintf. This would certainly indicate some
>> kind of concurrency issue. However, it does not necessarily point to the
>> reason we ended up inside this if statement because co->caller was not NULL
>> before it was trashed. Perhaps it was trashed more than once then? I figured
>> maybe the problem was with coroutine pools so I disabled them
>> (--disable-coroutine-pool) and still hit the bug.
>
> Which coroutine backend are you using?
>

ucontext normally. I've also reproduced the problem with sigaltstack.
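
(For reference, the instrumentation I described above was roughly the
following; the local variable name and the extra message here are
illustrative, not the exact debug patch:)

     Coroutine *caller_at_check = co->caller;   /* added */

     if (co->caller) {
         fprintf(stderr, "Co-routine re-entered recursively\n");
         /* added: co->caller has often changed by the time we print it here */
         fprintf(stderr, "caller at check %p, caller now %p\n",
                 (void *)caller_at_check, (void *)co->caller);
         abort();
     }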

>>
>> The backtrace is not always identical. Here is one instance:
>> (gdb) bt
>> #0  0x000003ffa78be2c0 in raise () from /lib64/libc.so.6
>> #1  0x000003ffa78bfc26 in abort () from /lib64/libc.so.6
>> #2  0x0000000080427d80 in qemu_coroutine_enter (co=0xa2cf2b40, opaque=0x0)
>> at /root/kvmdev/qemu/util/qemu-coroutine.c:112
>> #3  0x000000008032246e in nbd_restart_write	 (opaque=0xa2d0cd40) at
>> /root/kvmdev/qemu/block/nbd-client.c:114
>> #4  0x00000000802b3a1c in aio_dispatch (ctx=0xa2c907a0) at
>> /root/kvmdev/qemu/aio-posix.c:341
>> #5  0x00000000802b4332 in aio_poll (ctx=0xa2c907a0, blocking=true) at
>> /root/kvmdev/qemu/aio-posix.c:479
>> #6  0x0000000080155aba in iothread_run (opaque=0xa2c90260) at
>> /root/kvmdev/qemu/iothread.c:46
>> #7  0x000003ffa7a87c2c in start_thread () from /lib64/libpthread.so.0
>> #8  0x000003ffa798ec9a in thread_start () from /lib64/libc.so.6
>
> It may be worth looking at backtrace of all threads especially the monitor
> thread (main thread).
>

Here is a complete backtrace, obtained with "(gdb) thread apply all bt".
Thread 1 is the main thread.
Thread 13 is the crashing thread.

Thread 50 (Thread 0x3fdeb1f1910 (LWP 29570)):
#0  0x000003ff99c901fc in do_futex_wait () from /lib64/libpthread.so.0
#1  0x000003ff99c90302 in __new_sem_wait_slow () from /lib64/libpthread.so.0
#2  0x00000000804097a4 in qemu_sem_timedwait (sem=0x3ff8c000ce8, 
ms=10000) at /root/kvmdev/qemu/util/qemu-thread-posix.c:245
#3  0x00000000802a215e in worker_thread (opaque=0x3ff8c000c80) at 
/root/kvmdev/qemu/thread-pool.c:92
#4  0x000003ff99c87c2c in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff99b8ec9a in thread_start () from /lib64/libc.so.6

Thread 49 (Thread 0x3fdb47ff910 (LWP 29569)):
#0  0x000003ff99c901fc in do_futex_wait () from /lib64/libpthread.so.0
#1  0x000003ff99c90302 in __new_sem_wait_slow () from /lib64/libpthread.so.0
#2  0x00000000804097a4 in qemu_sem_timedwait (sem=0x9c54c8d8, ms=10000) 
at /root/kvmdev/qemu/util/qemu-thread-posix.c:245
#3  0x00000000802a215e in worker_thread (opaque=0x9c54c870) at 
/root/kvmdev/qemu/thread-pool.c:92
#4  0x000003ff99c87c2c in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff99b8ec9a in thread_start () from /lib64/libc.so.6

Thread 15 (Thread 0x3ff999ff910 (LWP 29449)):
#0  0x000003ff99b8841e in syscall () from /lib64/libc.so.6
#1  0x00000000804099c6 in futex_wait (ev=0x80ac597c 
<rcu_call_ready_event>, val=4294967295) at 
/root/kvmdev/qemu/util/qemu-thread-posix.c:292
#2  0x0000000080409c56 in qemu_event_wait (ev=0x80ac597c 
<rcu_call_ready_event>) at /root/kvmdev/qemu/util/qemu-thread-posix.c:399
#3  0x00000000804271ec in call_rcu_thread (opaque=0x0) at 
/root/kvmdev/qemu/util/rcu.c:250
#4  0x000003ff99c87c2c in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff99b8ec9a in thread_start () from /lib64/libc.so.6

Thread 14 (Thread 0x3ff991ff910 (LWP 29451)):
#0  0x000003ff99b8841e in syscall () from /lib64/libc.so.6
#1  0x000003ff9a19e330 in g_cond_wait () from /lib64/libglib-2.0.so.0
#2  0x000000008039d936 in wait_for_trace_records_available () at 
/root/kvmdev/qemu/trace/simple.c:147
#3  0x000000008039d9c6 in writeout_thread (opaque=0x0) at 
/root/kvmdev/qemu/trace/simple.c:165
#4  0x000003ff9a17c4cc in g_thread_proxy () from /lib64/libglib-2.0.so.0
#5  0x000003ff99c87c2c in start_thread () from /lib64/libpthread.so.0
#6  0x000003ff99b8ec9a in thread_start () from /lib64/libc.so.6

Thread 13 (Thread 0x3ff989ff910 (LWP 29452)):
#0  0x000003ff99abe2c0 in raise () from /lib64/libc.so.6
#1  0x000003ff99abfc26 in abort () from /lib64/libc.so.6
#2  0x0000000080427d80 in qemu_coroutine_enter (co=0x9c5a4120, 
opaque=0x0) at /root/kvmdev/qemu/util/qemu-coroutine.c:112
#3  0x000000008032246e in nbd_restart_write (opaque=0x9c5897b0) at 
/root/kvmdev/qemu/block/nbd-client.c:114
#4  0x00000000802b3a1c in aio_dispatch (ctx=0x9c530770) at 
/root/kvmdev/qemu/aio-posix.c:341
#5  0x00000000802b4332 in aio_poll (ctx=0x9c530770, blocking=true) at 
/root/kvmdev/qemu/aio-posix.c:479
#6  0x0000000080155aba in iothread_run (opaque=0x9c530200) at 
/root/kvmdev/qemu/iothread.c:46
#7  0x000003ff99c87c2c in start_thread () from /lib64/libpthread.so.0
#8  0x000003ff99b8ec9a in thread_start () from /lib64/libc.so.6

Thread 12 (Thread 0x3ff91b5c910 (LWP 29456)):
#0  0x000003ff99c8d68a in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x000000008040932e in qemu_cond_wait (cond=0x9c904690, 
mutex=0x8065aab0 <qemu_global_mutex>) at 
/root/kvmdev/qemu/util/qemu-thread-posix.c:123
#2  0x000000008005c1d6 in qemu_kvm_wait_io_event (cpu=0x9c8c8e80) at 
/root/kvmdev/qemu/cpus.c:1030
#3  0x000000008005c37a in qemu_kvm_cpu_thread_fn (arg=0x9c8c8e80) at 
/root/kvmdev/qemu/cpus.c:1069
#4  0x000003ff99c87c2c in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff99b8ec9a in thread_start () from /lib64/libc.so.6

Thread 1 (Thread 0x3ff9a6f2a90 (LWP 29433)):
#0  0x000003ff99c8d68a in pthread_cond_wait@@GLIBC_2.3.2 () from 
/lib64/libpthread.so.0
#1  0x000000008040932e in qemu_cond_wait (cond=0x9c530800, 
mutex=0x9c5307d0) at /root/kvmdev/qemu/util/qemu-thread-posix.c:123
#2  0x0000000080426a38 in rfifolock_lock (r=0x9c5307d0) at 
/root/kvmdev/qemu/util/rfifolock.c:59
#3  0x00000000802a1f72 in aio_context_acquire (ctx=0x9c530770) at 
/root/kvmdev/qemu/async.c:373
#4  0x00000000802b3f54 in aio_poll (ctx=0x9c530770, blocking=true) at 
/root/kvmdev/qemu/aio-posix.c:415
#5  0x000000008031e7ac in bdrv_flush (bs=0x9c59b5c0) at 
/root/kvmdev/qemu/block/io.c:2470
#6  0x00000000802a8e6e in bdrv_close (bs=0x9c59b5c0) at 
/root/kvmdev/qemu/block.c:2134
#7  0x00000000802a9966 in bdrv_delete (bs=0x9c59b5c0) at 
/root/kvmdev/qemu/block.c:2341
#8  0x00000000802ac7c6 in bdrv_unref (bs=0x9c59b5c0) at 
/root/kvmdev/qemu/block.c:3376
#9  0x0000000080315340 in mirror_exit (job=0x9c956ed0, 
opaque=0x9c9570d0) at /root/kvmdev/qemu/block/mirror.c:494
#10 0x00000000802afb52 in block_job_defer_to_main_loop_bh 
(opaque=0x9c90dc10) at /root/kvmdev/qemu/blockjob.c:476
#11 0x00000000802a10a8 in aio_bh_call (bh=0x9c9090a0) at 
/root/kvmdev/qemu/async.c:66
#12 0x00000000802a1206 in aio_bh_poll (ctx=0x9c51e6c0) at 
/root/kvmdev/qemu/async.c:94
#13 0x00000000802b389e in aio_dispatch (ctx=0x9c51e6c0) at 
/root/kvmdev/qemu/aio-posix.c:308
#14 0x00000000802a1854 in aio_ctx_dispatch (source=0x9c51e6c0, 
callback=0x0, user_data=0x0) at /root/kvmdev/qemu/async.c:233
#15 0x000003ff9a151c0a in g_main_context_dispatch () from 
/lib64/libglib-2.0.so.0
#16 0x00000000802b05ce in glib_pollfds_poll () at 
/root/kvmdev/qemu/main-loop.c:213
#17 0x00000000802b070a in os_host_main_loop_wait (timeout=0) at 
/root/kvmdev/qemu/main-loop.c:258
#18 0x00000000802b0816 in main_loop_wait (nonblocking=0) at 
/root/kvmdev/qemu/main-loop.c:506
#19 0x000000008016d434 in main_loop () at /root/kvmdev/qemu/vl.c:1934
#20 0x00000000801756b8 in main (argc=54, argv=0x3ffdcd7ee58, 
envp=0x3ffdcd7f010) at /root/kvmdev/qemu/vl.c:4656
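
For context on the aborting thread: frame #3 of Thread 13 is the NBD
client's write-ready fd handler, which just re-enters the coroutine that
was blocked sending a request. Paraphrasing block/nbd-client.c from this
tree (exact helper and field names may differ slightly):

     static void nbd_restart_write(void *opaque)
     {
         BlockDriverState *bs = opaque;

         /* aborts in qemu_coroutine_enter() if co->caller is still set,
          * i.e. the coroutine already appears to be entered by someone */
         qemu_coroutine_enter(nbd_get_client_session(bs)->send_coroutine, NULL);
     }

Note also that Thread 1 is blocked in aio_context_acquire() on the same
AioContext (ctx=0x9c530770) that the iothread in Thread 13 is polling,
while it finishes the mirror job (mirror_exit -> bdrv_close -> bdrv_flush),
which looks consistent with the concurrency problem described above.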

-- 
-- Jason J. Herne (jjherne@linux.vnet.ibm.com)


Thread overview: 16+ messages
2016-05-23 18:54 [Qemu-devel] coroutines: block: Co-routine re-entered recursively when migrating disk with iothreads Jason J. Herne
2016-05-24  2:12 ` Fam Zheng
2016-05-24 15:05   ` Jason J. Herne [this message]
2016-05-25  8:36     ` Fam Zheng
2016-06-06 18:55       ` Jason J. Herne
2016-06-07  2:44         ` Fam Zheng
2016-06-07 12:42           ` Jason J. Herne
2016-06-08 15:30             ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2016-06-08 16:03               ` Paolo Bonzini
2016-06-09  7:35                 ` Stefan Hajnoczi
2016-06-09  8:25                   ` Paolo Bonzini
2016-06-09  8:47                     ` Stefan Hajnoczi
2016-06-09  8:48                       ` Stefan Hajnoczi
2016-06-09 10:02                         ` Paolo Bonzini
2016-06-09 16:31 ` [Qemu-devel] " Stefan Hajnoczi
2016-06-09 18:19   ` Jason J. Herne
