From: Jason Wang <jasowang@redhat.com>
To: Hailiang Zhang <zhang.zhanghailiang@huawei.com>,
zhangchen.fnst@cn.fujitsu.com
Cc: qemu-devel@nongnu.org, weidong.huang@huawei.com
Subject: Re: [Qemu-devel] [PATCH 1/3] colo-compare: serialize compare thread's initialization with main thread
Date: Fri, 5 May 2017 11:03:23 +0800 [thread overview]
Message-ID: <6af63243-5e20-8a43-9ba1-fce37323bf78@redhat.com> (raw)
In-Reply-To: <590A971B.1070900@huawei.com>
On 2017年05月04日 10:51, Hailiang Zhang wrote:
> Hi Jason,
>
> On 2017/4/25 19:33, Jason Wang wrote:
>>
>> On 2017年04月25日 17:59, Hailiang Zhang wrote:
>>> On 2017/4/25 16:41, Jason Wang wrote:
>>>> On 2017年04月24日 14:03, Hailiang Zhang wrote:
>>>>> On 2017/4/24 12:10, Jason Wang wrote:
>>>>>> On 2017年04月20日 15:46, zhanghailiang wrote:
>>>>>>> We call qemu_chr_fe_set_handlers() in colo-compare thread, it is
>>>>>>> used
>>>>>>> to detach watched fd from default main context, so it has chance to
>>>>>>> handle the same watched fd with main thread concurrently, which
>>>>>>> will
>>>>>>> trigger an error report:
>>>>>>> "qemu-char.c:918: io_watch_poll_finalize: Assertion `iwp->src ==
>>>>>>> ((void *)0)' failed."
>>>>>> Anyway to prevent fd from being handled by main thread before
>>>>>> creating
>>>>>> colo thread? Using semaphore seems not elegant.
>>>>> So how about calling qemu_mutex_lock_iothread() before
>>>>> qemu_chr_fe_set_handlers() ?
>>>> Looks better, but I needs more information e.g how main thread can
>>>> touch it?
>>> Hmm, this happened quite occasionally, and we didn't catch the first
>>> place (backtrace)
>>> of removing fd from been watched, but from the codes logic, we found
>>> there should
>>> be such possible cases:
>>> tcp_chr_write (Or tcp_chr_read/tcp_chr_sync_read/chr_disconnect)
>>> ->tcp_chr_disconnect (Or char_socket_finalize)
>>> ->tcp_chr_free_connection
>>> -> remove_fd_in_watch(chr);
>>>
>>> Anyway, it needs the protection from been freed twice.
>>>
>>> Thanks,
>>> Hailiang
>> Still a little bit confused. The question is how could main thread still
>> call tcp_chr_write or other in the above case?
>
> Finally, we reproduced this bug (We use qemu 2.6), and got the follow
> backtrace of this problem:
>
> (gdb) thread apply all bt
>
> Thread 7 (Thread 0x7f407a1ff700 (LWP 23144)):
> #0 0x00007f41037e0db5 in _int_malloc () from /usr/lib64/libc.so.6
> #1 0x00007f41037e3b96 in calloc () from /usr/lib64/libc.so.6
> #2 0x00007f41041ad4d7 in g_malloc0 () from /usr/lib64/libglib-2.0.so.0
> #3 0x00007f41041a5437 in g_source_new () from
> /usr/lib64/libglib-2.0.so.0
> #4 0x00007f410a2cec9c in qio_channel_create_fd_watch
> (ioc=ioc@entry=0x7f410d6238c0, fd=20, condition=condition@entry=
> (G_IO_IN | G_IO_ERR | G_IO_HUP | G_IO_NVAL)) at
> io/channel-watch.c:259
> #5 0x00007f410a2ced01 in qio_channel_create_socket_watch
> (ioc=ioc@entry=0x7f410d6238c0, socket=<optimized out>,
> condition=condition@entry=(G_IO_IN | G_IO_ERR | G_IO_HUP |
> G_IO_NVAL)) at io/channel-watch.c:311
> #6 0x00007f410a2cbea7 in qio_channel_socket_create_watch
> (ioc=0x7f410d6238c0, condition=(G_IO_IN | G_IO_ERR | G_IO_HUP |
> G_IO_NVAL))
> at io/channel-socket.c:732
> #7 0x00007f410a2c94d2 in qio_channel_create_watch
> (ioc=0x7f410d6238c0, condition=condition@entry=
> (G_IO_IN | G_IO_ERR | G_IO_HUP | G_IO_NVAL)) at io/channel.c:132
> #8 0x00007f410a003cd6 in io_watch_poll_prepare
> (source=0x7f4070000d00, timeout_=<optimized out>) at qemu-char.c:883
> #9 0x00007f41041a72ed in g_main_context_prepare () from
> /usr/lib64/libglib-2.0.so.0
> #10 0x00007f41041a7b7b in g_main_context_iterate.isra.24 () from
> /usr/lib64/libglib-2.0.so.0
> #11 0x00007f41041a7fba in g_main_loop_run () from
> /usr/lib64/libglib-2.0.so.0
> #12 0x00007f410a1e528f in colo_compare_thread (opaque=0x7f410d7d6800)
> at net/colo-compare.c:651
> #13 0x00007f4103b2bdc5 in start_thread () from /usr/lib64/libpthread.so.0
> #14 0x00007f410385971d in clone () from /usr/lib64/libc.so.6
It looks like we use main context which is wrong, maybe you can track
io_add_watch_poll() and its caller get the reason.
Thanks
next prev parent reply other threads:[~2017-05-05 3:03 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-20 7:46 [Qemu-devel] [PATCH 0/3] colo-compare: fix three bugs zhanghailiang
2017-04-20 7:46 ` [Qemu-devel] [PATCH 1/3] colo-compare: serialize compare thread's initialization with main thread zhanghailiang
2017-04-24 4:10 ` Jason Wang
2017-04-24 6:03 ` Hailiang Zhang
2017-04-25 8:41 ` Jason Wang
2017-04-25 9:59 ` Hailiang Zhang
2017-04-25 11:33 ` Jason Wang
2017-04-26 7:51 ` Hailiang Zhang
2017-05-04 2:51 ` Hailiang Zhang
2017-05-05 3:03 ` Jason Wang [this message]
2017-04-20 7:46 ` [Qemu-devel] [PATCH 2/3] colo-compare: Check main_loop value before call g_main_loop_quit zhanghailiang
2017-04-24 4:13 ` Jason Wang
2017-04-24 6:06 ` Hailiang Zhang
2017-04-20 7:46 ` [Qemu-devel] [PATCH 3/3] colo-compare: fix a memory leak zhanghailiang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6af63243-5e20-8a43-9ba1-fce37323bf78@redhat.com \
--to=jasowang@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=weidong.huang@huawei.com \
--cc=zhang.zhanghailiang@huawei.com \
--cc=zhangchen.fnst@cn.fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).