From: Jason Wang <jasowang@redhat.com>
To: Hailiang Zhang <zhang.zhanghailiang@huawei.com>,
	zhangchen.fnst@cn.fujitsu.com
Cc: qemu-devel@nongnu.org, weidong.huang@huawei.com
Subject: Re: [Qemu-devel] [PATCH 1/3] colo-compare: serialize compare thread's initialization with main thread
Date: Fri, 5 May 2017 11:03:23 +0800	[thread overview]
Message-ID: <6af63243-5e20-8a43-9ba1-fce37323bf78@redhat.com> (raw)
In-Reply-To: <590A971B.1070900@huawei.com>



On 2017-05-04 10:51, Hailiang Zhang wrote:
> Hi Jason,
>
> On 2017/4/25 19:33, Jason Wang wrote:
>>
>>> On 2017-04-25 17:59, Hailiang Zhang wrote:
>>> On 2017/4/25 16:41, Jason Wang wrote:
>>>>> On 2017-04-24 14:03, Hailiang Zhang wrote:
>>>>> On 2017/4/24 12:10, Jason Wang wrote:
>>>>>>> On 2017-04-20 15:46, zhanghailiang wrote:
>>>>>>>> We call qemu_chr_fe_set_handlers() in the colo-compare thread; it is
>>>>>>>> used to detach the watched fd from the default main context. This means
>>>>>>>> the same watched fd can be handled concurrently by the main thread,
>>>>>>>> which triggers an error report:
>>>>>>>> "qemu-char.c:918: io_watch_poll_finalize: Assertion `iwp->src ==
>>>>>>>> ((void *)0)' failed."
>>>>>> Is there any way to prevent the fd from being handled by the main
>>>>>> thread before the colo thread is created? Using a semaphore seems
>>>>>> inelegant.
>>>>> So how about calling qemu_mutex_lock_iothread() before
>>>>> qemu_chr_fe_set_handlers() ?
>>>> Looks better, but I need more information, e.g. how can the main thread
>>>> touch it?
>>> Hmm, this happens only occasionally, and we didn't catch the first
>>> place (backtrace) where the fd was removed from being watched, but from
>>> the code logic we found the following possible call chain:
>>> tcp_chr_write (or tcp_chr_read/tcp_chr_sync_read/chr_disconnect)
>>>   ->tcp_chr_disconnect (or char_socket_finalize)
>>>      ->tcp_chr_free_connection
>>>        -> remove_fd_in_watch(chr);
>>>
>>> Anyway, it needs protection against being freed twice.
>>>
>>> Thanks,
>>> Hailiang
>> Still a little bit confused. The question is: how could the main thread
>> still call tcp_chr_write() or the others in the case above?
>
> Finally, we reproduced this bug (we use QEMU 2.6) and got the following 
> backtrace:
>
> (gdb) thread apply all bt
>
> Thread 7 (Thread 0x7f407a1ff700 (LWP 23144)):
> #0  0x00007f41037e0db5 in _int_malloc () from /usr/lib64/libc.so.6
> #1  0x00007f41037e3b96 in calloc () from /usr/lib64/libc.so.6
> #2  0x00007f41041ad4d7 in g_malloc0 () from /usr/lib64/libglib-2.0.so.0
> #3  0x00007f41041a5437 in g_source_new () from /usr/lib64/libglib-2.0.so.0
> #4  0x00007f410a2cec9c in qio_channel_create_fd_watch (ioc=ioc@entry=0x7f410d6238c0, fd=20, condition=condition@entry=(G_IO_IN | G_IO_ERR | G_IO_HUP | G_IO_NVAL)) at io/channel-watch.c:259
> #5  0x00007f410a2ced01 in qio_channel_create_socket_watch (ioc=ioc@entry=0x7f410d6238c0, socket=<optimized out>, condition=condition@entry=(G_IO_IN | G_IO_ERR | G_IO_HUP | G_IO_NVAL)) at io/channel-watch.c:311
> #6  0x00007f410a2cbea7 in qio_channel_socket_create_watch (ioc=0x7f410d6238c0, condition=(G_IO_IN | G_IO_ERR | G_IO_HUP | G_IO_NVAL)) at io/channel-socket.c:732
> #7  0x00007f410a2c94d2 in qio_channel_create_watch (ioc=0x7f410d6238c0, condition=condition@entry=(G_IO_IN | G_IO_ERR | G_IO_HUP | G_IO_NVAL)) at io/channel.c:132
> #8  0x00007f410a003cd6 in io_watch_poll_prepare (source=0x7f4070000d00, timeout_=<optimized out>) at qemu-char.c:883
> #9  0x00007f41041a72ed in g_main_context_prepare () from /usr/lib64/libglib-2.0.so.0
> #10 0x00007f41041a7b7b in g_main_context_iterate.isra.24 () from /usr/lib64/libglib-2.0.so.0
> #11 0x00007f41041a7fba in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
> #12 0x00007f410a1e528f in colo_compare_thread (opaque=0x7f410d7d6800) at net/colo-compare.c:651
> #13 0x00007f4103b2bdc5 in start_thread () from /usr/lib64/libpthread.so.0
> #14 0x00007f410385971d in clone () from /usr/lib64/libc.so.6 

It looks like we use the main context here, which is wrong. Maybe you can 
track io_add_watch_poll() and its callers to find the reason.

Thanks

