From: Peter Xu <peterx@redhat.com>
To: "Daniel P. Berrange" <berrange@redhat.com>
Cc: qemu-devel@nongnu.org, "Paolo Bonzini" <pbonzini@redhat.com>,
"Stefan Hajnoczi" <shajnocz@redhat.com>,
"Fam Zheng" <famz@redhat.com>,
"Juan Quintela" <quintela@redhat.com>,
mdroth@linux.vnet.ibm.com, "Eric Blake" <eblake@redhat.com>,
"Laurent Vivier" <lvivier@redhat.com>,
"Marc-André Lureau" <marcandre.lureau@gmail.com>,
"Markus Armbruster" <armbru@redhat.com>,
"Dr . David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [Qemu-devel] [RFC 01/15] char-io: fix possible race on IOWatchPoll
Date: Wed, 20 Sep 2017 19:18:49 +0800 [thread overview]
Message-ID: <20170920111849.GB30661@pxdev.xzpeter.org> (raw)
In-Reply-To: <20170920110309.GF4053@redhat.com>
On Wed, Sep 20, 2017 at 12:03:09PM +0100, Daniel P. Berrange wrote:
> On Wed, Sep 20, 2017 at 06:49:58PM +0800, Peter Xu wrote:
> > On Wed, Sep 20, 2017 at 10:14:38AM +0100, Daniel P. Berrange wrote:
> > > On Wed, Sep 20, 2017 at 05:09:26PM +0800, Peter Xu wrote:
> > > > On Wed, Sep 20, 2017 at 08:57:03AM +0100, Daniel P. Berrange wrote:
> > > > > On Thu, Sep 14, 2017 at 03:50:22PM +0800, Peter Xu wrote:
> > > > > > This is not a problem if we are only having one single loop thread like
> > > > > > before. However, after per-monitor thread is introduced, this is not
> > > > > > true any more, and the race can happen.
> > > > > >
> > > > > > The race can be triggered with "make check -j8" sometimes:
> > > > > >
> > > > > > qemu-system-x86_64: /root/git/qemu/chardev/char-io.c:91:
> > > > > > io_watch_poll_finalize: Assertion `iwp->src == NULL' failed.
> > > > > >
> > > > > > This patch keeps the reference for the watch object when creating in
> > > > > > io_add_watch_poll(), so that the object will never be released in the
> > > > > > context main loop, especially when the context loop is running in
> > > > > > another standalone thread. Meanwhile, when we want to remove the watch
> > > > > > object, we always first detach the watch object from its owner context,
> > > > > > then we continue with the cleanup.
> > > > > >
> > > > > > Without this patch, calling io_remove_watch_poll() in main loop thread
> > > > > > is not thread-safe, since the other per-monitor thread may be modifying
> > > > > > the watch object at the same time.
> > > > >
> > > > > This doesn't feel right to me. Why is the main loop thread doing anything
> > > > > at all with the Chardev, if there is a per-monitor thread ? The Chardev
> > > > > code isn't thread safe so it isn't safe to have two separate threads
> > > > > accessing the same Chardev. IOW, if we want a per-monitor thread, then
> > > > > we must make sure the main thread never touches that monitor's chardev
> > > > > at all. While your patch here might have avoided the assertion you
> > > > > mention above, I fear this is just papering over a fundamental problem
> > > > > that still exists, that can only be solved by not letting the mainloop
> > > > > touch the chardev at all.
> > > >
> > > > The stack I encountered:
> > > >
> > > > #0 0x00007f658234c765 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
> > > > #1 0x00007f658234e36a in __GI_abort () at abort.c:89
> > > > #2 0x00007f6582344f97 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x55c76345fce1 "iwp->src == NULL", file=file@entry=0x55c76345fcc0 "/root/git/qemu/chardev/char-io.c", line=line@entry=91, function=function@entry=0x55c76345fd10 <__PRETTY_FUNCTION__.21863> "io_watch_poll_finalize") at assert.c:92
> > > > #3 0x00007f6582345042 in __GI___assert_fail (assertion=0x55c76345fce1 "iwp->src == NULL", file=0x55c76345fcc0 "/root/git/qemu/chardev/char-io.c", line=91, function=0x55c76345fd10 <__PRETTY_FUNCTION__.21863> "io_watch_poll_finalize") at assert.c:101
> > > > #4 0x000055c7632c2be5 in io_watch_poll_finalize (source=0x55c7651cd450) at /root/git/qemu/chardev/char-io.c:91
> > > > #5 0x00007f65847bb859 in g_source_unref_internal () at /lib64/libglib-2.0.so.0
> > > > #6 0x00007f65847bca29 in g_source_destroy_internal () at /lib64/libglib-2.0.so.0
> > > > #7 0x000055c7632c2d30 in io_remove_watch_poll (source=0x55c7651cd450) at /root/git/qemu/chardev/char-io.c:139
> > > > #8 0x000055c7632c2d5c in remove_fd_in_watch (chr=0x55c7651ccdf0) at /root/git/qemu/chardev/char-io.c:145
> > > > #9 0x000055c7632c2368 in qemu_chr_fe_set_handlers (b=0x55c7651f6410, fd_can_read=0x0, fd_read=0x0, fd_event=0x0, be_change=0x0, opaque=0x0, context=0x0, set_open=true)
> > > > at /root/git/qemu/chardev/char-fe.c:267
> > > > #10 0x000055c7632c2221 in qemu_chr_fe_deinit (b=0x55c7651f6410, del=false) at /root/git/qemu/chardev/char-fe.c:231
> > > > #11 0x000055c762e2b15c in monitor_data_destroy (mon=0x55c7651f6410) at /root/git/qemu/monitor.c:600
> > > > #12 0x000055c762e340ec in monitor_cleanup () at /root/git/qemu/monitor.c:4346
> > > > #13 0x000055c762f9445d in main (argc=19, argv=0x7ffc6846d0e8, envp=0x7ffc6846d188) at /root/git/qemu/vl.c:4889
> > > >
> > > > So it's destroying the CharBackend, but it'll then call
> > > > qemu_chr_fe_set_handlers() which finally tries to remove the watch poll.
> > >
> > > Ok that code is broken - it must not call monitor_cleanup from the main
> > > thread - it needs to be called from the monitor thread, unless it can
> > > guarantee that the monitor thread has already exited, which seems unlikely
> >
> > The problem is that not all monitors are parsed in the IO thread, but
> > only those with use_io_thr=true set.
> >
> > How about I move the calls of monitor_data_destroy() into that monitor
> > IO thread when use_io_thr=true? And for the rest, I think they still
> > need to be destroyed in the main thread.
>
> I think having the monitor sometimes run in the main thread and sometimes
> run in a background thread is a recipe for ongoing trouble, of which this
> problem is just the first example that will hurt us. People will test
> behaviour of a feature with one setup and then users will later run it in
> a different setup and potentially experiance obscure bugs as a result.
> IOW, use_io_thr flag should not exist, and every monitor should be run
> unconditionally in the background thread from the point at which your
> patch series merges.
I agree with you that this may bring trouble in some aspect. I just
don't know whether it'll bring more trouble if we move all the
monitor-related chardev IO into monitor thread.
The key is the muxed typed chardev.
If we don't have muxed typed chardev, I'll surely consider to use IO
thread for all the monitors.
However, the muxed chardevs can support e.g. one monitor plus a serial
port. Can we just run the IO stuff in monitor thread even part of its
frontend is a serial port? And also I'm not sure what would happen if
it's a monitor plus something else I even don't aware of.
Any nicer thoughts? Thanks,
--
Peter Xu
next prev parent reply other threads:[~2017-09-20 13:46 UTC|newest]
Thread overview: 77+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-09-14 7:50 [Qemu-devel] [RFC 00/15] QMP: out-of-band (OOB) execution support Peter Xu
2017-09-14 7:50 ` [Qemu-devel] [RFC 01/15] char-io: fix possible race on IOWatchPoll Peter Xu
2017-09-19 19:59 ` Eric Blake
2017-09-20 4:44 ` Peter Xu
2017-09-20 7:57 ` Daniel P. Berrange
2017-09-20 9:09 ` Peter Xu
2017-09-20 9:14 ` Daniel P. Berrange
2017-09-20 10:49 ` Peter Xu
2017-09-20 11:03 ` Daniel P. Berrange
2017-09-20 11:18 ` Peter Xu [this message]
2017-09-20 11:29 ` Daniel P. Berrange
2017-09-21 3:45 ` Peter Xu
2017-09-14 7:50 ` [Qemu-devel] [RFC 02/15] qobject: allow NULL for qstring_get_str() Peter Xu
2017-09-19 20:48 ` Eric Blake
2017-09-20 5:02 ` Peter Xu
2017-09-14 7:50 ` [Qemu-devel] [RFC 03/15] qobject: introduce qobject_to_str() Peter Xu
2017-09-14 7:50 ` [Qemu-devel] [RFC 04/15] monitor: move skip_flush into monitor_data_init Peter Xu
2017-09-14 7:50 ` [Qemu-devel] [RFC 05/15] qjson: add "opaque" field to JSONMessageParser Peter Xu
2017-09-19 20:55 ` Eric Blake
2017-09-20 5:45 ` Peter Xu
2017-09-14 7:50 ` [Qemu-devel] [RFC 06/15] monitor: move the cur_mon hack deeper for QMP Peter Xu
2017-09-19 21:05 ` Eric Blake
2017-09-20 5:54 ` Peter Xu
2017-09-14 7:50 ` [Qemu-devel] [RFC 07/15] monitor: unify global init Peter Xu
2017-09-19 21:35 ` Eric Blake
2017-09-19 21:48 ` Eric Blake
2017-09-20 6:54 ` Peter Xu
2017-09-14 7:50 ` [Qemu-devel] [RFC 08/15] monitor: create IO thread Peter Xu
2017-09-14 7:50 ` [Qemu-devel] [RFC 09/15] monitor: allow to use IO thread for parsing Peter Xu
2017-09-14 7:50 ` [Qemu-devel] [RFC 10/15] monitor: introduce monitor_qmp_respond() Peter Xu
2017-09-14 7:50 ` [Qemu-devel] [RFC 11/15] monitor: separate QMP parser and dispatcher Peter Xu
2017-09-14 7:50 ` [Qemu-devel] [RFC 12/15] monitor: enable IO thread for (qmp & !mux) typed Peter Xu
2017-09-14 7:50 ` [Qemu-devel] [RFC 13/15] qapi: introduce new cmd option "allow-oob" Peter Xu
2017-09-14 7:50 ` [Qemu-devel] [RFC 14/15] qmp: support out-of-band (oob) execution Peter Xu
2017-09-14 15:33 ` Stefan Hajnoczi
2017-09-15 2:59 ` Peter Xu
2017-09-15 18:34 ` Eric Blake
2017-09-18 7:36 ` Peter Xu
2017-09-15 15:55 ` Dr. David Alan Gilbert
2017-09-18 7:53 ` Peter Xu
2017-09-14 7:50 ` [Qemu-devel] [RFC 15/15] qmp: let migrate-incoming allow out-of-band Peter Xu
2017-09-15 16:09 ` Dr. David Alan Gilbert
2017-09-18 8:00 ` Peter Xu
2017-09-14 11:15 ` [Qemu-devel] [RFC 00/15] QMP: out-of-band (OOB) execution support Marc-André Lureau
2017-09-14 15:19 ` Stefan Hajnoczi
2017-09-15 3:50 ` Peter Xu
2017-09-15 10:49 ` Stefan Hajnoczi
2017-09-15 11:34 ` Daniel P. Berrange
2017-09-15 12:06 ` Dr. David Alan Gilbert
2017-09-15 12:14 ` Daniel P. Berrange
2017-09-15 12:19 ` Dr. David Alan Gilbert
2017-09-15 12:29 ` Daniel P. Berrange
2017-09-15 14:29 ` Dr. David Alan Gilbert
2017-09-15 14:32 ` Daniel P. Berrange
2017-09-15 14:56 ` Stefan Hajnoczi
2017-09-15 15:17 ` Dr. David Alan Gilbert
2017-09-18 9:26 ` Peter Xu
2017-09-18 10:40 ` Dr. David Alan Gilbert
2017-09-19 2:23 ` Peter Xu
2017-09-19 9:13 ` Dr. David Alan Gilbert
2017-09-19 9:22 ` Peter Xu
2017-09-14 18:53 ` Dr. David Alan Gilbert
2017-09-15 4:46 ` Peter Xu
2017-09-15 11:14 ` Marc-André Lureau
2017-09-18 8:37 ` Peter Xu
2017-09-18 10:20 ` Marc-André Lureau
2017-09-18 10:55 ` Dr. David Alan Gilbert
2017-09-18 11:13 ` Marc-André Lureau
2017-09-18 11:26 ` Dr. David Alan Gilbert
2017-09-18 16:09 ` Marc-André Lureau
2017-09-19 6:29 ` Peter Xu
2017-09-19 9:19 ` Dr. David Alan Gilbert
2017-09-20 4:37 ` Peter Xu
2017-09-19 18:49 ` Dr. David Alan Gilbert
2017-09-18 15:08 ` Eric Blake
2017-09-14 18:56 ` Dr. David Alan Gilbert
2017-09-15 3:58 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170920111849.GB30661@pxdev.xzpeter.org \
--to=peterx@redhat.com \
--cc=armbru@redhat.com \
--cc=berrange@redhat.com \
--cc=dgilbert@redhat.com \
--cc=eblake@redhat.com \
--cc=famz@redhat.com \
--cc=lvivier@redhat.com \
--cc=marcandre.lureau@gmail.com \
--cc=mdroth@linux.vnet.ibm.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
--cc=shajnocz@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).