From: "Daniel P. Berrangé" <berrange@redhat.com>
To: "Marc-André Lureau" <marcandre.lureau@redhat.com>
Cc: qemu-devel <qemu-devel@nongnu.org>,
"Armbruster, Markus" <armbru@redhat.com>,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [Qemu-devel] [PATCH RFC] monitor: temporary fix for dead-lock on event recursion
Date: Mon, 30 Jul 2018 14:17:02 +0100 [thread overview]
Message-ID: <20180730131702.GJ1626@redhat.com> (raw)
In-Reply-To: <CAMxuvazdTPupFNVfCy_hO3Vqh+ZCWeYfroe4EbAMVwwg7T7jZQ@mail.gmail.com>
On Mon, Jul 30, 2018 at 03:09:08PM +0200, Marc-André Lureau wrote:
> Hi
>
> On Mon, Jul 30, 2018 at 3:05 PM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > On Mon, Jul 30, 2018 at 02:57:46PM +0200, Marc-André Lureau wrote:
> >> With a Spice port chardev, it is possible to reenter
> >> monitor_qapi_event_queue() (when the client disconnects for
> >> example). This will dead-lock on monitor_lock.
> >>
> >> Instead, use some TLS variables to check for recursion and queue the
> >> events.
> >
> > I wonder if it would be clearer to just change
> > qapi_event_send_spice_disconnected() so that it use g_idle_add() to
> > send the QAPI_EVENT_SPICE_DISCONNECTED event, as I don't see a strong
> > reason to sent that sychronously. This would avoid the recursion problem.
>
> Yes, I seriously thought about that solution. But in the end, the bug
> is in monitor code: it should be fixed there at some point. And it
> could in theory happen with a different code path than Spice.
The question is what we consider the API contract to be for the chardevs
that are used by the monitor. eg is the chardev I/O function permitted to
emit monitor events. Given that chardevs are widely used for lots of
backends, I guess I would probably say chardevs should be allowed to
emit monitor events, which does put the burden on the monitor code to
handle recursion.
I wonder if we could still use g_idle_add() though but in a different
place. The monitor_qapi_event_queue() method already emits events
asynchronously if they are tagged as needing rate limiting. So instead
of directly calling monitor_qapi_event_emit() in the non-rate limited
branch, we could send that call via g_idle_add(), so that all events
are emitted asynchronously. This would avoid monitor_qapi_event_queue()
becoming a re-entrancy problem.
I think g_idle_add() calls are processed in the order in which they
are registered, so event orderin should still be preserved.
> >> Fixes:
> >> (gdb) bt
> >> #0 0x00007fa69e7217fd in __lll_lock_wait () at /lib64/libpthread.so.0
> >> #1 0x00007fa69e71acf4 in pthread_mutex_lock () at /lib64/libpthread.so.0
> >> #2 0x0000563303567619 in qemu_mutex_lock_impl (mutex=0x563303d3e220 <monitor_lock>, file=0x5633036589a8 "/home/elmarco/src/qq/monitor.c", line=645) at /home/elmarco/src/qq/util/qemu-thread-posix.c:66
> >> #3 0x0000563302fa6c25 in monitor_qapi_event_queue (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x56330602bde0, errp=0x7ffc6ab5e728) at /home/elmarco/src/qq/monitor.c:645
> >> #4 0x0000563303549aca in qapi_event_send_spice_disconnected (server=0x563305afd630, client=0x563305745360, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-ui.c:149
> >> #5 0x00005633033e600f in channel_event (event=3, info=0x5633061b0050) at /home/elmarco/src/qq/ui/spice-core.c:235
> >> #6 0x00007fa69f6c86bb in reds_handle_channel_event (reds=<optimized out>, event=3, info=0x5633061b0050) at reds.c:316
> >> #7 0x00007fa69f6b193b in main_dispatcher_self_handle_channel_event (info=0x5633061b0050, event=3, self=0x563304e088c0) at main-dispatcher.c:197
> >> #8 0x00007fa69f6b193b in main_dispatcher_channel_event (self=0x563304e088c0, event=event@entry=3, info=0x5633061b0050) at main-dispatcher.c:197
> >> #9 0x00007fa69f6d0833 in red_stream_push_channel_event (s=s@entry=0x563305ad8f50, event=event@entry=3) at red-stream.c:414
> >> #10 0x00007fa69f6d086b in red_stream_free (s=0x563305ad8f50) at red-stream.c:388
> >> #11 0x00007fa69f6b7ddc in red_channel_client_finalize (object=0x563304df2360) at red-channel-client.c:347
> >> #12 0x00007fa6a56b7fb9 in g_object_unref () at /lib64/libgobject-2.0.so.0
> >> #13 0x00007fa69f6ba212 in red_channel_client_push (rcc=0x563304df2360) at red-channel-client.c:1341
> >> #14 0x00007fa69f68b259 in red_char_device_send_msg_to_client (client=<optimized out>, msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305
> >> #15 0x00007fa69f68b259 in red_char_device_send_msg_to_clients (msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305
> >> #16 0x00007fa69f68b259 in red_char_device_read_from_device (dev=0x563304e08bc0) at char-device.c:353
> >> #17 0x000056330317d01d in spice_chr_write (chr=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/spice.c:199
> >> #18 0x00005633034deee7 in qemu_chr_write_buffer (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, offset=0x7ffc6ab5ea70, write_all=false) at /home/elmarco/src/qq/chardev/char.c:112
> >> #19 0x00005633034df054 in qemu_chr_write (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, write_all=false) at /home/elmarco/src/qq/chardev/char.c:147
> >> #20 0x00005633034e1e13 in qemu_chr_fe_write (be=0x563304dbb800, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/char-fe.c:42
> >> #21 0x0000563302fa6334 in monitor_flush_locked (mon=0x563304dbb800) at /home/elmarco/src/qq/monitor.c:425
> >> #22 0x0000563302fa6520 in monitor_puts (mon=0x563304dbb800, str=0x563305de7e9e "") at /home/elmarco/src/qq/monitor.c:468
> >> #23 0x0000563302fa680c in qmp_send_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:517
> >> #24 0x0000563302fa6905 in qmp_queue_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:538
> >> #25 0x0000563302fa6b5b in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730) at /home/elmarco/src/qq/monitor.c:624
> >> #26 0x0000563302fa6c4b in monitor_qapi_event_queue (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730, errp=0x7ffc6ab5ed00) at /home/elmarco/src/qq/monitor.c:649
> >> #27 0x0000563303548cce in qapi_event_send_shutdown (guest=false, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-run-state.c:58
> >> #28 0x000056330313bcd7 in main_loop_should_exit () at /home/elmarco/src/qq/vl.c:1822
> >> #29 0x000056330313bde3 in main_loop () at /home/elmarco/src/qq/vl.c:1862
> >> #30 0x0000563303143781 in main (argc=3, argv=0x7ffc6ab5f068, envp=0x7ffc6ab5f088) at /home/elmarco/src/qq/vl.c:4644
> >>
> >> Note that error report is now moved to the first caller, which may
> >> receive an error for a recursed event. This is probably fine (95% of
> >> callers use &error_abort, the rest have NULL error and ignore it)
> >>
> >> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> >> ---
> >> monitor.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
> >> 1 file changed, 50 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/monitor.c b/monitor.c
> >> index d8d8211ae4..d580c5a79c 100644
> >> --- a/monitor.c
> >> +++ b/monitor.c
> >> @@ -633,7 +633,7 @@ static void monitor_qapi_event_handler(void *opaque);
> >> * applying any rate limiting if required.
> >> */
> >> static void
> >> -monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
> >> +monitor_qapi_event_queue_no_recurse(QAPIEvent event, QDict *qdict, Error **errp)
> >> {
> >> MonitorQAPIEventConf *evconf;
> >> MonitorQAPIEventState *evstate;
> >> @@ -688,6 +688,55 @@ monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
> >> qemu_mutex_unlock(&monitor_lock);
> >> }
> >>
> >> +static void
> >> +monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
> >> +{
> >> + Error *local_err = NULL;
> >> + /*
> >> + * If the function recurse, monitor_lock will dead-lock.
> >> + * Instead, queue pending events in TLS.
> >> + * TODO: remove this, make it re-enter safe.
> >> + */
> >> + static __thread bool recurse;
> >> + typedef struct MonitorQapiEvent {
> >> + QAPIEvent event;
> >> + QDict *qdict;
> >> + QSIMPLEQ_ENTRY(MonitorQapiEvent) entry;
> >> + } MonitorQapiEvent;
> >> + MonitorQapiEvent *ev;
> >> + static __thread QSIMPLEQ_HEAD(, MonitorQapiEvent) event_queue;
> >> +
> >> + if (!recurse) {
> >> + QSIMPLEQ_INIT(&event_queue);
> >> + }
> >> +
> >> + ev = g_new(MonitorQapiEvent, 1);
> >> + ev->qdict = qobject_ref(qdict);
> >> + ev->event = event;
> >> + QSIMPLEQ_INSERT_TAIL(&event_queue, ev, entry);
> >> + if (recurse) {
> >> + return;
> >> + }
> >> +
> >> + recurse = true;
> >> +
> >> + while ((ev = QSIMPLEQ_FIRST(&event_queue)) != NULL) {
> >> + QSIMPLEQ_REMOVE_HEAD(&event_queue, entry);
> >> + if (!local_err) {
> >> + monitor_qapi_event_queue_no_recurse(ev->event, ev->qdict,
> >> + &local_err);
> >> + }
> >> + qobject_unref(ev->qdict);
> >> + g_free(ev);
> >> + }
> >> +
> >> + recurse = false;
> >> +
> >> + if (local_err) {
> >> + error_propagate(errp, local_err);
> >> + }
> >> +}
> >> +
> >> /*
> >> * This function runs evconf->rate ns after sending a throttled
> >> * event.
> >> --
> >> 2.18.0.232.gb7bd9486b0
> >>
> >>
> >
> > Regards,
> > Daniel
> > --
> > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> > |: https://libvirt.org -o- https://fstop138.berrange.com :|
> > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
next prev parent reply other threads:[~2018-07-30 13:17 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-07-30 12:57 [Qemu-devel] [PATCH RFC] monitor: temporary fix for dead-lock on event recursion Marc-André Lureau
2018-07-30 13:05 ` Daniel P. Berrangé
2018-07-30 13:09 ` Marc-André Lureau
2018-07-30 13:17 ` Daniel P. Berrangé [this message]
2018-07-31 7:05 ` Markus Armbruster
2018-07-31 14:45 ` Marc-André Lureau
2018-07-31 15:15 ` Markus Armbruster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180730131702.GJ1626@redhat.com \
--to=berrange@redhat.com \
--cc=armbru@redhat.com \
--cc=dgilbert@redhat.com \
--cc=marcandre.lureau@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.