From: "Marc-André Lureau" <marcandre.lureau@redhat.com>
To: "Daniel P. Berrangé" <berrange@redhat.com>
Cc: qemu-devel <qemu-devel@nongnu.org>,
"Armbruster, Markus" <armbru@redhat.com>,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [Qemu-devel] [PATCH RFC] monitor: temporary fix for dead-lock on event recursion
Date: Mon, 30 Jul 2018 15:09:08 +0200 [thread overview]
Message-ID: <CAMxuvazdTPupFNVfCy_hO3Vqh+ZCWeYfroe4EbAMVwwg7T7jZQ@mail.gmail.com> (raw)
In-Reply-To: <20180730130526.GI1626@redhat.com>
Hi
On Mon, Jul 30, 2018 at 3:05 PM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Mon, Jul 30, 2018 at 02:57:46PM +0200, Marc-André Lureau wrote:
>> With a Spice port chardev, it is possible to reenter
>> monitor_qapi_event_queue() (when the client disconnects for
>> example). This will dead-lock on monitor_lock.
>>
>> Instead, use some TLS variables to check for recursion and queue the
>> events.
>
> I wonder if it would be clearer to just change
> qapi_event_send_spice_disconnected() so that it use g_idle_add() to
> send the QAPI_EVENT_SPICE_DISCONNECTED event, as I don't see a strong
> reason to sent that sychronously. This would avoid the recursion problem.
Yes, I seriously thought about that solution. But in the end, the bug
is in monitor code: it should be fixed there at some point. And it
could in theory happen with a different code path than Spice.
>>
>> Fixes:
>> (gdb) bt
>> #0 0x00007fa69e7217fd in __lll_lock_wait () at /lib64/libpthread.so.0
>> #1 0x00007fa69e71acf4 in pthread_mutex_lock () at /lib64/libpthread.so.0
>> #2 0x0000563303567619 in qemu_mutex_lock_impl (mutex=0x563303d3e220 <monitor_lock>, file=0x5633036589a8 "/home/elmarco/src/qq/monitor.c", line=645) at /home/elmarco/src/qq/util/qemu-thread-posix.c:66
>> #3 0x0000563302fa6c25 in monitor_qapi_event_queue (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x56330602bde0, errp=0x7ffc6ab5e728) at /home/elmarco/src/qq/monitor.c:645
>> #4 0x0000563303549aca in qapi_event_send_spice_disconnected (server=0x563305afd630, client=0x563305745360, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-ui.c:149
>> #5 0x00005633033e600f in channel_event (event=3, info=0x5633061b0050) at /home/elmarco/src/qq/ui/spice-core.c:235
>> #6 0x00007fa69f6c86bb in reds_handle_channel_event (reds=<optimized out>, event=3, info=0x5633061b0050) at reds.c:316
>> #7 0x00007fa69f6b193b in main_dispatcher_self_handle_channel_event (info=0x5633061b0050, event=3, self=0x563304e088c0) at main-dispatcher.c:197
>> #8 0x00007fa69f6b193b in main_dispatcher_channel_event (self=0x563304e088c0, event=event@entry=3, info=0x5633061b0050) at main-dispatcher.c:197
>> #9 0x00007fa69f6d0833 in red_stream_push_channel_event (s=s@entry=0x563305ad8f50, event=event@entry=3) at red-stream.c:414
>> #10 0x00007fa69f6d086b in red_stream_free (s=0x563305ad8f50) at red-stream.c:388
>> #11 0x00007fa69f6b7ddc in red_channel_client_finalize (object=0x563304df2360) at red-channel-client.c:347
>> #12 0x00007fa6a56b7fb9 in g_object_unref () at /lib64/libgobject-2.0.so.0
>> #13 0x00007fa69f6ba212 in red_channel_client_push (rcc=0x563304df2360) at red-channel-client.c:1341
>> #14 0x00007fa69f68b259 in red_char_device_send_msg_to_client (client=<optimized out>, msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305
>> #15 0x00007fa69f68b259 in red_char_device_send_msg_to_clients (msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305
>> #16 0x00007fa69f68b259 in red_char_device_read_from_device (dev=0x563304e08bc0) at char-device.c:353
>> #17 0x000056330317d01d in spice_chr_write (chr=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/spice.c:199
>> #18 0x00005633034deee7 in qemu_chr_write_buffer (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, offset=0x7ffc6ab5ea70, write_all=false) at /home/elmarco/src/qq/chardev/char.c:112
>> #19 0x00005633034df054 in qemu_chr_write (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, write_all=false) at /home/elmarco/src/qq/chardev/char.c:147
>> #20 0x00005633034e1e13 in qemu_chr_fe_write (be=0x563304dbb800, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/char-fe.c:42
>> #21 0x0000563302fa6334 in monitor_flush_locked (mon=0x563304dbb800) at /home/elmarco/src/qq/monitor.c:425
>> #22 0x0000563302fa6520 in monitor_puts (mon=0x563304dbb800, str=0x563305de7e9e "") at /home/elmarco/src/qq/monitor.c:468
>> #23 0x0000563302fa680c in qmp_send_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:517
>> #24 0x0000563302fa6905 in qmp_queue_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:538
>> #25 0x0000563302fa6b5b in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730) at /home/elmarco/src/qq/monitor.c:624
>> #26 0x0000563302fa6c4b in monitor_qapi_event_queue (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730, errp=0x7ffc6ab5ed00) at /home/elmarco/src/qq/monitor.c:649
>> #27 0x0000563303548cce in qapi_event_send_shutdown (guest=false, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-run-state.c:58
>> #28 0x000056330313bcd7 in main_loop_should_exit () at /home/elmarco/src/qq/vl.c:1822
>> #29 0x000056330313bde3 in main_loop () at /home/elmarco/src/qq/vl.c:1862
>> #30 0x0000563303143781 in main (argc=3, argv=0x7ffc6ab5f068, envp=0x7ffc6ab5f088) at /home/elmarco/src/qq/vl.c:4644
>>
>> Note that error report is now moved to the first caller, which may
>> receive an error for a recursed event. This is probably fine (95% of
>> callers use &error_abort, the rest have NULL error and ignore it)
>>
>> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>> ---
>> monitor.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 50 insertions(+), 1 deletion(-)
>>
>> diff --git a/monitor.c b/monitor.c
>> index d8d8211ae4..d580c5a79c 100644
>> --- a/monitor.c
>> +++ b/monitor.c
>> @@ -633,7 +633,7 @@ static void monitor_qapi_event_handler(void *opaque);
>> * applying any rate limiting if required.
>> */
>> static void
>> -monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
>> +monitor_qapi_event_queue_no_recurse(QAPIEvent event, QDict *qdict, Error **errp)
>> {
>> MonitorQAPIEventConf *evconf;
>> MonitorQAPIEventState *evstate;
>> @@ -688,6 +688,55 @@ monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
>> qemu_mutex_unlock(&monitor_lock);
>> }
>>
>> +static void
>> +monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
>> +{
>> + Error *local_err = NULL;
>> + /*
>> + * If the function recurse, monitor_lock will dead-lock.
>> + * Instead, queue pending events in TLS.
>> + * TODO: remove this, make it re-enter safe.
>> + */
>> + static __thread bool recurse;
>> + typedef struct MonitorQapiEvent {
>> + QAPIEvent event;
>> + QDict *qdict;
>> + QSIMPLEQ_ENTRY(MonitorQapiEvent) entry;
>> + } MonitorQapiEvent;
>> + MonitorQapiEvent *ev;
>> + static __thread QSIMPLEQ_HEAD(, MonitorQapiEvent) event_queue;
>> +
>> + if (!recurse) {
>> + QSIMPLEQ_INIT(&event_queue);
>> + }
>> +
>> + ev = g_new(MonitorQapiEvent, 1);
>> + ev->qdict = qobject_ref(qdict);
>> + ev->event = event;
>> + QSIMPLEQ_INSERT_TAIL(&event_queue, ev, entry);
>> + if (recurse) {
>> + return;
>> + }
>> +
>> + recurse = true;
>> +
>> + while ((ev = QSIMPLEQ_FIRST(&event_queue)) != NULL) {
>> + QSIMPLEQ_REMOVE_HEAD(&event_queue, entry);
>> + if (!local_err) {
>> + monitor_qapi_event_queue_no_recurse(ev->event, ev->qdict,
>> + &local_err);
>> + }
>> + qobject_unref(ev->qdict);
>> + g_free(ev);
>> + }
>> +
>> + recurse = false;
>> +
>> + if (local_err) {
>> + error_propagate(errp, local_err);
>> + }
>> +}
>> +
>> /*
>> * This function runs evconf->rate ns after sending a throttled
>> * event.
>> --
>> 2.18.0.232.gb7bd9486b0
>>
>>
>
> Regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
next prev parent reply other threads:[~2018-07-30 13:09 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-07-30 12:57 [Qemu-devel] [PATCH RFC] monitor: temporary fix for dead-lock on event recursion Marc-André Lureau
2018-07-30 13:05 ` Daniel P. Berrangé
2018-07-30 13:09 ` Marc-André Lureau [this message]
2018-07-30 13:17 ` Daniel P. Berrangé
2018-07-31 7:05 ` Markus Armbruster
2018-07-31 14:45 ` Marc-André Lureau
2018-07-31 15:15 ` Markus Armbruster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAMxuvazdTPupFNVfCy_hO3Vqh+ZCWeYfroe4EbAMVwwg7T7jZQ@mail.gmail.com \
--to=marcandre.lureau@redhat.com \
--cc=armbru@redhat.com \
--cc=berrange@redhat.com \
--cc=dgilbert@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).