* [Qemu-devel] [PATCH RFC] monitor: temporary fix for dead-lock on event recursion @ 2018-07-30 12:57 Marc-André Lureau 2018-07-30 13:05 ` Daniel P. Berrangé 2018-07-31 7:05 ` Markus Armbruster 0 siblings, 2 replies; 7+ messages in thread From: Marc-André Lureau @ 2018-07-30 12:57 UTC (permalink / raw) To: qemu-devel; +Cc: armbru, Marc-André Lureau, Dr. David Alan Gilbert With a Spice port chardev, it is possible to reenter monitor_qapi_event_queue() (when the client disconnects for example). This will dead-lock on monitor_lock. Instead, use some TLS variables to check for recursion and queue the events. Fixes: (gdb) bt #0 0x00007fa69e7217fd in __lll_lock_wait () at /lib64/libpthread.so.0 #1 0x00007fa69e71acf4 in pthread_mutex_lock () at /lib64/libpthread.so.0 #2 0x0000563303567619 in qemu_mutex_lock_impl (mutex=0x563303d3e220 <monitor_lock>, file=0x5633036589a8 "/home/elmarco/src/qq/monitor.c", line=645) at /home/elmarco/src/qq/util/qemu-thread-posix.c:66 #3 0x0000563302fa6c25 in monitor_qapi_event_queue (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x56330602bde0, errp=0x7ffc6ab5e728) at /home/elmarco/src/qq/monitor.c:645 #4 0x0000563303549aca in qapi_event_send_spice_disconnected (server=0x563305afd630, client=0x563305745360, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-ui.c:149 #5 0x00005633033e600f in channel_event (event=3, info=0x5633061b0050) at /home/elmarco/src/qq/ui/spice-core.c:235 #6 0x00007fa69f6c86bb in reds_handle_channel_event (reds=<optimized out>, event=3, info=0x5633061b0050) at reds.c:316 #7 0x00007fa69f6b193b in main_dispatcher_self_handle_channel_event (info=0x5633061b0050, event=3, self=0x563304e088c0) at main-dispatcher.c:197 #8 0x00007fa69f6b193b in main_dispatcher_channel_event (self=0x563304e088c0, event=event@entry=3, info=0x5633061b0050) at main-dispatcher.c:197 #9 0x00007fa69f6d0833 in red_stream_push_channel_event (s=s@entry=0x563305ad8f50, event=event@entry=3) at red-stream.c:414 #10 0x00007fa69f6d086b in red_stream_free (s=0x563305ad8f50) at red-stream.c:388 #11 0x00007fa69f6b7ddc in red_channel_client_finalize (object=0x563304df2360) at red-channel-client.c:347 #12 0x00007fa6a56b7fb9 in g_object_unref () at /lib64/libgobject-2.0.so.0 #13 0x00007fa69f6ba212 in red_channel_client_push (rcc=0x563304df2360) at red-channel-client.c:1341 #14 0x00007fa69f68b259 in red_char_device_send_msg_to_client (client=<optimized out>, msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305 #15 0x00007fa69f68b259 in red_char_device_send_msg_to_clients (msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305 #16 0x00007fa69f68b259 in red_char_device_read_from_device (dev=0x563304e08bc0) at char-device.c:353 #17 0x000056330317d01d in spice_chr_write (chr=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/spice.c:199 #18 0x00005633034deee7 in qemu_chr_write_buffer (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, offset=0x7ffc6ab5ea70, write_all=false) at /home/elmarco/src/qq/chardev/char.c:112 #19 0x00005633034df054 in qemu_chr_write (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, write_all=false) at /home/elmarco/src/qq/chardev/char.c:147 #20 0x00005633034e1e13 in qemu_chr_fe_write (be=0x563304dbb800, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/char-fe.c:42 #21 0x0000563302fa6334 in monitor_flush_locked (mon=0x563304dbb800) at /home/elmarco/src/qq/monitor.c:425 #22 0x0000563302fa6520 in monitor_puts (mon=0x563304dbb800, str=0x563305de7e9e "") at /home/elmarco/src/qq/monitor.c:468 #23 0x0000563302fa680c in qmp_send_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:517 #24 0x0000563302fa6905 in qmp_queue_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:538 #25 0x0000563302fa6b5b in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730) at /home/elmarco/src/qq/monitor.c:624 #26 0x0000563302fa6c4b in monitor_qapi_event_queue (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730, errp=0x7ffc6ab5ed00) at /home/elmarco/src/qq/monitor.c:649 #27 0x0000563303548cce in qapi_event_send_shutdown (guest=false, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-run-state.c:58 #28 0x000056330313bcd7 in main_loop_should_exit () at /home/elmarco/src/qq/vl.c:1822 #29 0x000056330313bde3 in main_loop () at /home/elmarco/src/qq/vl.c:1862 #30 0x0000563303143781 in main (argc=3, argv=0x7ffc6ab5f068, envp=0x7ffc6ab5f088) at /home/elmarco/src/qq/vl.c:4644 Note that error report is now moved to the first caller, which may receive an error for a recursed event. This is probably fine (95% of callers use &error_abort, the rest have NULL error and ignore it) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> --- monitor.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 50 insertions(+), 1 deletion(-) diff --git a/monitor.c b/monitor.c index d8d8211ae4..d580c5a79c 100644 --- a/monitor.c +++ b/monitor.c @@ -633,7 +633,7 @@ static void monitor_qapi_event_handler(void *opaque); * applying any rate limiting if required. */ static void -monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp) +monitor_qapi_event_queue_no_recurse(QAPIEvent event, QDict *qdict, Error **errp) { MonitorQAPIEventConf *evconf; MonitorQAPIEventState *evstate; @@ -688,6 +688,55 @@ monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp) qemu_mutex_unlock(&monitor_lock); } +static void +monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp) +{ + Error *local_err = NULL; + /* + * If the function recurse, monitor_lock will dead-lock. + * Instead, queue pending events in TLS. + * TODO: remove this, make it re-enter safe. + */ + static __thread bool recurse; + typedef struct MonitorQapiEvent { + QAPIEvent event; + QDict *qdict; + QSIMPLEQ_ENTRY(MonitorQapiEvent) entry; + } MonitorQapiEvent; + MonitorQapiEvent *ev; + static __thread QSIMPLEQ_HEAD(, MonitorQapiEvent) event_queue; + + if (!recurse) { + QSIMPLEQ_INIT(&event_queue); + } + + ev = g_new(MonitorQapiEvent, 1); + ev->qdict = qobject_ref(qdict); + ev->event = event; + QSIMPLEQ_INSERT_TAIL(&event_queue, ev, entry); + if (recurse) { + return; + } + + recurse = true; + + while ((ev = QSIMPLEQ_FIRST(&event_queue)) != NULL) { + QSIMPLEQ_REMOVE_HEAD(&event_queue, entry); + if (!local_err) { + monitor_qapi_event_queue_no_recurse(ev->event, ev->qdict, + &local_err); + } + qobject_unref(ev->qdict); + g_free(ev); + } + + recurse = false; + + if (local_err) { + error_propagate(errp, local_err); + } +} + /* * This function runs evconf->rate ns after sending a throttled * event. -- 2.18.0.232.gb7bd9486b0 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH RFC] monitor: temporary fix for dead-lock on event recursion 2018-07-30 12:57 [Qemu-devel] [PATCH RFC] monitor: temporary fix for dead-lock on event recursion Marc-André Lureau @ 2018-07-30 13:05 ` Daniel P. Berrangé 2018-07-30 13:09 ` Marc-André Lureau 2018-07-31 7:05 ` Markus Armbruster 1 sibling, 1 reply; 7+ messages in thread From: Daniel P. Berrangé @ 2018-07-30 13:05 UTC (permalink / raw) To: Marc-André Lureau; +Cc: qemu-devel, armbru, Dr. David Alan Gilbert On Mon, Jul 30, 2018 at 02:57:46PM +0200, Marc-André Lureau wrote: > With a Spice port chardev, it is possible to reenter > monitor_qapi_event_queue() (when the client disconnects for > example). This will dead-lock on monitor_lock. > > Instead, use some TLS variables to check for recursion and queue the > events. I wonder if it would be clearer to just change qapi_event_send_spice_disconnected() so that it use g_idle_add() to send the QAPI_EVENT_SPICE_DISCONNECTED event, as I don't see a strong reason to sent that sychronously. This would avoid the recursion problem. > > Fixes: > (gdb) bt > #0 0x00007fa69e7217fd in __lll_lock_wait () at /lib64/libpthread.so.0 > #1 0x00007fa69e71acf4 in pthread_mutex_lock () at /lib64/libpthread.so.0 > #2 0x0000563303567619 in qemu_mutex_lock_impl (mutex=0x563303d3e220 <monitor_lock>, file=0x5633036589a8 "/home/elmarco/src/qq/monitor.c", line=645) at /home/elmarco/src/qq/util/qemu-thread-posix.c:66 > #3 0x0000563302fa6c25 in monitor_qapi_event_queue (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x56330602bde0, errp=0x7ffc6ab5e728) at /home/elmarco/src/qq/monitor.c:645 > #4 0x0000563303549aca in qapi_event_send_spice_disconnected (server=0x563305afd630, client=0x563305745360, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-ui.c:149 > #5 0x00005633033e600f in channel_event (event=3, info=0x5633061b0050) at /home/elmarco/src/qq/ui/spice-core.c:235 > #6 0x00007fa69f6c86bb in reds_handle_channel_event (reds=<optimized out>, event=3, info=0x5633061b0050) at reds.c:316 > #7 0x00007fa69f6b193b in main_dispatcher_self_handle_channel_event (info=0x5633061b0050, event=3, self=0x563304e088c0) at main-dispatcher.c:197 > #8 0x00007fa69f6b193b in main_dispatcher_channel_event (self=0x563304e088c0, event=event@entry=3, info=0x5633061b0050) at main-dispatcher.c:197 > #9 0x00007fa69f6d0833 in red_stream_push_channel_event (s=s@entry=0x563305ad8f50, event=event@entry=3) at red-stream.c:414 > #10 0x00007fa69f6d086b in red_stream_free (s=0x563305ad8f50) at red-stream.c:388 > #11 0x00007fa69f6b7ddc in red_channel_client_finalize (object=0x563304df2360) at red-channel-client.c:347 > #12 0x00007fa6a56b7fb9 in g_object_unref () at /lib64/libgobject-2.0.so.0 > #13 0x00007fa69f6ba212 in red_channel_client_push (rcc=0x563304df2360) at red-channel-client.c:1341 > #14 0x00007fa69f68b259 in red_char_device_send_msg_to_client (client=<optimized out>, msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305 > #15 0x00007fa69f68b259 in red_char_device_send_msg_to_clients (msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305 > #16 0x00007fa69f68b259 in red_char_device_read_from_device (dev=0x563304e08bc0) at char-device.c:353 > #17 0x000056330317d01d in spice_chr_write (chr=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/spice.c:199 > #18 0x00005633034deee7 in qemu_chr_write_buffer (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, offset=0x7ffc6ab5ea70, write_all=false) at /home/elmarco/src/qq/chardev/char.c:112 > #19 0x00005633034df054 in qemu_chr_write (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, write_all=false) at /home/elmarco/src/qq/chardev/char.c:147 > #20 0x00005633034e1e13 in qemu_chr_fe_write (be=0x563304dbb800, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/char-fe.c:42 > #21 0x0000563302fa6334 in monitor_flush_locked (mon=0x563304dbb800) at /home/elmarco/src/qq/monitor.c:425 > #22 0x0000563302fa6520 in monitor_puts (mon=0x563304dbb800, str=0x563305de7e9e "") at /home/elmarco/src/qq/monitor.c:468 > #23 0x0000563302fa680c in qmp_send_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:517 > #24 0x0000563302fa6905 in qmp_queue_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:538 > #25 0x0000563302fa6b5b in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730) at /home/elmarco/src/qq/monitor.c:624 > #26 0x0000563302fa6c4b in monitor_qapi_event_queue (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730, errp=0x7ffc6ab5ed00) at /home/elmarco/src/qq/monitor.c:649 > #27 0x0000563303548cce in qapi_event_send_shutdown (guest=false, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-run-state.c:58 > #28 0x000056330313bcd7 in main_loop_should_exit () at /home/elmarco/src/qq/vl.c:1822 > #29 0x000056330313bde3 in main_loop () at /home/elmarco/src/qq/vl.c:1862 > #30 0x0000563303143781 in main (argc=3, argv=0x7ffc6ab5f068, envp=0x7ffc6ab5f088) at /home/elmarco/src/qq/vl.c:4644 > > Note that error report is now moved to the first caller, which may > receive an error for a recursed event. This is probably fine (95% of > callers use &error_abort, the rest have NULL error and ignore it) > > Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> > --- > monitor.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 50 insertions(+), 1 deletion(-) > > diff --git a/monitor.c b/monitor.c > index d8d8211ae4..d580c5a79c 100644 > --- a/monitor.c > +++ b/monitor.c > @@ -633,7 +633,7 @@ static void monitor_qapi_event_handler(void *opaque); > * applying any rate limiting if required. > */ > static void > -monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp) > +monitor_qapi_event_queue_no_recurse(QAPIEvent event, QDict *qdict, Error **errp) > { > MonitorQAPIEventConf *evconf; > MonitorQAPIEventState *evstate; > @@ -688,6 +688,55 @@ monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp) > qemu_mutex_unlock(&monitor_lock); > } > > +static void > +monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp) > +{ > + Error *local_err = NULL; > + /* > + * If the function recurse, monitor_lock will dead-lock. > + * Instead, queue pending events in TLS. > + * TODO: remove this, make it re-enter safe. > + */ > + static __thread bool recurse; > + typedef struct MonitorQapiEvent { > + QAPIEvent event; > + QDict *qdict; > + QSIMPLEQ_ENTRY(MonitorQapiEvent) entry; > + } MonitorQapiEvent; > + MonitorQapiEvent *ev; > + static __thread QSIMPLEQ_HEAD(, MonitorQapiEvent) event_queue; > + > + if (!recurse) { > + QSIMPLEQ_INIT(&event_queue); > + } > + > + ev = g_new(MonitorQapiEvent, 1); > + ev->qdict = qobject_ref(qdict); > + ev->event = event; > + QSIMPLEQ_INSERT_TAIL(&event_queue, ev, entry); > + if (recurse) { > + return; > + } > + > + recurse = true; > + > + while ((ev = QSIMPLEQ_FIRST(&event_queue)) != NULL) { > + QSIMPLEQ_REMOVE_HEAD(&event_queue, entry); > + if (!local_err) { > + monitor_qapi_event_queue_no_recurse(ev->event, ev->qdict, > + &local_err); > + } > + qobject_unref(ev->qdict); > + g_free(ev); > + } > + > + recurse = false; > + > + if (local_err) { > + error_propagate(errp, local_err); > + } > +} > + > /* > * This function runs evconf->rate ns after sending a throttled > * event. > -- > 2.18.0.232.gb7bd9486b0 > > Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH RFC] monitor: temporary fix for dead-lock on event recursion 2018-07-30 13:05 ` Daniel P. Berrangé @ 2018-07-30 13:09 ` Marc-André Lureau 2018-07-30 13:17 ` Daniel P. Berrangé 0 siblings, 1 reply; 7+ messages in thread From: Marc-André Lureau @ 2018-07-30 13:09 UTC (permalink / raw) To: Daniel P. Berrangé Cc: qemu-devel, Armbruster, Markus, Dr. David Alan Gilbert Hi On Mon, Jul 30, 2018 at 3:05 PM, Daniel P. Berrangé <berrange@redhat.com> wrote: > On Mon, Jul 30, 2018 at 02:57:46PM +0200, Marc-André Lureau wrote: >> With a Spice port chardev, it is possible to reenter >> monitor_qapi_event_queue() (when the client disconnects for >> example). This will dead-lock on monitor_lock. >> >> Instead, use some TLS variables to check for recursion and queue the >> events. > > I wonder if it would be clearer to just change > qapi_event_send_spice_disconnected() so that it use g_idle_add() to > send the QAPI_EVENT_SPICE_DISCONNECTED event, as I don't see a strong > reason to sent that sychronously. This would avoid the recursion problem. Yes, I seriously thought about that solution. But in the end, the bug is in monitor code: it should be fixed there at some point. And it could in theory happen with a different code path than Spice. >> >> Fixes: >> (gdb) bt >> #0 0x00007fa69e7217fd in __lll_lock_wait () at /lib64/libpthread.so.0 >> #1 0x00007fa69e71acf4 in pthread_mutex_lock () at /lib64/libpthread.so.0 >> #2 0x0000563303567619 in qemu_mutex_lock_impl (mutex=0x563303d3e220 <monitor_lock>, file=0x5633036589a8 "/home/elmarco/src/qq/monitor.c", line=645) at /home/elmarco/src/qq/util/qemu-thread-posix.c:66 >> #3 0x0000563302fa6c25 in monitor_qapi_event_queue (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x56330602bde0, errp=0x7ffc6ab5e728) at /home/elmarco/src/qq/monitor.c:645 >> #4 0x0000563303549aca in qapi_event_send_spice_disconnected (server=0x563305afd630, client=0x563305745360, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-ui.c:149 >> #5 0x00005633033e600f in channel_event (event=3, info=0x5633061b0050) at /home/elmarco/src/qq/ui/spice-core.c:235 >> #6 0x00007fa69f6c86bb in reds_handle_channel_event (reds=<optimized out>, event=3, info=0x5633061b0050) at reds.c:316 >> #7 0x00007fa69f6b193b in main_dispatcher_self_handle_channel_event (info=0x5633061b0050, event=3, self=0x563304e088c0) at main-dispatcher.c:197 >> #8 0x00007fa69f6b193b in main_dispatcher_channel_event (self=0x563304e088c0, event=event@entry=3, info=0x5633061b0050) at main-dispatcher.c:197 >> #9 0x00007fa69f6d0833 in red_stream_push_channel_event (s=s@entry=0x563305ad8f50, event=event@entry=3) at red-stream.c:414 >> #10 0x00007fa69f6d086b in red_stream_free (s=0x563305ad8f50) at red-stream.c:388 >> #11 0x00007fa69f6b7ddc in red_channel_client_finalize (object=0x563304df2360) at red-channel-client.c:347 >> #12 0x00007fa6a56b7fb9 in g_object_unref () at /lib64/libgobject-2.0.so.0 >> #13 0x00007fa69f6ba212 in red_channel_client_push (rcc=0x563304df2360) at red-channel-client.c:1341 >> #14 0x00007fa69f68b259 in red_char_device_send_msg_to_client (client=<optimized out>, msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305 >> #15 0x00007fa69f68b259 in red_char_device_send_msg_to_clients (msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305 >> #16 0x00007fa69f68b259 in red_char_device_read_from_device (dev=0x563304e08bc0) at char-device.c:353 >> #17 0x000056330317d01d in spice_chr_write (chr=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/spice.c:199 >> #18 0x00005633034deee7 in qemu_chr_write_buffer (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, offset=0x7ffc6ab5ea70, write_all=false) at /home/elmarco/src/qq/chardev/char.c:112 >> #19 0x00005633034df054 in qemu_chr_write (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, write_all=false) at /home/elmarco/src/qq/chardev/char.c:147 >> #20 0x00005633034e1e13 in qemu_chr_fe_write (be=0x563304dbb800, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/char-fe.c:42 >> #21 0x0000563302fa6334 in monitor_flush_locked (mon=0x563304dbb800) at /home/elmarco/src/qq/monitor.c:425 >> #22 0x0000563302fa6520 in monitor_puts (mon=0x563304dbb800, str=0x563305de7e9e "") at /home/elmarco/src/qq/monitor.c:468 >> #23 0x0000563302fa680c in qmp_send_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:517 >> #24 0x0000563302fa6905 in qmp_queue_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:538 >> #25 0x0000563302fa6b5b in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730) at /home/elmarco/src/qq/monitor.c:624 >> #26 0x0000563302fa6c4b in monitor_qapi_event_queue (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730, errp=0x7ffc6ab5ed00) at /home/elmarco/src/qq/monitor.c:649 >> #27 0x0000563303548cce in qapi_event_send_shutdown (guest=false, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-run-state.c:58 >> #28 0x000056330313bcd7 in main_loop_should_exit () at /home/elmarco/src/qq/vl.c:1822 >> #29 0x000056330313bde3 in main_loop () at /home/elmarco/src/qq/vl.c:1862 >> #30 0x0000563303143781 in main (argc=3, argv=0x7ffc6ab5f068, envp=0x7ffc6ab5f088) at /home/elmarco/src/qq/vl.c:4644 >> >> Note that error report is now moved to the first caller, which may >> receive an error for a recursed event. This is probably fine (95% of >> callers use &error_abort, the rest have NULL error and ignore it) >> >> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> >> --- >> monitor.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++- >> 1 file changed, 50 insertions(+), 1 deletion(-) >> >> diff --git a/monitor.c b/monitor.c >> index d8d8211ae4..d580c5a79c 100644 >> --- a/monitor.c >> +++ b/monitor.c >> @@ -633,7 +633,7 @@ static void monitor_qapi_event_handler(void *opaque); >> * applying any rate limiting if required. >> */ >> static void >> -monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp) >> +monitor_qapi_event_queue_no_recurse(QAPIEvent event, QDict *qdict, Error **errp) >> { >> MonitorQAPIEventConf *evconf; >> MonitorQAPIEventState *evstate; >> @@ -688,6 +688,55 @@ monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp) >> qemu_mutex_unlock(&monitor_lock); >> } >> >> +static void >> +monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp) >> +{ >> + Error *local_err = NULL; >> + /* >> + * If the function recurse, monitor_lock will dead-lock. >> + * Instead, queue pending events in TLS. >> + * TODO: remove this, make it re-enter safe. >> + */ >> + static __thread bool recurse; >> + typedef struct MonitorQapiEvent { >> + QAPIEvent event; >> + QDict *qdict; >> + QSIMPLEQ_ENTRY(MonitorQapiEvent) entry; >> + } MonitorQapiEvent; >> + MonitorQapiEvent *ev; >> + static __thread QSIMPLEQ_HEAD(, MonitorQapiEvent) event_queue; >> + >> + if (!recurse) { >> + QSIMPLEQ_INIT(&event_queue); >> + } >> + >> + ev = g_new(MonitorQapiEvent, 1); >> + ev->qdict = qobject_ref(qdict); >> + ev->event = event; >> + QSIMPLEQ_INSERT_TAIL(&event_queue, ev, entry); >> + if (recurse) { >> + return; >> + } >> + >> + recurse = true; >> + >> + while ((ev = QSIMPLEQ_FIRST(&event_queue)) != NULL) { >> + QSIMPLEQ_REMOVE_HEAD(&event_queue, entry); >> + if (!local_err) { >> + monitor_qapi_event_queue_no_recurse(ev->event, ev->qdict, >> + &local_err); >> + } >> + qobject_unref(ev->qdict); >> + g_free(ev); >> + } >> + >> + recurse = false; >> + >> + if (local_err) { >> + error_propagate(errp, local_err); >> + } >> +} >> + >> /* >> * This function runs evconf->rate ns after sending a throttled >> * event. >> -- >> 2.18.0.232.gb7bd9486b0 >> >> > > Regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH RFC] monitor: temporary fix for dead-lock on event recursion 2018-07-30 13:09 ` Marc-André Lureau @ 2018-07-30 13:17 ` Daniel P. Berrangé 0 siblings, 0 replies; 7+ messages in thread From: Daniel P. Berrangé @ 2018-07-30 13:17 UTC (permalink / raw) To: Marc-André Lureau Cc: qemu-devel, Armbruster, Markus, Dr. David Alan Gilbert On Mon, Jul 30, 2018 at 03:09:08PM +0200, Marc-André Lureau wrote: > Hi > > On Mon, Jul 30, 2018 at 3:05 PM, Daniel P. Berrangé <berrange@redhat.com> wrote: > > On Mon, Jul 30, 2018 at 02:57:46PM +0200, Marc-André Lureau wrote: > >> With a Spice port chardev, it is possible to reenter > >> monitor_qapi_event_queue() (when the client disconnects for > >> example). This will dead-lock on monitor_lock. > >> > >> Instead, use some TLS variables to check for recursion and queue the > >> events. > > > > I wonder if it would be clearer to just change > > qapi_event_send_spice_disconnected() so that it use g_idle_add() to > > send the QAPI_EVENT_SPICE_DISCONNECTED event, as I don't see a strong > > reason to sent that sychronously. This would avoid the recursion problem. > > Yes, I seriously thought about that solution. But in the end, the bug > is in monitor code: it should be fixed there at some point. And it > could in theory happen with a different code path than Spice. The question is what we consider the API contract to be for the chardevs that are used by the monitor. eg is the chardev I/O function permitted to emit monitor events. Given that chardevs are widely used for lots of backends, I guess I would probably say chardevs should be allowed to emit monitor events, which does put the burden on the monitor code to handle recursion. I wonder if we could still use g_idle_add() though but in a different place. The monitor_qapi_event_queue() method already emits events asynchronously if they are tagged as needing rate limiting. So instead of directly calling monitor_qapi_event_emit() in the non-rate limited branch, we could send that call via g_idle_add(), so that all events are emitted asynchronously. This would avoid monitor_qapi_event_queue() becoming a re-entrancy problem. I think g_idle_add() calls are processed in the order in which they are registered, so event orderin should still be preserved. > >> Fixes: > >> (gdb) bt > >> #0 0x00007fa69e7217fd in __lll_lock_wait () at /lib64/libpthread.so.0 > >> #1 0x00007fa69e71acf4 in pthread_mutex_lock () at /lib64/libpthread.so.0 > >> #2 0x0000563303567619 in qemu_mutex_lock_impl (mutex=0x563303d3e220 <monitor_lock>, file=0x5633036589a8 "/home/elmarco/src/qq/monitor.c", line=645) at /home/elmarco/src/qq/util/qemu-thread-posix.c:66 > >> #3 0x0000563302fa6c25 in monitor_qapi_event_queue (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x56330602bde0, errp=0x7ffc6ab5e728) at /home/elmarco/src/qq/monitor.c:645 > >> #4 0x0000563303549aca in qapi_event_send_spice_disconnected (server=0x563305afd630, client=0x563305745360, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-ui.c:149 > >> #5 0x00005633033e600f in channel_event (event=3, info=0x5633061b0050) at /home/elmarco/src/qq/ui/spice-core.c:235 > >> #6 0x00007fa69f6c86bb in reds_handle_channel_event (reds=<optimized out>, event=3, info=0x5633061b0050) at reds.c:316 > >> #7 0x00007fa69f6b193b in main_dispatcher_self_handle_channel_event (info=0x5633061b0050, event=3, self=0x563304e088c0) at main-dispatcher.c:197 > >> #8 0x00007fa69f6b193b in main_dispatcher_channel_event (self=0x563304e088c0, event=event@entry=3, info=0x5633061b0050) at main-dispatcher.c:197 > >> #9 0x00007fa69f6d0833 in red_stream_push_channel_event (s=s@entry=0x563305ad8f50, event=event@entry=3) at red-stream.c:414 > >> #10 0x00007fa69f6d086b in red_stream_free (s=0x563305ad8f50) at red-stream.c:388 > >> #11 0x00007fa69f6b7ddc in red_channel_client_finalize (object=0x563304df2360) at red-channel-client.c:347 > >> #12 0x00007fa6a56b7fb9 in g_object_unref () at /lib64/libgobject-2.0.so.0 > >> #13 0x00007fa69f6ba212 in red_channel_client_push (rcc=0x563304df2360) at red-channel-client.c:1341 > >> #14 0x00007fa69f68b259 in red_char_device_send_msg_to_client (client=<optimized out>, msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305 > >> #15 0x00007fa69f68b259 in red_char_device_send_msg_to_clients (msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305 > >> #16 0x00007fa69f68b259 in red_char_device_read_from_device (dev=0x563304e08bc0) at char-device.c:353 > >> #17 0x000056330317d01d in spice_chr_write (chr=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/spice.c:199 > >> #18 0x00005633034deee7 in qemu_chr_write_buffer (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, offset=0x7ffc6ab5ea70, write_all=false) at /home/elmarco/src/qq/chardev/char.c:112 > >> #19 0x00005633034df054 in qemu_chr_write (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, write_all=false) at /home/elmarco/src/qq/chardev/char.c:147 > >> #20 0x00005633034e1e13 in qemu_chr_fe_write (be=0x563304dbb800, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/char-fe.c:42 > >> #21 0x0000563302fa6334 in monitor_flush_locked (mon=0x563304dbb800) at /home/elmarco/src/qq/monitor.c:425 > >> #22 0x0000563302fa6520 in monitor_puts (mon=0x563304dbb800, str=0x563305de7e9e "") at /home/elmarco/src/qq/monitor.c:468 > >> #23 0x0000563302fa680c in qmp_send_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:517 > >> #24 0x0000563302fa6905 in qmp_queue_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:538 > >> #25 0x0000563302fa6b5b in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730) at /home/elmarco/src/qq/monitor.c:624 > >> #26 0x0000563302fa6c4b in monitor_qapi_event_queue (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730, errp=0x7ffc6ab5ed00) at /home/elmarco/src/qq/monitor.c:649 > >> #27 0x0000563303548cce in qapi_event_send_shutdown (guest=false, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-run-state.c:58 > >> #28 0x000056330313bcd7 in main_loop_should_exit () at /home/elmarco/src/qq/vl.c:1822 > >> #29 0x000056330313bde3 in main_loop () at /home/elmarco/src/qq/vl.c:1862 > >> #30 0x0000563303143781 in main (argc=3, argv=0x7ffc6ab5f068, envp=0x7ffc6ab5f088) at /home/elmarco/src/qq/vl.c:4644 > >> > >> Note that error report is now moved to the first caller, which may > >> receive an error for a recursed event. This is probably fine (95% of > >> callers use &error_abort, the rest have NULL error and ignore it) > >> > >> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> > >> --- > >> monitor.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++- > >> 1 file changed, 50 insertions(+), 1 deletion(-) > >> > >> diff --git a/monitor.c b/monitor.c > >> index d8d8211ae4..d580c5a79c 100644 > >> --- a/monitor.c > >> +++ b/monitor.c > >> @@ -633,7 +633,7 @@ static void monitor_qapi_event_handler(void *opaque); > >> * applying any rate limiting if required. > >> */ > >> static void > >> -monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp) > >> +monitor_qapi_event_queue_no_recurse(QAPIEvent event, QDict *qdict, Error **errp) > >> { > >> MonitorQAPIEventConf *evconf; > >> MonitorQAPIEventState *evstate; > >> @@ -688,6 +688,55 @@ monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp) > >> qemu_mutex_unlock(&monitor_lock); > >> } > >> > >> +static void > >> +monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp) > >> +{ > >> + Error *local_err = NULL; > >> + /* > >> + * If the function recurse, monitor_lock will dead-lock. > >> + * Instead, queue pending events in TLS. > >> + * TODO: remove this, make it re-enter safe. > >> + */ > >> + static __thread bool recurse; > >> + typedef struct MonitorQapiEvent { > >> + QAPIEvent event; > >> + QDict *qdict; > >> + QSIMPLEQ_ENTRY(MonitorQapiEvent) entry; > >> + } MonitorQapiEvent; > >> + MonitorQapiEvent *ev; > >> + static __thread QSIMPLEQ_HEAD(, MonitorQapiEvent) event_queue; > >> + > >> + if (!recurse) { > >> + QSIMPLEQ_INIT(&event_queue); > >> + } > >> + > >> + ev = g_new(MonitorQapiEvent, 1); > >> + ev->qdict = qobject_ref(qdict); > >> + ev->event = event; > >> + QSIMPLEQ_INSERT_TAIL(&event_queue, ev, entry); > >> + if (recurse) { > >> + return; > >> + } > >> + > >> + recurse = true; > >> + > >> + while ((ev = QSIMPLEQ_FIRST(&event_queue)) != NULL) { > >> + QSIMPLEQ_REMOVE_HEAD(&event_queue, entry); > >> + if (!local_err) { > >> + monitor_qapi_event_queue_no_recurse(ev->event, ev->qdict, > >> + &local_err); > >> + } > >> + qobject_unref(ev->qdict); > >> + g_free(ev); > >> + } > >> + > >> + recurse = false; > >> + > >> + if (local_err) { > >> + error_propagate(errp, local_err); > >> + } > >> +} > >> + > >> /* > >> * This function runs evconf->rate ns after sending a throttled > >> * event. > >> -- > >> 2.18.0.232.gb7bd9486b0 > >> > >> > > > > Regards, > > Daniel > > -- > > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > > |: https://libvirt.org -o- https://fstop138.berrange.com :| > > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH RFC] monitor: temporary fix for dead-lock on event recursion 2018-07-30 12:57 [Qemu-devel] [PATCH RFC] monitor: temporary fix for dead-lock on event recursion Marc-André Lureau 2018-07-30 13:05 ` Daniel P. Berrangé @ 2018-07-31 7:05 ` Markus Armbruster 2018-07-31 14:45 ` Marc-André Lureau 1 sibling, 1 reply; 7+ messages in thread From: Markus Armbruster @ 2018-07-31 7:05 UTC (permalink / raw) To: Marc-André Lureau; +Cc: qemu-devel, Dr. David Alan Gilbert Marc-André Lureau <marcandre.lureau@redhat.com> writes: > With a Spice port chardev, it is possible to reenter > monitor_qapi_event_queue() (when the client disconnects for > example). This will dead-lock on monitor_lock. > > Instead, use some TLS variables to check for recursion and queue the > events. [...] > > Note that error report is now moved to the first caller, which may > receive an error for a recursed event. This is probably fine (95% of > callers use &error_abort, the rest have NULL error and ignore it) I'm not 100% sure I understand this paragraph, but it might be moot; see [*] below. > > Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> > --- > monitor.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 50 insertions(+), 1 deletion(-) > > diff --git a/monitor.c b/monitor.c > index d8d8211ae4..d580c5a79c 100644 > --- a/monitor.c > +++ b/monitor.c > @@ -633,7 +633,7 @@ static void monitor_qapi_event_handler(void *opaque); > * applying any rate limiting if required. > */ > static void > -monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp) > +monitor_qapi_event_queue_no_recurse(QAPIEvent event, QDict *qdict, Error **errp) > { > MonitorQAPIEventConf *evconf; > MonitorQAPIEventState *evstate; > @@ -688,6 +688,55 @@ monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp) > qemu_mutex_unlock(&monitor_lock); > } > > +static void > +monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp) > +{ > + Error *local_err = NULL; > + /* > + * If the function recurse, monitor_lock will dead-lock. > + * Instead, queue pending events in TLS. recurses The claim is correct before the patch. But the comment needs to explain current code. Perhaps: * monitor_qapi_event_queue_no_recurse() is not reentrant: it * would deadlock on monitor_lock. Work around by queueing * events in thread-local storage. > + * TODO: remove this, make it re-enter safe. > + */ > + static __thread bool recurse; > + typedef struct MonitorQapiEvent { > + QAPIEvent event; > + QDict *qdict; > + QSIMPLEQ_ENTRY(MonitorQapiEvent) entry; > + } MonitorQapiEvent; > + MonitorQapiEvent *ev; > + static __thread QSIMPLEQ_HEAD(, MonitorQapiEvent) event_queue; Let's put the static variables first. > + > + if (!recurse) { > + QSIMPLEQ_INIT(&event_queue); > + } > + > + ev = g_new(MonitorQapiEvent, 1); > + ev->qdict = qobject_ref(qdict); > + ev->event = event; > + QSIMPLEQ_INSERT_TAIL(&event_queue, ev, entry); > + if (recurse) { > + return; > + } > + > + recurse = true; > + > + while ((ev = QSIMPLEQ_FIRST(&event_queue)) != NULL) { > + QSIMPLEQ_REMOVE_HEAD(&event_queue, entry); Could we use QSIMPLEQ_FOREACH_SAFE()? > + if (!local_err) { > + monitor_qapi_event_queue_no_recurse(ev->event, ev->qdict, > + &local_err); > + } [*] This looks scary: we silently throw away events after event queuing fails. Fortunately, monitor_qapi_event_queue_no_recurse() cannot fail. It takes an Error ** parameter only so you can put it into @qmp_emit. Aside: I wish we'd get rid of the indirection through @qmp_emit. Let's pass &error_abort and drop @local_err. > + qobject_unref(ev->qdict); > + g_free(ev); > + } > + > + recurse = false; Aha: @recurse means we've reentered the function. "Recurse" is imperative mood. Misleading, as it's not an order to recurse. Rename to @recursed? @reentered? > + > + if (local_err) { > + error_propagate(errp, local_err); > + } > +} > + > /* > * This function runs evconf->rate ns after sending a throttled > * event. monitor_lock clearly needs a rethink. Your TODO comment suggests you agree. But that's something for 3.1. I hate messing with locks at -rc3, but I also hate shipping known deadlocks. Your patch isn't pretty, but probably as simple as we can make it for 3.0. A few more review eyeballs would be nice. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH RFC] monitor: temporary fix for dead-lock on event recursion 2018-07-31 7:05 ` Markus Armbruster @ 2018-07-31 14:45 ` Marc-André Lureau 2018-07-31 15:15 ` Markus Armbruster 0 siblings, 1 reply; 7+ messages in thread From: Marc-André Lureau @ 2018-07-31 14:45 UTC (permalink / raw) To: Markus Armbruster; +Cc: QEMU, Dr. David Alan Gilbert Hi On Tue, Jul 31, 2018 at 9:05 AM, Markus Armbruster <armbru@redhat.com> wrote: > Marc-André Lureau <marcandre.lureau@redhat.com> writes: > >> With a Spice port chardev, it is possible to reenter >> monitor_qapi_event_queue() (when the client disconnects for >> example). This will dead-lock on monitor_lock. >> >> Instead, use some TLS variables to check for recursion and queue the >> events. > [...] >> >> Note that error report is now moved to the first caller, which may >> receive an error for a recursed event. This is probably fine (95% of >> callers use &error_abort, the rest have NULL error and ignore it) > > I'm not 100% sure I understand this paragraph, but it might be moot; see > [*] below. > >> >> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> >> --- >> monitor.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++- >> 1 file changed, 50 insertions(+), 1 deletion(-) >> >> diff --git a/monitor.c b/monitor.c >> index d8d8211ae4..d580c5a79c 100644 >> --- a/monitor.c >> +++ b/monitor.c >> @@ -633,7 +633,7 @@ static void monitor_qapi_event_handler(void *opaque); >> * applying any rate limiting if required. >> */ >> static void >> -monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp) >> +monitor_qapi_event_queue_no_recurse(QAPIEvent event, QDict *qdict, Error **errp) >> { >> MonitorQAPIEventConf *evconf; >> MonitorQAPIEventState *evstate; >> @@ -688,6 +688,55 @@ monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp) >> qemu_mutex_unlock(&monitor_lock); >> } >> >> +static void >> +monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp) >> +{ >> + Error *local_err = NULL; >> + /* >> + * If the function recurse, monitor_lock will dead-lock. >> + * Instead, queue pending events in TLS. > > recurses > > The claim is correct before the patch. But the comment needs to explain > current code. Perhaps: > > * monitor_qapi_event_queue_no_recurse() is not reentrant: it > * would deadlock on monitor_lock. Work around by queueing > * events in thread-local storage. ok > >> + * TODO: remove this, make it re-enter safe. >> + */ >> + static __thread bool recurse; >> + typedef struct MonitorQapiEvent { >> + QAPIEvent event; >> + QDict *qdict; >> + QSIMPLEQ_ENTRY(MonitorQapiEvent) entry; >> + } MonitorQapiEvent; >> + MonitorQapiEvent *ev; >> + static __thread QSIMPLEQ_HEAD(, MonitorQapiEvent) event_queue; > > Let's put the static variables first. ok >> + >> + if (!recurse) { >> + QSIMPLEQ_INIT(&event_queue); >> + } >> + >> + ev = g_new(MonitorQapiEvent, 1); >> + ev->qdict = qobject_ref(qdict); >> + ev->event = event; >> + QSIMPLEQ_INSERT_TAIL(&event_queue, ev, entry); >> + if (recurse) { >> + return; >> + } >> + >> + recurse = true; >> + >> + while ((ev = QSIMPLEQ_FIRST(&event_queue)) != NULL) { >> + QSIMPLEQ_REMOVE_HEAD(&event_queue, entry); > > Could we use QSIMPLEQ_FOREACH_SAFE()? I don't think so, the next variable could be NULL, while a recursive call could be adding an element. > >> + if (!local_err) { >> + monitor_qapi_event_queue_no_recurse(ev->event, ev->qdict, >> + &local_err); >> + } > > [*] This looks scary: we silently throw away events after event queuing > fails. > > Fortunately, monitor_qapi_event_queue_no_recurse() cannot fail. It > takes an Error ** parameter only so you can put it into @qmp_emit. > right > Aside: I wish we'd get rid of the indirection through @qmp_emit. > for later: You would hard code a function name instead? > Let's pass &error_abort and drop @local_err. actually, we can just ignore errp, right? for later: What about dropping error from the emit function? > >> + qobject_unref(ev->qdict); >> + g_free(ev); >> + } >> + >> + recurse = false; > > Aha: @recurse means we've reentered the function. > > "Recurse" is imperative mood. Misleading, as it's not an order to > recurse. > > Rename to @recursed? @reentered? > ok, let's rename it "reentered" >> + >> + if (local_err) { >> + error_propagate(errp, local_err); >> + } >> +} >> + >> /* >> * This function runs evconf->rate ns after sending a throttled >> * event. > > monitor_lock clearly needs a rethink. Your TODO comment suggests you > agree. But that's something for 3.1. > > I hate messing with locks at -rc3, but I also hate shipping known > deadlocks. Your patch isn't pretty, but probably as simple as we can > make it for 3.0. A few more review eyeballs would be nice. > thanks, I'll send v2 -- Marc-André Lureau ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH RFC] monitor: temporary fix for dead-lock on event recursion 2018-07-31 14:45 ` Marc-André Lureau @ 2018-07-31 15:15 ` Markus Armbruster 0 siblings, 0 replies; 7+ messages in thread From: Markus Armbruster @ 2018-07-31 15:15 UTC (permalink / raw) To: Marc-André Lureau; +Cc: Markus Armbruster, QEMU, Dr. David Alan Gilbert Marc-André Lureau <marcandre.lureau@gmail.com> writes: > Hi > > On Tue, Jul 31, 2018 at 9:05 AM, Markus Armbruster <armbru@redhat.com> wrote: >> Marc-André Lureau <marcandre.lureau@redhat.com> writes: >> >>> With a Spice port chardev, it is possible to reenter >>> monitor_qapi_event_queue() (when the client disconnects for >>> example). This will dead-lock on monitor_lock. >>> >>> Instead, use some TLS variables to check for recursion and queue the >>> events. >> [...] >>> >>> Note that error report is now moved to the first caller, which may >>> receive an error for a recursed event. This is probably fine (95% of >>> callers use &error_abort, the rest have NULL error and ignore it) >> >> I'm not 100% sure I understand this paragraph, but it might be moot; see >> [*] below. >> >>> >>> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> >>> --- >>> monitor.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++- >>> 1 file changed, 50 insertions(+), 1 deletion(-) >>> >>> diff --git a/monitor.c b/monitor.c >>> index d8d8211ae4..d580c5a79c 100644 >>> --- a/monitor.c >>> +++ b/monitor.c >>> @@ -633,7 +633,7 @@ static void monitor_qapi_event_handler(void *opaque); >>> * applying any rate limiting if required. >>> */ >>> static void >>> -monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp) >>> +monitor_qapi_event_queue_no_recurse(QAPIEvent event, QDict *qdict, Error **errp) >>> { >>> MonitorQAPIEventConf *evconf; >>> MonitorQAPIEventState *evstate; >>> @@ -688,6 +688,55 @@ monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp) >>> qemu_mutex_unlock(&monitor_lock); >>> } >>> >>> +static void >>> +monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp) >>> +{ >>> + Error *local_err = NULL; >>> + /* >>> + * If the function recurse, monitor_lock will dead-lock. >>> + * Instead, queue pending events in TLS. >> >> recurses >> >> The claim is correct before the patch. But the comment needs to explain >> current code. Perhaps: >> >> * monitor_qapi_event_queue_no_recurse() is not reentrant: it >> * would deadlock on monitor_lock. Work around by queueing >> * events in thread-local storage. > > ok > >> >>> + * TODO: remove this, make it re-enter safe. >>> + */ >>> + static __thread bool recurse; >>> + typedef struct MonitorQapiEvent { >>> + QAPIEvent event; >>> + QDict *qdict; >>> + QSIMPLEQ_ENTRY(MonitorQapiEvent) entry; >>> + } MonitorQapiEvent; >>> + MonitorQapiEvent *ev; >>> + static __thread QSIMPLEQ_HEAD(, MonitorQapiEvent) event_queue; >> >> Let's put the static variables first. > > ok > >>> + >>> + if (!recurse) { >>> + QSIMPLEQ_INIT(&event_queue); >>> + } >>> + >>> + ev = g_new(MonitorQapiEvent, 1); >>> + ev->qdict = qobject_ref(qdict); >>> + ev->event = event; >>> + QSIMPLEQ_INSERT_TAIL(&event_queue, ev, entry); >>> + if (recurse) { >>> + return; >>> + } >>> + >>> + recurse = true; >>> + >>> + while ((ev = QSIMPLEQ_FIRST(&event_queue)) != NULL) { >>> + QSIMPLEQ_REMOVE_HEAD(&event_queue, entry); >> >> Could we use QSIMPLEQ_FOREACH_SAFE()? > > I don't think so, the next variable could be NULL, while a recursive > call could be adding an element. > >> >>> + if (!local_err) { >>> + monitor_qapi_event_queue_no_recurse(ev->event, ev->qdict, >>> + &local_err); >>> + } >> >> [*] This looks scary: we silently throw away events after event queuing >> fails. >> >> Fortunately, monitor_qapi_event_queue_no_recurse() cannot fail. It >> takes an Error ** parameter only so you can put it into @qmp_emit. >> > > right > >> Aside: I wish we'd get rid of the indirection through @qmp_emit. >> > > for later: You would hard code a function name instead? Yes. @qmp_emit is always monitor_qapi_event_queue in qemu-system-FOO, and event_test_emit in test-qmp-event. That's a linker job, not an indirect call job. >> Let's pass &error_abort and drop @local_err. > > actually, we can just ignore errp, right? Not sure what you mean, but you've since sent v2, so let me figure it out there :) > for later: What about dropping error from the emit function? Fine with me. Easy enough to bring back if we find a need. > >> >>> + qobject_unref(ev->qdict); >>> + g_free(ev); >>> + } >>> + >>> + recurse = false; >> >> Aha: @recurse means we've reentered the function. >> >> "Recurse" is imperative mood. Misleading, as it's not an order to >> recurse. >> >> Rename to @recursed? @reentered? >> > > ok, let's rename it "reentered" > >>> + >>> + if (local_err) { >>> + error_propagate(errp, local_err); >>> + } >>> +} >>> + >>> /* >>> * This function runs evconf->rate ns after sending a throttled >>> * event. >> >> monitor_lock clearly needs a rethink. Your TODO comment suggests you >> agree. But that's something for 3.1. >> >> I hate messing with locks at -rc3, but I also hate shipping known >> deadlocks. Your patch isn't pretty, but probably as simple as we can >> make it for 3.0. A few more review eyeballs would be nice. >> > > thanks, I'll send v2 ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2018-07-31 15:16 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-07-30 12:57 [Qemu-devel] [PATCH RFC] monitor: temporary fix for dead-lock on event recursion Marc-André Lureau 2018-07-30 13:05 ` Daniel P. Berrangé 2018-07-30 13:09 ` Marc-André Lureau 2018-07-30 13:17 ` Daniel P. Berrangé 2018-07-31 7:05 ` Markus Armbruster 2018-07-31 14:45 ` Marc-André Lureau 2018-07-31 15:15 ` Markus Armbruster
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).