* [Qemu-devel] [PATCH RFC] monitor: temporary fix for dead-lock on event recursion
@ 2018-07-30 12:57 Marc-André Lureau
2018-07-30 13:05 ` Daniel P. Berrangé
2018-07-31 7:05 ` Markus Armbruster
0 siblings, 2 replies; 7+ messages in thread
From: Marc-André Lureau @ 2018-07-30 12:57 UTC (permalink / raw)
To: qemu-devel; +Cc: armbru, Marc-André Lureau, Dr. David Alan Gilbert
With a Spice port chardev, it is possible to reenter
monitor_qapi_event_queue() (when the client disconnects for
example). This will dead-lock on monitor_lock.
Instead, use some TLS variables to check for recursion and queue the
events.
Fixes:
(gdb) bt
#0 0x00007fa69e7217fd in __lll_lock_wait () at /lib64/libpthread.so.0
#1 0x00007fa69e71acf4 in pthread_mutex_lock () at /lib64/libpthread.so.0
#2 0x0000563303567619 in qemu_mutex_lock_impl (mutex=0x563303d3e220 <monitor_lock>, file=0x5633036589a8 "/home/elmarco/src/qq/monitor.c", line=645) at /home/elmarco/src/qq/util/qemu-thread-posix.c:66
#3 0x0000563302fa6c25 in monitor_qapi_event_queue (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x56330602bde0, errp=0x7ffc6ab5e728) at /home/elmarco/src/qq/monitor.c:645
#4 0x0000563303549aca in qapi_event_send_spice_disconnected (server=0x563305afd630, client=0x563305745360, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-ui.c:149
#5 0x00005633033e600f in channel_event (event=3, info=0x5633061b0050) at /home/elmarco/src/qq/ui/spice-core.c:235
#6 0x00007fa69f6c86bb in reds_handle_channel_event (reds=<optimized out>, event=3, info=0x5633061b0050) at reds.c:316
#7 0x00007fa69f6b193b in main_dispatcher_self_handle_channel_event (info=0x5633061b0050, event=3, self=0x563304e088c0) at main-dispatcher.c:197
#8 0x00007fa69f6b193b in main_dispatcher_channel_event (self=0x563304e088c0, event=event@entry=3, info=0x5633061b0050) at main-dispatcher.c:197
#9 0x00007fa69f6d0833 in red_stream_push_channel_event (s=s@entry=0x563305ad8f50, event=event@entry=3) at red-stream.c:414
#10 0x00007fa69f6d086b in red_stream_free (s=0x563305ad8f50) at red-stream.c:388
#11 0x00007fa69f6b7ddc in red_channel_client_finalize (object=0x563304df2360) at red-channel-client.c:347
#12 0x00007fa6a56b7fb9 in g_object_unref () at /lib64/libgobject-2.0.so.0
#13 0x00007fa69f6ba212 in red_channel_client_push (rcc=0x563304df2360) at red-channel-client.c:1341
#14 0x00007fa69f68b259 in red_char_device_send_msg_to_client (client=<optimized out>, msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305
#15 0x00007fa69f68b259 in red_char_device_send_msg_to_clients (msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305
#16 0x00007fa69f68b259 in red_char_device_read_from_device (dev=0x563304e08bc0) at char-device.c:353
#17 0x000056330317d01d in spice_chr_write (chr=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/spice.c:199
#18 0x00005633034deee7 in qemu_chr_write_buffer (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, offset=0x7ffc6ab5ea70, write_all=false) at /home/elmarco/src/qq/chardev/char.c:112
#19 0x00005633034df054 in qemu_chr_write (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, write_all=false) at /home/elmarco/src/qq/chardev/char.c:147
#20 0x00005633034e1e13 in qemu_chr_fe_write (be=0x563304dbb800, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/char-fe.c:42
#21 0x0000563302fa6334 in monitor_flush_locked (mon=0x563304dbb800) at /home/elmarco/src/qq/monitor.c:425
#22 0x0000563302fa6520 in monitor_puts (mon=0x563304dbb800, str=0x563305de7e9e "") at /home/elmarco/src/qq/monitor.c:468
#23 0x0000563302fa680c in qmp_send_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:517
#24 0x0000563302fa6905 in qmp_queue_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:538
#25 0x0000563302fa6b5b in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730) at /home/elmarco/src/qq/monitor.c:624
#26 0x0000563302fa6c4b in monitor_qapi_event_queue (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730, errp=0x7ffc6ab5ed00) at /home/elmarco/src/qq/monitor.c:649
#27 0x0000563303548cce in qapi_event_send_shutdown (guest=false, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-run-state.c:58
#28 0x000056330313bcd7 in main_loop_should_exit () at /home/elmarco/src/qq/vl.c:1822
#29 0x000056330313bde3 in main_loop () at /home/elmarco/src/qq/vl.c:1862
#30 0x0000563303143781 in main (argc=3, argv=0x7ffc6ab5f068, envp=0x7ffc6ab5f088) at /home/elmarco/src/qq/vl.c:4644
Note that error report is now moved to the first caller, which may
receive an error for a recursed event. This is probably fine (95% of
callers use &error_abort, the rest have NULL error and ignore it)
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
monitor.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 50 insertions(+), 1 deletion(-)
diff --git a/monitor.c b/monitor.c
index d8d8211ae4..d580c5a79c 100644
--- a/monitor.c
+++ b/monitor.c
@@ -633,7 +633,7 @@ static void monitor_qapi_event_handler(void *opaque);
* applying any rate limiting if required.
*/
static void
-monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
+monitor_qapi_event_queue_no_recurse(QAPIEvent event, QDict *qdict, Error **errp)
{
MonitorQAPIEventConf *evconf;
MonitorQAPIEventState *evstate;
@@ -688,6 +688,55 @@ monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
qemu_mutex_unlock(&monitor_lock);
}
+static void
+monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
+{
+ Error *local_err = NULL;
+ /*
+ * If the function recurse, monitor_lock will dead-lock.
+ * Instead, queue pending events in TLS.
+ * TODO: remove this, make it re-enter safe.
+ */
+ static __thread bool recurse;
+ typedef struct MonitorQapiEvent {
+ QAPIEvent event;
+ QDict *qdict;
+ QSIMPLEQ_ENTRY(MonitorQapiEvent) entry;
+ } MonitorQapiEvent;
+ MonitorQapiEvent *ev;
+ static __thread QSIMPLEQ_HEAD(, MonitorQapiEvent) event_queue;
+
+ if (!recurse) {
+ QSIMPLEQ_INIT(&event_queue);
+ }
+
+ ev = g_new(MonitorQapiEvent, 1);
+ ev->qdict = qobject_ref(qdict);
+ ev->event = event;
+ QSIMPLEQ_INSERT_TAIL(&event_queue, ev, entry);
+ if (recurse) {
+ return;
+ }
+
+ recurse = true;
+
+ while ((ev = QSIMPLEQ_FIRST(&event_queue)) != NULL) {
+ QSIMPLEQ_REMOVE_HEAD(&event_queue, entry);
+ if (!local_err) {
+ monitor_qapi_event_queue_no_recurse(ev->event, ev->qdict,
+ &local_err);
+ }
+ qobject_unref(ev->qdict);
+ g_free(ev);
+ }
+
+ recurse = false;
+
+ if (local_err) {
+ error_propagate(errp, local_err);
+ }
+}
+
/*
* This function runs evconf->rate ns after sending a throttled
* event.
--
2.18.0.232.gb7bd9486b0
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH RFC] monitor: temporary fix for dead-lock on event recursion
2018-07-30 12:57 [Qemu-devel] [PATCH RFC] monitor: temporary fix for dead-lock on event recursion Marc-André Lureau
@ 2018-07-30 13:05 ` Daniel P. Berrangé
2018-07-30 13:09 ` Marc-André Lureau
2018-07-31 7:05 ` Markus Armbruster
1 sibling, 1 reply; 7+ messages in thread
From: Daniel P. Berrangé @ 2018-07-30 13:05 UTC (permalink / raw)
To: Marc-André Lureau; +Cc: qemu-devel, armbru, Dr. David Alan Gilbert
On Mon, Jul 30, 2018 at 02:57:46PM +0200, Marc-André Lureau wrote:
> With a Spice port chardev, it is possible to reenter
> monitor_qapi_event_queue() (when the client disconnects for
> example). This will dead-lock on monitor_lock.
>
> Instead, use some TLS variables to check for recursion and queue the
> events.
I wonder if it would be clearer to just change
qapi_event_send_spice_disconnected() so that it use g_idle_add() to
send the QAPI_EVENT_SPICE_DISCONNECTED event, as I don't see a strong
reason to sent that sychronously. This would avoid the recursion problem.
>
> Fixes:
> (gdb) bt
> #0 0x00007fa69e7217fd in __lll_lock_wait () at /lib64/libpthread.so.0
> #1 0x00007fa69e71acf4 in pthread_mutex_lock () at /lib64/libpthread.so.0
> #2 0x0000563303567619 in qemu_mutex_lock_impl (mutex=0x563303d3e220 <monitor_lock>, file=0x5633036589a8 "/home/elmarco/src/qq/monitor.c", line=645) at /home/elmarco/src/qq/util/qemu-thread-posix.c:66
> #3 0x0000563302fa6c25 in monitor_qapi_event_queue (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x56330602bde0, errp=0x7ffc6ab5e728) at /home/elmarco/src/qq/monitor.c:645
> #4 0x0000563303549aca in qapi_event_send_spice_disconnected (server=0x563305afd630, client=0x563305745360, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-ui.c:149
> #5 0x00005633033e600f in channel_event (event=3, info=0x5633061b0050) at /home/elmarco/src/qq/ui/spice-core.c:235
> #6 0x00007fa69f6c86bb in reds_handle_channel_event (reds=<optimized out>, event=3, info=0x5633061b0050) at reds.c:316
> #7 0x00007fa69f6b193b in main_dispatcher_self_handle_channel_event (info=0x5633061b0050, event=3, self=0x563304e088c0) at main-dispatcher.c:197
> #8 0x00007fa69f6b193b in main_dispatcher_channel_event (self=0x563304e088c0, event=event@entry=3, info=0x5633061b0050) at main-dispatcher.c:197
> #9 0x00007fa69f6d0833 in red_stream_push_channel_event (s=s@entry=0x563305ad8f50, event=event@entry=3) at red-stream.c:414
> #10 0x00007fa69f6d086b in red_stream_free (s=0x563305ad8f50) at red-stream.c:388
> #11 0x00007fa69f6b7ddc in red_channel_client_finalize (object=0x563304df2360) at red-channel-client.c:347
> #12 0x00007fa6a56b7fb9 in g_object_unref () at /lib64/libgobject-2.0.so.0
> #13 0x00007fa69f6ba212 in red_channel_client_push (rcc=0x563304df2360) at red-channel-client.c:1341
> #14 0x00007fa69f68b259 in red_char_device_send_msg_to_client (client=<optimized out>, msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305
> #15 0x00007fa69f68b259 in red_char_device_send_msg_to_clients (msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305
> #16 0x00007fa69f68b259 in red_char_device_read_from_device (dev=0x563304e08bc0) at char-device.c:353
> #17 0x000056330317d01d in spice_chr_write (chr=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/spice.c:199
> #18 0x00005633034deee7 in qemu_chr_write_buffer (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, offset=0x7ffc6ab5ea70, write_all=false) at /home/elmarco/src/qq/chardev/char.c:112
> #19 0x00005633034df054 in qemu_chr_write (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, write_all=false) at /home/elmarco/src/qq/chardev/char.c:147
> #20 0x00005633034e1e13 in qemu_chr_fe_write (be=0x563304dbb800, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/char-fe.c:42
> #21 0x0000563302fa6334 in monitor_flush_locked (mon=0x563304dbb800) at /home/elmarco/src/qq/monitor.c:425
> #22 0x0000563302fa6520 in monitor_puts (mon=0x563304dbb800, str=0x563305de7e9e "") at /home/elmarco/src/qq/monitor.c:468
> #23 0x0000563302fa680c in qmp_send_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:517
> #24 0x0000563302fa6905 in qmp_queue_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:538
> #25 0x0000563302fa6b5b in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730) at /home/elmarco/src/qq/monitor.c:624
> #26 0x0000563302fa6c4b in monitor_qapi_event_queue (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730, errp=0x7ffc6ab5ed00) at /home/elmarco/src/qq/monitor.c:649
> #27 0x0000563303548cce in qapi_event_send_shutdown (guest=false, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-run-state.c:58
> #28 0x000056330313bcd7 in main_loop_should_exit () at /home/elmarco/src/qq/vl.c:1822
> #29 0x000056330313bde3 in main_loop () at /home/elmarco/src/qq/vl.c:1862
> #30 0x0000563303143781 in main (argc=3, argv=0x7ffc6ab5f068, envp=0x7ffc6ab5f088) at /home/elmarco/src/qq/vl.c:4644
>
> Note that error report is now moved to the first caller, which may
> receive an error for a recursed event. This is probably fine (95% of
> callers use &error_abort, the rest have NULL error and ignore it)
>
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> ---
> monitor.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 50 insertions(+), 1 deletion(-)
>
> diff --git a/monitor.c b/monitor.c
> index d8d8211ae4..d580c5a79c 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -633,7 +633,7 @@ static void monitor_qapi_event_handler(void *opaque);
> * applying any rate limiting if required.
> */
> static void
> -monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
> +monitor_qapi_event_queue_no_recurse(QAPIEvent event, QDict *qdict, Error **errp)
> {
> MonitorQAPIEventConf *evconf;
> MonitorQAPIEventState *evstate;
> @@ -688,6 +688,55 @@ monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
> qemu_mutex_unlock(&monitor_lock);
> }
>
> +static void
> +monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
> +{
> + Error *local_err = NULL;
> + /*
> + * If the function recurse, monitor_lock will dead-lock.
> + * Instead, queue pending events in TLS.
> + * TODO: remove this, make it re-enter safe.
> + */
> + static __thread bool recurse;
> + typedef struct MonitorQapiEvent {
> + QAPIEvent event;
> + QDict *qdict;
> + QSIMPLEQ_ENTRY(MonitorQapiEvent) entry;
> + } MonitorQapiEvent;
> + MonitorQapiEvent *ev;
> + static __thread QSIMPLEQ_HEAD(, MonitorQapiEvent) event_queue;
> +
> + if (!recurse) {
> + QSIMPLEQ_INIT(&event_queue);
> + }
> +
> + ev = g_new(MonitorQapiEvent, 1);
> + ev->qdict = qobject_ref(qdict);
> + ev->event = event;
> + QSIMPLEQ_INSERT_TAIL(&event_queue, ev, entry);
> + if (recurse) {
> + return;
> + }
> +
> + recurse = true;
> +
> + while ((ev = QSIMPLEQ_FIRST(&event_queue)) != NULL) {
> + QSIMPLEQ_REMOVE_HEAD(&event_queue, entry);
> + if (!local_err) {
> + monitor_qapi_event_queue_no_recurse(ev->event, ev->qdict,
> + &local_err);
> + }
> + qobject_unref(ev->qdict);
> + g_free(ev);
> + }
> +
> + recurse = false;
> +
> + if (local_err) {
> + error_propagate(errp, local_err);
> + }
> +}
> +
> /*
> * This function runs evconf->rate ns after sending a throttled
> * event.
> --
> 2.18.0.232.gb7bd9486b0
>
>
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH RFC] monitor: temporary fix for dead-lock on event recursion
2018-07-30 13:05 ` Daniel P. Berrangé
@ 2018-07-30 13:09 ` Marc-André Lureau
2018-07-30 13:17 ` Daniel P. Berrangé
0 siblings, 1 reply; 7+ messages in thread
From: Marc-André Lureau @ 2018-07-30 13:09 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: qemu-devel, Armbruster, Markus, Dr. David Alan Gilbert
Hi
On Mon, Jul 30, 2018 at 3:05 PM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Mon, Jul 30, 2018 at 02:57:46PM +0200, Marc-André Lureau wrote:
>> With a Spice port chardev, it is possible to reenter
>> monitor_qapi_event_queue() (when the client disconnects for
>> example). This will dead-lock on monitor_lock.
>>
>> Instead, use some TLS variables to check for recursion and queue the
>> events.
>
> I wonder if it would be clearer to just change
> qapi_event_send_spice_disconnected() so that it use g_idle_add() to
> send the QAPI_EVENT_SPICE_DISCONNECTED event, as I don't see a strong
> reason to sent that sychronously. This would avoid the recursion problem.
Yes, I seriously thought about that solution. But in the end, the bug
is in monitor code: it should be fixed there at some point. And it
could in theory happen with a different code path than Spice.
>>
>> Fixes:
>> (gdb) bt
>> #0 0x00007fa69e7217fd in __lll_lock_wait () at /lib64/libpthread.so.0
>> #1 0x00007fa69e71acf4 in pthread_mutex_lock () at /lib64/libpthread.so.0
>> #2 0x0000563303567619 in qemu_mutex_lock_impl (mutex=0x563303d3e220 <monitor_lock>, file=0x5633036589a8 "/home/elmarco/src/qq/monitor.c", line=645) at /home/elmarco/src/qq/util/qemu-thread-posix.c:66
>> #3 0x0000563302fa6c25 in monitor_qapi_event_queue (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x56330602bde0, errp=0x7ffc6ab5e728) at /home/elmarco/src/qq/monitor.c:645
>> #4 0x0000563303549aca in qapi_event_send_spice_disconnected (server=0x563305afd630, client=0x563305745360, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-ui.c:149
>> #5 0x00005633033e600f in channel_event (event=3, info=0x5633061b0050) at /home/elmarco/src/qq/ui/spice-core.c:235
>> #6 0x00007fa69f6c86bb in reds_handle_channel_event (reds=<optimized out>, event=3, info=0x5633061b0050) at reds.c:316
>> #7 0x00007fa69f6b193b in main_dispatcher_self_handle_channel_event (info=0x5633061b0050, event=3, self=0x563304e088c0) at main-dispatcher.c:197
>> #8 0x00007fa69f6b193b in main_dispatcher_channel_event (self=0x563304e088c0, event=event@entry=3, info=0x5633061b0050) at main-dispatcher.c:197
>> #9 0x00007fa69f6d0833 in red_stream_push_channel_event (s=s@entry=0x563305ad8f50, event=event@entry=3) at red-stream.c:414
>> #10 0x00007fa69f6d086b in red_stream_free (s=0x563305ad8f50) at red-stream.c:388
>> #11 0x00007fa69f6b7ddc in red_channel_client_finalize (object=0x563304df2360) at red-channel-client.c:347
>> #12 0x00007fa6a56b7fb9 in g_object_unref () at /lib64/libgobject-2.0.so.0
>> #13 0x00007fa69f6ba212 in red_channel_client_push (rcc=0x563304df2360) at red-channel-client.c:1341
>> #14 0x00007fa69f68b259 in red_char_device_send_msg_to_client (client=<optimized out>, msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305
>> #15 0x00007fa69f68b259 in red_char_device_send_msg_to_clients (msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305
>> #16 0x00007fa69f68b259 in red_char_device_read_from_device (dev=0x563304e08bc0) at char-device.c:353
>> #17 0x000056330317d01d in spice_chr_write (chr=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/spice.c:199
>> #18 0x00005633034deee7 in qemu_chr_write_buffer (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, offset=0x7ffc6ab5ea70, write_all=false) at /home/elmarco/src/qq/chardev/char.c:112
>> #19 0x00005633034df054 in qemu_chr_write (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, write_all=false) at /home/elmarco/src/qq/chardev/char.c:147
>> #20 0x00005633034e1e13 in qemu_chr_fe_write (be=0x563304dbb800, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/char-fe.c:42
>> #21 0x0000563302fa6334 in monitor_flush_locked (mon=0x563304dbb800) at /home/elmarco/src/qq/monitor.c:425
>> #22 0x0000563302fa6520 in monitor_puts (mon=0x563304dbb800, str=0x563305de7e9e "") at /home/elmarco/src/qq/monitor.c:468
>> #23 0x0000563302fa680c in qmp_send_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:517
>> #24 0x0000563302fa6905 in qmp_queue_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:538
>> #25 0x0000563302fa6b5b in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730) at /home/elmarco/src/qq/monitor.c:624
>> #26 0x0000563302fa6c4b in monitor_qapi_event_queue (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730, errp=0x7ffc6ab5ed00) at /home/elmarco/src/qq/monitor.c:649
>> #27 0x0000563303548cce in qapi_event_send_shutdown (guest=false, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-run-state.c:58
>> #28 0x000056330313bcd7 in main_loop_should_exit () at /home/elmarco/src/qq/vl.c:1822
>> #29 0x000056330313bde3 in main_loop () at /home/elmarco/src/qq/vl.c:1862
>> #30 0x0000563303143781 in main (argc=3, argv=0x7ffc6ab5f068, envp=0x7ffc6ab5f088) at /home/elmarco/src/qq/vl.c:4644
>>
>> Note that error report is now moved to the first caller, which may
>> receive an error for a recursed event. This is probably fine (95% of
>> callers use &error_abort, the rest have NULL error and ignore it)
>>
>> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>> ---
>> monitor.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 50 insertions(+), 1 deletion(-)
>>
>> diff --git a/monitor.c b/monitor.c
>> index d8d8211ae4..d580c5a79c 100644
>> --- a/monitor.c
>> +++ b/monitor.c
>> @@ -633,7 +633,7 @@ static void monitor_qapi_event_handler(void *opaque);
>> * applying any rate limiting if required.
>> */
>> static void
>> -monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
>> +monitor_qapi_event_queue_no_recurse(QAPIEvent event, QDict *qdict, Error **errp)
>> {
>> MonitorQAPIEventConf *evconf;
>> MonitorQAPIEventState *evstate;
>> @@ -688,6 +688,55 @@ monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
>> qemu_mutex_unlock(&monitor_lock);
>> }
>>
>> +static void
>> +monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
>> +{
>> + Error *local_err = NULL;
>> + /*
>> + * If the function recurse, monitor_lock will dead-lock.
>> + * Instead, queue pending events in TLS.
>> + * TODO: remove this, make it re-enter safe.
>> + */
>> + static __thread bool recurse;
>> + typedef struct MonitorQapiEvent {
>> + QAPIEvent event;
>> + QDict *qdict;
>> + QSIMPLEQ_ENTRY(MonitorQapiEvent) entry;
>> + } MonitorQapiEvent;
>> + MonitorQapiEvent *ev;
>> + static __thread QSIMPLEQ_HEAD(, MonitorQapiEvent) event_queue;
>> +
>> + if (!recurse) {
>> + QSIMPLEQ_INIT(&event_queue);
>> + }
>> +
>> + ev = g_new(MonitorQapiEvent, 1);
>> + ev->qdict = qobject_ref(qdict);
>> + ev->event = event;
>> + QSIMPLEQ_INSERT_TAIL(&event_queue, ev, entry);
>> + if (recurse) {
>> + return;
>> + }
>> +
>> + recurse = true;
>> +
>> + while ((ev = QSIMPLEQ_FIRST(&event_queue)) != NULL) {
>> + QSIMPLEQ_REMOVE_HEAD(&event_queue, entry);
>> + if (!local_err) {
>> + monitor_qapi_event_queue_no_recurse(ev->event, ev->qdict,
>> + &local_err);
>> + }
>> + qobject_unref(ev->qdict);
>> + g_free(ev);
>> + }
>> +
>> + recurse = false;
>> +
>> + if (local_err) {
>> + error_propagate(errp, local_err);
>> + }
>> +}
>> +
>> /*
>> * This function runs evconf->rate ns after sending a throttled
>> * event.
>> --
>> 2.18.0.232.gb7bd9486b0
>>
>>
>
> Regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH RFC] monitor: temporary fix for dead-lock on event recursion
2018-07-30 13:09 ` Marc-André Lureau
@ 2018-07-30 13:17 ` Daniel P. Berrangé
0 siblings, 0 replies; 7+ messages in thread
From: Daniel P. Berrangé @ 2018-07-30 13:17 UTC (permalink / raw)
To: Marc-André Lureau
Cc: qemu-devel, Armbruster, Markus, Dr. David Alan Gilbert
On Mon, Jul 30, 2018 at 03:09:08PM +0200, Marc-André Lureau wrote:
> Hi
>
> On Mon, Jul 30, 2018 at 3:05 PM, Daniel P. Berrangé <berrange@redhat.com> wrote:
> > On Mon, Jul 30, 2018 at 02:57:46PM +0200, Marc-André Lureau wrote:
> >> With a Spice port chardev, it is possible to reenter
> >> monitor_qapi_event_queue() (when the client disconnects for
> >> example). This will dead-lock on monitor_lock.
> >>
> >> Instead, use some TLS variables to check for recursion and queue the
> >> events.
> >
> > I wonder if it would be clearer to just change
> > qapi_event_send_spice_disconnected() so that it use g_idle_add() to
> > send the QAPI_EVENT_SPICE_DISCONNECTED event, as I don't see a strong
> > reason to sent that sychronously. This would avoid the recursion problem.
>
> Yes, I seriously thought about that solution. But in the end, the bug
> is in monitor code: it should be fixed there at some point. And it
> could in theory happen with a different code path than Spice.
The question is what we consider the API contract to be for the chardevs
that are used by the monitor. eg is the chardev I/O function permitted to
emit monitor events. Given that chardevs are widely used for lots of
backends, I guess I would probably say chardevs should be allowed to
emit monitor events, which does put the burden on the monitor code to
handle recursion.
I wonder if we could still use g_idle_add() though but in a different
place. The monitor_qapi_event_queue() method already emits events
asynchronously if they are tagged as needing rate limiting. So instead
of directly calling monitor_qapi_event_emit() in the non-rate limited
branch, we could send that call via g_idle_add(), so that all events
are emitted asynchronously. This would avoid monitor_qapi_event_queue()
becoming a re-entrancy problem.
I think g_idle_add() calls are processed in the order in which they
are registered, so event orderin should still be preserved.
> >> Fixes:
> >> (gdb) bt
> >> #0 0x00007fa69e7217fd in __lll_lock_wait () at /lib64/libpthread.so.0
> >> #1 0x00007fa69e71acf4 in pthread_mutex_lock () at /lib64/libpthread.so.0
> >> #2 0x0000563303567619 in qemu_mutex_lock_impl (mutex=0x563303d3e220 <monitor_lock>, file=0x5633036589a8 "/home/elmarco/src/qq/monitor.c", line=645) at /home/elmarco/src/qq/util/qemu-thread-posix.c:66
> >> #3 0x0000563302fa6c25 in monitor_qapi_event_queue (event=QAPI_EVENT_SPICE_DISCONNECTED, qdict=0x56330602bde0, errp=0x7ffc6ab5e728) at /home/elmarco/src/qq/monitor.c:645
> >> #4 0x0000563303549aca in qapi_event_send_spice_disconnected (server=0x563305afd630, client=0x563305745360, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-ui.c:149
> >> #5 0x00005633033e600f in channel_event (event=3, info=0x5633061b0050) at /home/elmarco/src/qq/ui/spice-core.c:235
> >> #6 0x00007fa69f6c86bb in reds_handle_channel_event (reds=<optimized out>, event=3, info=0x5633061b0050) at reds.c:316
> >> #7 0x00007fa69f6b193b in main_dispatcher_self_handle_channel_event (info=0x5633061b0050, event=3, self=0x563304e088c0) at main-dispatcher.c:197
> >> #8 0x00007fa69f6b193b in main_dispatcher_channel_event (self=0x563304e088c0, event=event@entry=3, info=0x5633061b0050) at main-dispatcher.c:197
> >> #9 0x00007fa69f6d0833 in red_stream_push_channel_event (s=s@entry=0x563305ad8f50, event=event@entry=3) at red-stream.c:414
> >> #10 0x00007fa69f6d086b in red_stream_free (s=0x563305ad8f50) at red-stream.c:388
> >> #11 0x00007fa69f6b7ddc in red_channel_client_finalize (object=0x563304df2360) at red-channel-client.c:347
> >> #12 0x00007fa6a56b7fb9 in g_object_unref () at /lib64/libgobject-2.0.so.0
> >> #13 0x00007fa69f6ba212 in red_channel_client_push (rcc=0x563304df2360) at red-channel-client.c:1341
> >> #14 0x00007fa69f68b259 in red_char_device_send_msg_to_client (client=<optimized out>, msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305
> >> #15 0x00007fa69f68b259 in red_char_device_send_msg_to_clients (msg=0x5633059b6310, dev=0x563304e08bc0) at char-device.c:305
> >> #16 0x00007fa69f68b259 in red_char_device_read_from_device (dev=0x563304e08bc0) at char-device.c:353
> >> #17 0x000056330317d01d in spice_chr_write (chr=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/spice.c:199
> >> #18 0x00005633034deee7 in qemu_chr_write_buffer (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, offset=0x7ffc6ab5ea70, write_all=false) at /home/elmarco/src/qq/chardev/char.c:112
> >> #19 0x00005633034df054 in qemu_chr_write (s=0x563304cafe20, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111, write_all=false) at /home/elmarco/src/qq/chardev/char.c:147
> >> #20 0x00005633034e1e13 in qemu_chr_fe_write (be=0x563304dbb800, buf=0x563304cc50b0 "{\"timestamp\": {\"seconds\": 1532944763, \"microseconds\": 326636}, \"event\": \"SHUTDOWN\", \"data\": {\"guest\": false}}\r\n", len=111) at /home/elmarco/src/qq/chardev/char-fe.c:42
> >> #21 0x0000563302fa6334 in monitor_flush_locked (mon=0x563304dbb800) at /home/elmarco/src/qq/monitor.c:425
> >> #22 0x0000563302fa6520 in monitor_puts (mon=0x563304dbb800, str=0x563305de7e9e "") at /home/elmarco/src/qq/monitor.c:468
> >> #23 0x0000563302fa680c in qmp_send_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:517
> >> #24 0x0000563302fa6905 in qmp_queue_response (mon=0x563304dbb800, rsp=0x563304df5730) at /home/elmarco/src/qq/monitor.c:538
> >> #25 0x0000563302fa6b5b in monitor_qapi_event_emit (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730) at /home/elmarco/src/qq/monitor.c:624
> >> #26 0x0000563302fa6c4b in monitor_qapi_event_queue (event=QAPI_EVENT_SHUTDOWN, qdict=0x563304df5730, errp=0x7ffc6ab5ed00) at /home/elmarco/src/qq/monitor.c:649
> >> #27 0x0000563303548cce in qapi_event_send_shutdown (guest=false, errp=0x563303d8d0f0 <error_abort>) at qapi/qapi-events-run-state.c:58
> >> #28 0x000056330313bcd7 in main_loop_should_exit () at /home/elmarco/src/qq/vl.c:1822
> >> #29 0x000056330313bde3 in main_loop () at /home/elmarco/src/qq/vl.c:1862
> >> #30 0x0000563303143781 in main (argc=3, argv=0x7ffc6ab5f068, envp=0x7ffc6ab5f088) at /home/elmarco/src/qq/vl.c:4644
> >>
> >> Note that error report is now moved to the first caller, which may
> >> receive an error for a recursed event. This is probably fine (95% of
> >> callers use &error_abort, the rest have NULL error and ignore it)
> >>
> >> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> >> ---
> >> monitor.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
> >> 1 file changed, 50 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/monitor.c b/monitor.c
> >> index d8d8211ae4..d580c5a79c 100644
> >> --- a/monitor.c
> >> +++ b/monitor.c
> >> @@ -633,7 +633,7 @@ static void monitor_qapi_event_handler(void *opaque);
> >> * applying any rate limiting if required.
> >> */
> >> static void
> >> -monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
> >> +monitor_qapi_event_queue_no_recurse(QAPIEvent event, QDict *qdict, Error **errp)
> >> {
> >> MonitorQAPIEventConf *evconf;
> >> MonitorQAPIEventState *evstate;
> >> @@ -688,6 +688,55 @@ monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
> >> qemu_mutex_unlock(&monitor_lock);
> >> }
> >>
> >> +static void
> >> +monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
> >> +{
> >> + Error *local_err = NULL;
> >> + /*
> >> + * If the function recurse, monitor_lock will dead-lock.
> >> + * Instead, queue pending events in TLS.
> >> + * TODO: remove this, make it re-enter safe.
> >> + */
> >> + static __thread bool recurse;
> >> + typedef struct MonitorQapiEvent {
> >> + QAPIEvent event;
> >> + QDict *qdict;
> >> + QSIMPLEQ_ENTRY(MonitorQapiEvent) entry;
> >> + } MonitorQapiEvent;
> >> + MonitorQapiEvent *ev;
> >> + static __thread QSIMPLEQ_HEAD(, MonitorQapiEvent) event_queue;
> >> +
> >> + if (!recurse) {
> >> + QSIMPLEQ_INIT(&event_queue);
> >> + }
> >> +
> >> + ev = g_new(MonitorQapiEvent, 1);
> >> + ev->qdict = qobject_ref(qdict);
> >> + ev->event = event;
> >> + QSIMPLEQ_INSERT_TAIL(&event_queue, ev, entry);
> >> + if (recurse) {
> >> + return;
> >> + }
> >> +
> >> + recurse = true;
> >> +
> >> + while ((ev = QSIMPLEQ_FIRST(&event_queue)) != NULL) {
> >> + QSIMPLEQ_REMOVE_HEAD(&event_queue, entry);
> >> + if (!local_err) {
> >> + monitor_qapi_event_queue_no_recurse(ev->event, ev->qdict,
> >> + &local_err);
> >> + }
> >> + qobject_unref(ev->qdict);
> >> + g_free(ev);
> >> + }
> >> +
> >> + recurse = false;
> >> +
> >> + if (local_err) {
> >> + error_propagate(errp, local_err);
> >> + }
> >> +}
> >> +
> >> /*
> >> * This function runs evconf->rate ns after sending a throttled
> >> * event.
> >> --
> >> 2.18.0.232.gb7bd9486b0
> >>
> >>
> >
> > Regards,
> > Daniel
> > --
> > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> > |: https://libvirt.org -o- https://fstop138.berrange.com :|
> > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH RFC] monitor: temporary fix for dead-lock on event recursion
2018-07-30 12:57 [Qemu-devel] [PATCH RFC] monitor: temporary fix for dead-lock on event recursion Marc-André Lureau
2018-07-30 13:05 ` Daniel P. Berrangé
@ 2018-07-31 7:05 ` Markus Armbruster
2018-07-31 14:45 ` Marc-André Lureau
1 sibling, 1 reply; 7+ messages in thread
From: Markus Armbruster @ 2018-07-31 7:05 UTC (permalink / raw)
To: Marc-André Lureau; +Cc: qemu-devel, Dr. David Alan Gilbert
Marc-André Lureau <marcandre.lureau@redhat.com> writes:
> With a Spice port chardev, it is possible to reenter
> monitor_qapi_event_queue() (when the client disconnects for
> example). This will dead-lock on monitor_lock.
>
> Instead, use some TLS variables to check for recursion and queue the
> events.
[...]
>
> Note that error report is now moved to the first caller, which may
> receive an error for a recursed event. This is probably fine (95% of
> callers use &error_abort, the rest have NULL error and ignore it)
I'm not 100% sure I understand this paragraph, but it might be moot; see
[*] below.
>
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> ---
> monitor.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 50 insertions(+), 1 deletion(-)
>
> diff --git a/monitor.c b/monitor.c
> index d8d8211ae4..d580c5a79c 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -633,7 +633,7 @@ static void monitor_qapi_event_handler(void *opaque);
> * applying any rate limiting if required.
> */
> static void
> -monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
> +monitor_qapi_event_queue_no_recurse(QAPIEvent event, QDict *qdict, Error **errp)
> {
> MonitorQAPIEventConf *evconf;
> MonitorQAPIEventState *evstate;
> @@ -688,6 +688,55 @@ monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
> qemu_mutex_unlock(&monitor_lock);
> }
>
> +static void
> +monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
> +{
> + Error *local_err = NULL;
> + /*
> + * If the function recurse, monitor_lock will dead-lock.
> + * Instead, queue pending events in TLS.
recurses
The claim is correct before the patch. But the comment needs to explain
current code. Perhaps:
* monitor_qapi_event_queue_no_recurse() is not reentrant: it
* would deadlock on monitor_lock. Work around by queueing
* events in thread-local storage.
> + * TODO: remove this, make it re-enter safe.
> + */
> + static __thread bool recurse;
> + typedef struct MonitorQapiEvent {
> + QAPIEvent event;
> + QDict *qdict;
> + QSIMPLEQ_ENTRY(MonitorQapiEvent) entry;
> + } MonitorQapiEvent;
> + MonitorQapiEvent *ev;
> + static __thread QSIMPLEQ_HEAD(, MonitorQapiEvent) event_queue;
Let's put the static variables first.
> +
> + if (!recurse) {
> + QSIMPLEQ_INIT(&event_queue);
> + }
> +
> + ev = g_new(MonitorQapiEvent, 1);
> + ev->qdict = qobject_ref(qdict);
> + ev->event = event;
> + QSIMPLEQ_INSERT_TAIL(&event_queue, ev, entry);
> + if (recurse) {
> + return;
> + }
> +
> + recurse = true;
> +
> + while ((ev = QSIMPLEQ_FIRST(&event_queue)) != NULL) {
> + QSIMPLEQ_REMOVE_HEAD(&event_queue, entry);
Could we use QSIMPLEQ_FOREACH_SAFE()?
> + if (!local_err) {
> + monitor_qapi_event_queue_no_recurse(ev->event, ev->qdict,
> + &local_err);
> + }
[*] This looks scary: we silently throw away events after event queuing
fails.
Fortunately, monitor_qapi_event_queue_no_recurse() cannot fail. It
takes an Error ** parameter only so you can put it into @qmp_emit.
Aside: I wish we'd get rid of the indirection through @qmp_emit.
Let's pass &error_abort and drop @local_err.
> + qobject_unref(ev->qdict);
> + g_free(ev);
> + }
> +
> + recurse = false;
Aha: @recurse means we've reentered the function.
"Recurse" is imperative mood. Misleading, as it's not an order to
recurse.
Rename to @recursed? @reentered?
> +
> + if (local_err) {
> + error_propagate(errp, local_err);
> + }
> +}
> +
> /*
> * This function runs evconf->rate ns after sending a throttled
> * event.
monitor_lock clearly needs a rethink. Your TODO comment suggests you
agree. But that's something for 3.1.
I hate messing with locks at -rc3, but I also hate shipping known
deadlocks. Your patch isn't pretty, but probably as simple as we can
make it for 3.0. A few more review eyeballs would be nice.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH RFC] monitor: temporary fix for dead-lock on event recursion
2018-07-31 7:05 ` Markus Armbruster
@ 2018-07-31 14:45 ` Marc-André Lureau
2018-07-31 15:15 ` Markus Armbruster
0 siblings, 1 reply; 7+ messages in thread
From: Marc-André Lureau @ 2018-07-31 14:45 UTC (permalink / raw)
To: Markus Armbruster; +Cc: QEMU, Dr. David Alan Gilbert
Hi
On Tue, Jul 31, 2018 at 9:05 AM, Markus Armbruster <armbru@redhat.com> wrote:
> Marc-André Lureau <marcandre.lureau@redhat.com> writes:
>
>> With a Spice port chardev, it is possible to reenter
>> monitor_qapi_event_queue() (when the client disconnects for
>> example). This will dead-lock on monitor_lock.
>>
>> Instead, use some TLS variables to check for recursion and queue the
>> events.
> [...]
>>
>> Note that error report is now moved to the first caller, which may
>> receive an error for a recursed event. This is probably fine (95% of
>> callers use &error_abort, the rest have NULL error and ignore it)
>
> I'm not 100% sure I understand this paragraph, but it might be moot; see
> [*] below.
>
>>
>> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>> ---
>> monitor.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 50 insertions(+), 1 deletion(-)
>>
>> diff --git a/monitor.c b/monitor.c
>> index d8d8211ae4..d580c5a79c 100644
>> --- a/monitor.c
>> +++ b/monitor.c
>> @@ -633,7 +633,7 @@ static void monitor_qapi_event_handler(void *opaque);
>> * applying any rate limiting if required.
>> */
>> static void
>> -monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
>> +monitor_qapi_event_queue_no_recurse(QAPIEvent event, QDict *qdict, Error **errp)
>> {
>> MonitorQAPIEventConf *evconf;
>> MonitorQAPIEventState *evstate;
>> @@ -688,6 +688,55 @@ monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
>> qemu_mutex_unlock(&monitor_lock);
>> }
>>
>> +static void
>> +monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
>> +{
>> + Error *local_err = NULL;
>> + /*
>> + * If the function recurse, monitor_lock will dead-lock.
>> + * Instead, queue pending events in TLS.
>
> recurses
>
> The claim is correct before the patch. But the comment needs to explain
> current code. Perhaps:
>
> * monitor_qapi_event_queue_no_recurse() is not reentrant: it
> * would deadlock on monitor_lock. Work around by queueing
> * events in thread-local storage.
ok
>
>> + * TODO: remove this, make it re-enter safe.
>> + */
>> + static __thread bool recurse;
>> + typedef struct MonitorQapiEvent {
>> + QAPIEvent event;
>> + QDict *qdict;
>> + QSIMPLEQ_ENTRY(MonitorQapiEvent) entry;
>> + } MonitorQapiEvent;
>> + MonitorQapiEvent *ev;
>> + static __thread QSIMPLEQ_HEAD(, MonitorQapiEvent) event_queue;
>
> Let's put the static variables first.
ok
>> +
>> + if (!recurse) {
>> + QSIMPLEQ_INIT(&event_queue);
>> + }
>> +
>> + ev = g_new(MonitorQapiEvent, 1);
>> + ev->qdict = qobject_ref(qdict);
>> + ev->event = event;
>> + QSIMPLEQ_INSERT_TAIL(&event_queue, ev, entry);
>> + if (recurse) {
>> + return;
>> + }
>> +
>> + recurse = true;
>> +
>> + while ((ev = QSIMPLEQ_FIRST(&event_queue)) != NULL) {
>> + QSIMPLEQ_REMOVE_HEAD(&event_queue, entry);
>
> Could we use QSIMPLEQ_FOREACH_SAFE()?
I don't think so, the next variable could be NULL, while a recursive
call could be adding an element.
>
>> + if (!local_err) {
>> + monitor_qapi_event_queue_no_recurse(ev->event, ev->qdict,
>> + &local_err);
>> + }
>
> [*] This looks scary: we silently throw away events after event queuing
> fails.
>
> Fortunately, monitor_qapi_event_queue_no_recurse() cannot fail. It
> takes an Error ** parameter only so you can put it into @qmp_emit.
>
right
> Aside: I wish we'd get rid of the indirection through @qmp_emit.
>
for later: You would hard code a function name instead?
> Let's pass &error_abort and drop @local_err.
actually, we can just ignore errp, right?
for later: What about dropping error from the emit function?
>
>> + qobject_unref(ev->qdict);
>> + g_free(ev);
>> + }
>> +
>> + recurse = false;
>
> Aha: @recurse means we've reentered the function.
>
> "Recurse" is imperative mood. Misleading, as it's not an order to
> recurse.
>
> Rename to @recursed? @reentered?
>
ok, let's rename it "reentered"
>> +
>> + if (local_err) {
>> + error_propagate(errp, local_err);
>> + }
>> +}
>> +
>> /*
>> * This function runs evconf->rate ns after sending a throttled
>> * event.
>
> monitor_lock clearly needs a rethink. Your TODO comment suggests you
> agree. But that's something for 3.1.
>
> I hate messing with locks at -rc3, but I also hate shipping known
> deadlocks. Your patch isn't pretty, but probably as simple as we can
> make it for 3.0. A few more review eyeballs would be nice.
>
thanks, I'll send v2
--
Marc-André Lureau
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Qemu-devel] [PATCH RFC] monitor: temporary fix for dead-lock on event recursion
2018-07-31 14:45 ` Marc-André Lureau
@ 2018-07-31 15:15 ` Markus Armbruster
0 siblings, 0 replies; 7+ messages in thread
From: Markus Armbruster @ 2018-07-31 15:15 UTC (permalink / raw)
To: Marc-André Lureau; +Cc: Markus Armbruster, QEMU, Dr. David Alan Gilbert
Marc-André Lureau <marcandre.lureau@gmail.com> writes:
> Hi
>
> On Tue, Jul 31, 2018 at 9:05 AM, Markus Armbruster <armbru@redhat.com> wrote:
>> Marc-André Lureau <marcandre.lureau@redhat.com> writes:
>>
>>> With a Spice port chardev, it is possible to reenter
>>> monitor_qapi_event_queue() (when the client disconnects for
>>> example). This will dead-lock on monitor_lock.
>>>
>>> Instead, use some TLS variables to check for recursion and queue the
>>> events.
>> [...]
>>>
>>> Note that error report is now moved to the first caller, which may
>>> receive an error for a recursed event. This is probably fine (95% of
>>> callers use &error_abort, the rest have NULL error and ignore it)
>>
>> I'm not 100% sure I understand this paragraph, but it might be moot; see
>> [*] below.
>>
>>>
>>> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>>> ---
>>> monitor.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>>> 1 file changed, 50 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/monitor.c b/monitor.c
>>> index d8d8211ae4..d580c5a79c 100644
>>> --- a/monitor.c
>>> +++ b/monitor.c
>>> @@ -633,7 +633,7 @@ static void monitor_qapi_event_handler(void *opaque);
>>> * applying any rate limiting if required.
>>> */
>>> static void
>>> -monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
>>> +monitor_qapi_event_queue_no_recurse(QAPIEvent event, QDict *qdict, Error **errp)
>>> {
>>> MonitorQAPIEventConf *evconf;
>>> MonitorQAPIEventState *evstate;
>>> @@ -688,6 +688,55 @@ monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
>>> qemu_mutex_unlock(&monitor_lock);
>>> }
>>>
>>> +static void
>>> +monitor_qapi_event_queue(QAPIEvent event, QDict *qdict, Error **errp)
>>> +{
>>> + Error *local_err = NULL;
>>> + /*
>>> + * If the function recurse, monitor_lock will dead-lock.
>>> + * Instead, queue pending events in TLS.
>>
>> recurses
>>
>> The claim is correct before the patch. But the comment needs to explain
>> current code. Perhaps:
>>
>> * monitor_qapi_event_queue_no_recurse() is not reentrant: it
>> * would deadlock on monitor_lock. Work around by queueing
>> * events in thread-local storage.
>
> ok
>
>>
>>> + * TODO: remove this, make it re-enter safe.
>>> + */
>>> + static __thread bool recurse;
>>> + typedef struct MonitorQapiEvent {
>>> + QAPIEvent event;
>>> + QDict *qdict;
>>> + QSIMPLEQ_ENTRY(MonitorQapiEvent) entry;
>>> + } MonitorQapiEvent;
>>> + MonitorQapiEvent *ev;
>>> + static __thread QSIMPLEQ_HEAD(, MonitorQapiEvent) event_queue;
>>
>> Let's put the static variables first.
>
> ok
>
>>> +
>>> + if (!recurse) {
>>> + QSIMPLEQ_INIT(&event_queue);
>>> + }
>>> +
>>> + ev = g_new(MonitorQapiEvent, 1);
>>> + ev->qdict = qobject_ref(qdict);
>>> + ev->event = event;
>>> + QSIMPLEQ_INSERT_TAIL(&event_queue, ev, entry);
>>> + if (recurse) {
>>> + return;
>>> + }
>>> +
>>> + recurse = true;
>>> +
>>> + while ((ev = QSIMPLEQ_FIRST(&event_queue)) != NULL) {
>>> + QSIMPLEQ_REMOVE_HEAD(&event_queue, entry);
>>
>> Could we use QSIMPLEQ_FOREACH_SAFE()?
>
> I don't think so, the next variable could be NULL, while a recursive
> call could be adding an element.
>
>>
>>> + if (!local_err) {
>>> + monitor_qapi_event_queue_no_recurse(ev->event, ev->qdict,
>>> + &local_err);
>>> + }
>>
>> [*] This looks scary: we silently throw away events after event queuing
>> fails.
>>
>> Fortunately, monitor_qapi_event_queue_no_recurse() cannot fail. It
>> takes an Error ** parameter only so you can put it into @qmp_emit.
>>
>
> right
>
>> Aside: I wish we'd get rid of the indirection through @qmp_emit.
>>
>
> for later: You would hard code a function name instead?
Yes.
@qmp_emit is always monitor_qapi_event_queue in qemu-system-FOO, and
event_test_emit in test-qmp-event. That's a linker job, not an indirect
call job.
>> Let's pass &error_abort and drop @local_err.
>
> actually, we can just ignore errp, right?
Not sure what you mean, but you've since sent v2, so let me figure it
out there :)
> for later: What about dropping error from the emit function?
Fine with me. Easy enough to bring back if we find a need.
>
>>
>>> + qobject_unref(ev->qdict);
>>> + g_free(ev);
>>> + }
>>> +
>>> + recurse = false;
>>
>> Aha: @recurse means we've reentered the function.
>>
>> "Recurse" is imperative mood. Misleading, as it's not an order to
>> recurse.
>>
>> Rename to @recursed? @reentered?
>>
>
> ok, let's rename it "reentered"
>
>>> +
>>> + if (local_err) {
>>> + error_propagate(errp, local_err);
>>> + }
>>> +}
>>> +
>>> /*
>>> * This function runs evconf->rate ns after sending a throttled
>>> * event.
>>
>> monitor_lock clearly needs a rethink. Your TODO comment suggests you
>> agree. But that's something for 3.1.
>>
>> I hate messing with locks at -rc3, but I also hate shipping known
>> deadlocks. Your patch isn't pretty, but probably as simple as we can
>> make it for 3.0. A few more review eyeballs would be nice.
>>
>
> thanks, I'll send v2
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2018-07-31 15:16 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-07-30 12:57 [Qemu-devel] [PATCH RFC] monitor: temporary fix for dead-lock on event recursion Marc-André Lureau
2018-07-30 13:05 ` Daniel P. Berrangé
2018-07-30 13:09 ` Marc-André Lureau
2018-07-30 13:17 ` Daniel P. Berrangé
2018-07-31 7:05 ` Markus Armbruster
2018-07-31 14:45 ` Marc-André Lureau
2018-07-31 15:15 ` Markus Armbruster
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).