* [RFC 0/3] qmp: make qmp_device_add() a coroutine
@ 2023-09-06 19:01 Stefan Hajnoczi
2023-09-06 19:01 ` [RFC 1/3] hmp: avoid the nested event loop in handle_hmp_command() Stefan Hajnoczi
` (3 more replies)
0 siblings, 4 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2023-09-06 19:01 UTC (permalink / raw)
To: qemu-devel
Cc: Dr. David Alan Gilbert, Eduardo Habkost, pbonzini,
Markus Armbruster, Eric Blake, kwolf, Maxim Levitsky,
Daniel P. Berrangé, Stefan Hajnoczi
It is not safe to call drain_call_rcu() from qmp_device_add() because
some call stacks are not prepared for drain_call_rcu() to drop the Big
QEMU Lock (BQL).
For example, device emulation code is protected by the BQL but when it
calls aio_poll() -> ... -> qmp_device_add() -> drain_call_rcu() then the
BQL is dropped. See https://bugzilla.redhat.com/show_bug.cgi?id=2215192 for a
concrete bug of this type.
Another limitation of drain_call_rcu() is that it cannot be invoked within an
RCU read-side critical section since the reclamation phase cannot complete
until the end of the critical section. Unfortunately, call stacks have been
seen where this happens (see
https://bugzilla.redhat.com/show_bug.cgi?id=2214985).
This patch series introduces drain_call_rcu_co(), which does the same thing as
drain_call_rcu() but asynchronously. By yielding back to the event loop we can
wait until the caller drops the BQL and leaves its RCU read-side critical
section.
Patch 1 changes HMP so that coroutine monitor commands yield back to the event
loop instead of running inside a nested event loop.
Patch 2 introduces the new drain_call_rcu_co() API.
Patch 3 converts qmp_device_add() into a coroutine monitor command and uses
drain_call_rcu_co().
I'm sending this as an RFC because I don't have confirmation yet that the bugs
mentioned above are fixed by this patch series.
Stefan Hajnoczi (3):
hmp: avoid the nested event loop in handle_hmp_command()
rcu: add drain_call_rcu_co() API
qmp: make qmp_device_add() a coroutine
MAINTAINERS | 2 ++
docs/devel/rcu.txt | 21 ++++++++++++++++
qapi/qdev.json | 1 +
include/monitor/qdev.h | 3 ++-
include/qemu/rcu.h | 1 +
util/rcu-internal.h | 8 ++++++
monitor/hmp.c | 28 +++++++++++----------
monitor/qmp-cmds.c | 2 +-
softmmu/qdev-monitor.c | 34 +++++++++++++++++++++++---
util/rcu-co.c | 55 ++++++++++++++++++++++++++++++++++++++++++
util/rcu.c | 3 ++-
hmp-commands.hx | 1 +
util/meson.build | 2 +-
13 files changed, 140 insertions(+), 21 deletions(-)
create mode 100644 util/rcu-internal.h
create mode 100644 util/rcu-co.c
--
2.41.0
^ permalink raw reply [flat|nested] 20+ messages in thread
* [RFC 1/3] hmp: avoid the nested event loop in handle_hmp_command()
2023-09-06 19:01 [RFC 0/3] qmp: make qmp_device_add() a coroutine Stefan Hajnoczi
@ 2023-09-06 19:01 ` Stefan Hajnoczi
2023-09-07 1:06 ` Dr. David Alan Gilbert
2023-09-06 19:01 ` [RFC 2/3] rcu: add drain_call_rcu_co() API Stefan Hajnoczi
` (2 subsequent siblings)
3 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2023-09-06 19:01 UTC (permalink / raw)
To: qemu-devel
Cc: Dr. David Alan Gilbert, Eduardo Habkost, pbonzini,
Markus Armbruster, Eric Blake, kwolf, Maxim Levitsky,
Daniel P. Berrangé, Stefan Hajnoczi
Coroutine HMP commands currently run to completion in a nested event
loop with the Big QEMU Lock (BQL) held. The call_rcu thread also uses
the BQL and cannot process work while the coroutine monitor command is
running. A deadlock occurs when monitor commands attempt to wait for
call_rcu work to finish.
This patch refactors the HMP monitor to use the existing event loop
instead of creating a nested event loop. This will allow the next
patches to rely on draining call_rcu work.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
monitor/hmp.c | 28 +++++++++++++++-------------
1 file changed, 15 insertions(+), 13 deletions(-)
diff --git a/monitor/hmp.c b/monitor/hmp.c
index 69c1b7e98a..6cff2810aa 100644
--- a/monitor/hmp.c
+++ b/monitor/hmp.c
@@ -1111,15 +1111,17 @@ typedef struct HandleHmpCommandCo {
Monitor *mon;
const HMPCommand *cmd;
QDict *qdict;
- bool done;
} HandleHmpCommandCo;
-static void handle_hmp_command_co(void *opaque)
+static void coroutine_fn handle_hmp_command_co(void *opaque)
{
HandleHmpCommandCo *data = opaque;
+
handle_hmp_command_exec(data->mon, data->cmd, data->qdict);
monitor_set_cur(qemu_coroutine_self(), NULL);
- data->done = true;
+ qobject_unref(data->qdict);
+ monitor_resume(data->mon);
+ g_free(data);
}
void handle_hmp_command(MonitorHMP *mon, const char *cmdline)
@@ -1157,20 +1159,20 @@ void handle_hmp_command(MonitorHMP *mon, const char *cmdline)
Monitor *old_mon = monitor_set_cur(qemu_coroutine_self(), &mon->common);
handle_hmp_command_exec(&mon->common, cmd, qdict);
monitor_set_cur(qemu_coroutine_self(), old_mon);
+ qobject_unref(qdict);
} else {
- HandleHmpCommandCo data = {
- .mon = &mon->common,
- .cmd = cmd,
- .qdict = qdict,
- .done = false,
- };
- Coroutine *co = qemu_coroutine_create(handle_hmp_command_co, &data);
+ HandleHmpCommandCo *data; /* freed by handle_hmp_command_co() */
+
+ data = g_new(HandleHmpCommandCo, 1);
+ data->mon = &mon->common;
+ data->cmd = cmd;
+ data->qdict = qdict; /* freed by handle_hmp_command_co() */
+
+ Coroutine *co = qemu_coroutine_create(handle_hmp_command_co, data);
+ monitor_suspend(&mon->common); /* resumed by handle_hmp_command_co() */
monitor_set_cur(co, &mon->common);
aio_co_enter(qemu_get_aio_context(), co);
- AIO_WAIT_WHILE_UNLOCKED(NULL, !data.done);
}
-
- qobject_unref(qdict);
}
static void cmd_completion(MonitorHMP *mon, const char *name, const char *list)
--
2.41.0
* [RFC 2/3] rcu: add drain_call_rcu_co() API
2023-09-06 19:01 [RFC 0/3] qmp: make qmp_device_add() a coroutine Stefan Hajnoczi
2023-09-06 19:01 ` [RFC 1/3] hmp: avoid the nested event loop in handle_hmp_command() Stefan Hajnoczi
@ 2023-09-06 19:01 ` Stefan Hajnoczi
2023-09-12 16:36 ` Kevin Wolf
2023-09-06 19:01 ` [RFC 3/3] qmp: make qmp_device_add() a coroutine Stefan Hajnoczi
2023-09-07 11:28 ` [RFC 0/3] " Paolo Bonzini
3 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2023-09-06 19:01 UTC (permalink / raw)
To: qemu-devel
Cc: Dr. David Alan Gilbert, Eduardo Habkost, pbonzini,
Markus Armbruster, Eric Blake, kwolf, Maxim Levitsky,
Daniel P. Berrangé, Stefan Hajnoczi
drain_call_rcu() has limitations that make it unsuitable for use in
qmp_device_add(). Introduce a new coroutine version of drain_call_rcu()
with the same functionality but that does not drop the BQL. The next
patch will use it to fix qmp_device_add().
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
MAINTAINERS | 2 ++
docs/devel/rcu.txt | 21 +++++++++++++++++
include/qemu/rcu.h | 1 +
util/rcu-internal.h | 8 +++++++
util/rcu-co.c | 55 +++++++++++++++++++++++++++++++++++++++++++++
util/rcu.c | 3 ++-
util/meson.build | 2 +-
7 files changed, 90 insertions(+), 2 deletions(-)
create mode 100644 util/rcu-internal.h
create mode 100644 util/rcu-co.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 3b29568ed4..7f98253bda 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2908,6 +2908,8 @@ F: include/qemu/rcu*.h
F: tests/unit/rcutorture.c
F: tests/unit/test-rcu-*.c
F: util/rcu.c
+F: util/rcu-co.c
+F: util/rcu-internal.h
Human Monitor (HMP)
M: Dr. David Alan Gilbert <dave@treblig.org>
diff --git a/docs/devel/rcu.txt b/docs/devel/rcu.txt
index 2e6cc607a1..344764527f 100644
--- a/docs/devel/rcu.txt
+++ b/docs/devel/rcu.txt
@@ -130,6 +130,27 @@ The core RCU API is small:
g_free_rcu(&foo, rcu);
+ void coroutine_fn drain_call_rcu_co(void);
+
+ drain_call_rcu_co() yields until the reclamation phase is finished.
+ Reclaimer functions previously submitted with call_rcu1() in this
+ thread will have finished by the time drain_call_rcu_co() returns.
+
+ void drain_call_rcu(void);
+
+ drain_call_rcu() releases the Big QEMU Lock (BQL), if held, waits until
+ the reclamation phase is finished, and then re-acquires the BQL, if
+ previously held. Reclaimer functions previously submitted with
+ call_rcu1() in this thread will have finished by the time
+ drain_call_rcu() returns.
+
+ drain_call_rcu() has the following limitations:
+ 1. It deadlocks when called within an RCU read-side critical section.
+ 2. All functions on the call stack must be designed to handle dropping
+ the BQL.
+
+ Prefer drain_call_rcu_co() over drain_call_rcu().
+
typeof(*p) qatomic_rcu_read(p);
qatomic_rcu_read() is similar to qatomic_load_acquire(), but it makes
diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h
index fea058aa9f..53055df1dc 100644
--- a/include/qemu/rcu.h
+++ b/include/qemu/rcu.h
@@ -141,6 +141,7 @@ struct rcu_head {
};
void call_rcu1(struct rcu_head *head, RCUCBFunc *func);
+void coroutine_fn drain_call_rcu_co(void);
void drain_call_rcu(void);
/* The operands of the minus operator must have the same type,
diff --git a/util/rcu-internal.h b/util/rcu-internal.h
new file mode 100644
index 0000000000..7d85366d54
--- /dev/null
+++ b/util/rcu-internal.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: LGPL-2.1-or-later */
+
+#ifndef RCU_INTERNAL_H
+#define RCU_INTERNAL_H
+
+extern int in_drain_call_rcu;
+
+#endif /* RCU_INTERNAL_H */
diff --git a/util/rcu-co.c b/util/rcu-co.c
new file mode 100644
index 0000000000..920fcacb7a
--- /dev/null
+++ b/util/rcu-co.c
@@ -0,0 +1,55 @@
+/* SPDX-License-Identifier: LGPL-2.1-or-later */
+/*
+ * RCU APIs for coroutines
+ *
+ * The RCU coroutine APIs are kept separate from the main RCU code to avoid
+ * depending on AioContext APIs in rcu.c. This is necessary because at least
+ * tests/unit/ptimer-test.c has replacement functions for AioContext APIs that
+ * conflict with the real functions.
+ *
+ * It's also nice to logically separate the core RCU code from the coroutine
+ * APIs :).
+ */
+#include "qemu/osdep.h"
+#include "block/aio.h"
+#include "qemu/atomic.h"
+#include "qemu/coroutine.h"
+#include "qemu/rcu.h"
+#include "rcu-internal.h"
+
+typedef struct {
+ struct rcu_head rcu;
+ Coroutine *co;
+} RcuDrainCo;
+
+static void drain_call_rcu_co_bh(void *opaque)
+{
+ RcuDrainCo *data = opaque;
+
+ /* Re-enter drain_call_rcu_co() where it yielded */
+ aio_co_wake(data->co);
+}
+
+static void drain_call_rcu_co_cb(struct rcu_head *node)
+{
+ RcuDrainCo *data = container_of(node, RcuDrainCo, rcu);
+ AioContext *ctx = qemu_coroutine_get_aio_context(data->co);
+
+ /*
+ * drain_call_rcu_co() might still be running in its thread, so schedule a
+ * BH in its thread. The BH only runs after the coroutine has yielded.
+ */
+ aio_bh_schedule_oneshot(ctx, drain_call_rcu_co_bh, data);
+}
+
+void coroutine_fn drain_call_rcu_co(void)
+{
+ RcuDrainCo data = {
+ .co = qemu_coroutine_self(),
+ };
+
+ qatomic_inc(&in_drain_call_rcu);
+ call_rcu1(&data.rcu, drain_call_rcu_co_cb);
+ qemu_coroutine_yield(); /* wait for drain_call_rcu_co_bh() */
+ qatomic_dec(&in_drain_call_rcu);
+}
diff --git a/util/rcu.c b/util/rcu.c
index e587bcc483..2519bd7d5c 100644
--- a/util/rcu.c
+++ b/util/rcu.c
@@ -32,6 +32,7 @@
#include "qemu/thread.h"
#include "qemu/main-loop.h"
#include "qemu/lockable.h"
+#include "rcu-internal.h"
#if defined(CONFIG_MALLOC_TRIM)
#include <malloc.h>
#endif
@@ -46,7 +47,7 @@
unsigned long rcu_gp_ctr = RCU_GP_LOCKED;
QemuEvent rcu_gp_event;
-static int in_drain_call_rcu;
+int in_drain_call_rcu;
static QemuMutex rcu_registry_lock;
static QemuMutex rcu_sync_lock;
diff --git a/util/meson.build b/util/meson.build
index a375160286..849d56f756 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -43,7 +43,7 @@ util_ss.add(files('keyval.c'))
util_ss.add(files('crc32c.c'))
util_ss.add(files('uuid.c'))
util_ss.add(files('getauxval.c'))
-util_ss.add(files('rcu.c'))
+util_ss.add(files('rcu.c', 'rcu-co.c'))
if have_membarrier
util_ss.add(files('sys_membarrier.c'))
endif
--
2.41.0
* [RFC 3/3] qmp: make qmp_device_add() a coroutine
2023-09-06 19:01 [RFC 0/3] qmp: make qmp_device_add() a coroutine Stefan Hajnoczi
2023-09-06 19:01 ` [RFC 1/3] hmp: avoid the nested event loop in handle_hmp_command() Stefan Hajnoczi
2023-09-06 19:01 ` [RFC 2/3] rcu: add drain_call_rcu_co() API Stefan Hajnoczi
@ 2023-09-06 19:01 ` Stefan Hajnoczi
2023-09-12 16:47 ` Kevin Wolf
2023-09-07 11:28 ` [RFC 0/3] " Paolo Bonzini
3 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2023-09-06 19:01 UTC (permalink / raw)
To: qemu-devel
Cc: Dr. David Alan Gilbert, Eduardo Habkost, pbonzini,
Markus Armbruster, Eric Blake, kwolf, Maxim Levitsky,
Daniel P. Berrangé, Stefan Hajnoczi
It is not safe to call drain_call_rcu() from qmp_device_add() because
some call stacks are not prepared for drain_call_rcu() to drop the Big
QEMU Lock (BQL).
For example, device emulation code is protected by the BQL but when it
calls aio_poll() -> ... -> qmp_device_add() -> drain_call_rcu() then the
BQL is dropped. See bz#2215192 below for a concrete bug of this type.
Another limitation of drain_call_rcu() is that it cannot be invoked
within an RCU read-side critical section since the reclamation phase
cannot complete until the end of the critical section. Unfortunately,
call stacks have been seen where this happens (see bz#2214985 below).
Switch to drain_call_rcu_co() to avoid these problems. This requires
making qmp_device_add() a coroutine. qdev_device_add() is not designed
to be called from coroutines, so it must be invoked from a BH and then
switch back to the coroutine.
Fixes: 7bed89958bfbf40df9ca681cefbdca63abdde39d ("device_core: use drain_call_rcu in in qmp_device_add")
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215192
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2214985
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
qapi/qdev.json | 1 +
include/monitor/qdev.h | 3 ++-
monitor/qmp-cmds.c | 2 +-
softmmu/qdev-monitor.c | 34 ++++++++++++++++++++++++++++++----
hmp-commands.hx | 1 +
5 files changed, 35 insertions(+), 6 deletions(-)
diff --git a/qapi/qdev.json b/qapi/qdev.json
index 6bc5a733b8..78e9d7f7b8 100644
--- a/qapi/qdev.json
+++ b/qapi/qdev.json
@@ -79,6 +79,7 @@
##
{ 'command': 'device_add',
'data': {'driver': 'str', '*bus': 'str', '*id': 'str'},
+ 'coroutine': true,
'gen': false, # so we can get the additional arguments
'features': ['json-cli', 'json-cli-hotplug'] }
diff --git a/include/monitor/qdev.h b/include/monitor/qdev.h
index 1d57bf6577..1fed9eb9ea 100644
--- a/include/monitor/qdev.h
+++ b/include/monitor/qdev.h
@@ -5,7 +5,8 @@
void hmp_info_qtree(Monitor *mon, const QDict *qdict);
void hmp_info_qdm(Monitor *mon, const QDict *qdict);
-void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp);
+void coroutine_fn
+qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp);
int qdev_device_help(QemuOpts *opts);
DeviceState *qdev_device_add(QemuOpts *opts, Error **errp);
diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index b0f948d337..a7419226fe 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -202,7 +202,7 @@ static void __attribute__((__constructor__)) monitor_init_qmp_commands(void)
qmp_init_marshal(&qmp_commands);
qmp_register_command(&qmp_commands, "device_add",
- qmp_device_add, 0, 0);
+ qmp_device_add, QCO_COROUTINE, 0);
QTAILQ_INIT(&qmp_cap_negotiation_commands);
qmp_register_command(&qmp_cap_negotiation_commands, "qmp_capabilities",
diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index 74f4e41338..85ae62f7cf 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -839,8 +839,28 @@ void hmp_info_qdm(Monitor *mon, const QDict *qdict)
qdev_print_devinfos(true);
}
-void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
+typedef struct {
+ Coroutine *co;
+ QemuOpts *opts;
+ Error **errp;
+ DeviceState *dev;
+} QmpDeviceAdd;
+
+static void qmp_device_add_bh(void *opaque)
{
+ QmpDeviceAdd *data = opaque;
+
+ data->dev = qdev_device_add(data->opts, data->errp);
+ aio_co_wake(data->co);
+}
+
+void coroutine_fn
+qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
+{
+ QmpDeviceAdd data = {
+ .co = qemu_coroutine_self(),
+ .errp = errp,
+ };
QemuOpts *opts;
DeviceState *dev;
@@ -852,7 +872,13 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
qemu_opts_del(opts);
return;
}
- dev = qdev_device_add(opts, errp);
+
+ /* Perform qdev_device_add() call outside coroutine context */
+ data.opts = opts;
+ aio_bh_schedule_oneshot(qemu_coroutine_get_aio_context(data.co),
+ qmp_device_add_bh, &data);
+ qemu_coroutine_yield();
+ dev = data.dev;
/*
* Drain all pending RCU callbacks. This is done because
@@ -863,7 +889,7 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
* will finish its job completely once qmp command returns result
* to the user
*/
- drain_call_rcu();
+ drain_call_rcu_co();
if (!dev) {
qemu_opts_del(opts);
@@ -956,7 +982,7 @@ void qmp_device_del(const char *id, Error **errp)
}
}
-void hmp_device_add(Monitor *mon, const QDict *qdict)
+void coroutine_fn hmp_device_add(Monitor *mon, const QDict *qdict)
{
Error *err = NULL;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 2cbd0f77a0..c737d1fd64 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -695,6 +695,7 @@ ERST
.params = "driver[,prop=value][,...]",
.help = "add device, like -device on the command line",
.cmd = hmp_device_add,
+ .coroutine = true,
.command_completion = device_add_completion,
},
--
2.41.0
* Re: [RFC 1/3] hmp: avoid the nested event loop in handle_hmp_command()
2023-09-06 19:01 ` [RFC 1/3] hmp: avoid the nested event loop in handle_hmp_command() Stefan Hajnoczi
@ 2023-09-07 1:06 ` Dr. David Alan Gilbert
2023-09-07 14:04 ` Stefan Hajnoczi
0 siblings, 1 reply; 20+ messages in thread
From: Dr. David Alan Gilbert @ 2023-09-07 1:06 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Eduardo Habkost, pbonzini, Markus Armbruster,
Eric Blake, kwolf, Maxim Levitsky, Daniel P. Berrangé
* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> Coroutine HMP commands currently run to completion in a nested event
> loop with the Big QEMU Lock (BQL) held. The call_rcu thread also uses
> the BQL and cannot process work while the coroutine monitor command is
> running. A deadlock occurs when monitor commands attempt to wait for
> call_rcu work to finish.
I hate to think if there's anywhere else that ends up doing that
other than the monitors.
But, not knowing the semantics of the rcu code, it looks kind of OK to
me from the monitor.
(Do you ever get anything like qemu quitting from one of the other
monitors while this coroutine hasn't been run?)
Dave
> This patch refactors the HMP monitor to use the existing event loop
> instead of creating a nested event loop. This will allow the next
> patches to rely on draining call_rcu work.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> monitor/hmp.c | 28 +++++++++++++++-------------
> 1 file changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/monitor/hmp.c b/monitor/hmp.c
> index 69c1b7e98a..6cff2810aa 100644
> --- a/monitor/hmp.c
> +++ b/monitor/hmp.c
> @@ -1111,15 +1111,17 @@ typedef struct HandleHmpCommandCo {
> Monitor *mon;
> const HMPCommand *cmd;
> QDict *qdict;
> - bool done;
> } HandleHmpCommandCo;
>
> -static void handle_hmp_command_co(void *opaque)
> +static void coroutine_fn handle_hmp_command_co(void *opaque)
> {
> HandleHmpCommandCo *data = opaque;
> +
> handle_hmp_command_exec(data->mon, data->cmd, data->qdict);
> monitor_set_cur(qemu_coroutine_self(), NULL);
> - data->done = true;
> + qobject_unref(data->qdict);
> + monitor_resume(data->mon);
> + g_free(data);
> }
>
> void handle_hmp_command(MonitorHMP *mon, const char *cmdline)
> @@ -1157,20 +1159,20 @@ void handle_hmp_command(MonitorHMP *mon, const char *cmdline)
> Monitor *old_mon = monitor_set_cur(qemu_coroutine_self(), &mon->common);
> handle_hmp_command_exec(&mon->common, cmd, qdict);
> monitor_set_cur(qemu_coroutine_self(), old_mon);
> + qobject_unref(qdict);
> } else {
> - HandleHmpCommandCo data = {
> - .mon = &mon->common,
> - .cmd = cmd,
> - .qdict = qdict,
> - .done = false,
> - };
> - Coroutine *co = qemu_coroutine_create(handle_hmp_command_co, &data);
> + HandleHmpCommandCo *data; /* freed by handle_hmp_command_co() */
> +
> + data = g_new(HandleHmpCommandCo, 1);
> + data->mon = &mon->common;
> + data->cmd = cmd;
> + data->qdict = qdict; /* freed by handle_hmp_command_co() */
> +
> + Coroutine *co = qemu_coroutine_create(handle_hmp_command_co, data);
> + monitor_suspend(&mon->common); /* resumed by handle_hmp_command_co() */
> monitor_set_cur(co, &mon->common);
> aio_co_enter(qemu_get_aio_context(), co);
> - AIO_WAIT_WHILE_UNLOCKED(NULL, !data.done);
> }
> -
> - qobject_unref(qdict);
> }
>
> static void cmd_completion(MonitorHMP *mon, const char *name, const char *list)
> --
> 2.41.0
>
>
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ dave @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/
* Re: [RFC 0/3] qmp: make qmp_device_add() a coroutine
2023-09-06 19:01 [RFC 0/3] qmp: make qmp_device_add() a coroutine Stefan Hajnoczi
` (2 preceding siblings ...)
2023-09-06 19:01 ` [RFC 3/3] qmp: make qmp_device_add() a coroutine Stefan Hajnoczi
@ 2023-09-07 11:28 ` Paolo Bonzini
2023-09-07 14:00 ` Stefan Hajnoczi
3 siblings, 1 reply; 20+ messages in thread
From: Paolo Bonzini @ 2023-09-07 11:28 UTC (permalink / raw)
To: Stefan Hajnoczi, qemu-devel
Cc: Dr. David Alan Gilbert, Eduardo Habkost, Markus Armbruster,
Eric Blake, kwolf, Maxim Levitsky, Daniel P. Berrangé
On 9/6/23 21:01, Stefan Hajnoczi wrote:
> It is not safe to call drain_call_rcu() from qmp_device_add() because
> some call stacks are not prepared for drain_call_rcu() to drop the Big
> QEMU Lock (BQL).
>
> For example, device emulation code is protected by the BQL but when it
> calls aio_poll() -> ... -> qmp_device_add() -> drain_call_rcu() then the
> BQL is dropped. See https://bugzilla.redhat.com/show_bug.cgi?id=2215192 for a
> concrete bug of this type.
>
> Another limitation of drain_call_rcu() is that it cannot be invoked within an
> RCU read-side critical section since the reclamation phase cannot complete
> until the end of the critical section. Unfortunately, call stacks have been
> seen where this happens (see
> https://bugzilla.redhat.com/show_bug.cgi?id=2214985).
I think the root cause here is that do_qmp_dispatch_bh is called on the
wrong context, namely qemu_get_aio_context() instead of
iohandler_get_aio_context(). This is what causes it to move to the vCPU
thread.
Auditing all subsystems that use iohandler_get_aio_context(), for
example via qemu_set_fd_handler(), together with bottom halves, would be
a bit daunting.
I don't have any objection to this patch series actually, but I would
like to see if using the right AioContext also fixes the bug---and then
treat these changes as more of a cleanup. Coroutines are pretty
pervasive in QEMU and are not going away which, as you say in the
updated docs, makes drain_call_rcu_co() preferable to drain_call_rcu().
Paolo
> This patch series introduces drain_call_rcu_co(), which does the same thing as
> drain_call_rcu() but asynchronously. By yielding back to the event loop we can
> wait until the caller drops the BQL and leaves its RCU read-side critical
> section.
>
> Patch 1 changes HMP so that coroutine monitor commands yield back to the event
> loop instead of running inside a nested event loop.
>
> Patch 2 introduces the new drain_call_rcu_co() API.
>
> Patch 3 converts qmp_device_add() into a coroutine monitor command and uses
> drain_call_rcu_co().
>
> I'm sending this as an RFC because I don't have confirmation yet that the bugs
> mentioned above are fixed by this patch series.
>
> Stefan Hajnoczi (3):
> hmp: avoid the nested event loop in handle_hmp_command()
> rcu: add drain_call_rcu_co() API
> qmp: make qmp_device_add() a coroutine
>
> MAINTAINERS | 2 ++
> docs/devel/rcu.txt | 21 ++++++++++++++++
> qapi/qdev.json | 1 +
> include/monitor/qdev.h | 3 ++-
> include/qemu/rcu.h | 1 +
> util/rcu-internal.h | 8 ++++++
> monitor/hmp.c | 28 +++++++++++----------
> monitor/qmp-cmds.c | 2 +-
> softmmu/qdev-monitor.c | 34 +++++++++++++++++++++++---
> util/rcu-co.c | 55 ++++++++++++++++++++++++++++++++++++++++++
> util/rcu.c | 3 ++-
> hmp-commands.hx | 1 +
> util/meson.build | 2 +-
> 13 files changed, 140 insertions(+), 21 deletions(-)
> create mode 100644 util/rcu-internal.h
> create mode 100644 util/rcu-co.c
>
* Re: [RFC 0/3] qmp: make qmp_device_add() a coroutine
2023-09-07 11:28 ` [RFC 0/3] " Paolo Bonzini
@ 2023-09-07 14:00 ` Stefan Hajnoczi
2023-09-07 14:25 ` Paolo Bonzini
0 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2023-09-07 14:00 UTC (permalink / raw)
To: Paolo Bonzini
Cc: qemu-devel, Dr. David Alan Gilbert, Eduardo Habkost,
Markus Armbruster, Eric Blake, kwolf, Maxim Levitsky,
Daniel P. Berrangé
On Thu, Sep 07, 2023 at 01:28:55PM +0200, Paolo Bonzini wrote:
> On 9/6/23 21:01, Stefan Hajnoczi wrote:
> > It is not safe to call drain_call_rcu() from qmp_device_add() because
> > some call stacks are not prepared for drain_call_rcu() to drop the Big
> > QEMU Lock (BQL).
> >
> > For example, device emulation code is protected by the BQL but when it
> > calls aio_poll() -> ... -> qmp_device_add() -> drain_call_rcu() then the
> > BQL is dropped. See https://bugzilla.redhat.com/show_bug.cgi?id=2215192 for a
> > concrete bug of this type.
> >
> > Another limitation of drain_call_rcu() is that it cannot be invoked within an
> > RCU read-side critical section since the reclamation phase cannot complete
> > until the end of the critical section. Unfortunately, call stacks have been
> > seen where this happens (see
> > https://bugzilla.redhat.com/show_bug.cgi?id=2214985).
>
> I think the root cause here is that do_qmp_dispatch_bh is called on the
> wrong context, namely qemu_get_aio_context() instead of
> iohandler_get_aio_context(). This is what causes it to move to the vCPU
> thread.
>
> Auditing all subsystems that use iohandler_get_aio_context(), for example
> via qemu_set_fd_handler(), together with bottom halves, would be a bit
> daunting.
>
> I don't have any objection to this patch series actually, but I would like
> to see if using the right AioContext also fixes the bug---and then treat
> these changes as more of a cleanup. Coroutines are pretty pervasive in QEMU
> and are not going away which, as you say in the updated docs, makes
> drain_call_rcu_co() preferable to drain_call_rcu().
While I agree that the issue would not happen if monitor commands only
ran in the iohandler AioContext, I don't think we can change that.
When Kevin implemented coroutine commands in commit 9ce44e2ce267 ("qmp:
Move dispatcher to a coroutine"), he used qemu_get_aio_context()
deliberately so that AIO_WAIT_WHILE() can make progress.
I'm not clear on the exact scenario though, because coroutines shouldn't
call AIO_WAIT_WHILE().
Kevin?
There is only one coroutine monitor command that calls the QEMU block
layer: qmp_block_resize(). If we're going to change how the AioContext
works then now is the time to do it before there are more commands that
need to be audited/refactored.
Stefan
>
> Paolo
>
>
> > This patch series introduces drain_call_rcu_co(), which does the same thing as
> > drain_call_rcu() but asynchronously. By yielding back to the event loop we can
> > wait until the caller drops the BQL and leaves its RCU read-side critical
> > section.
> >
> > Patch 1 changes HMP so that coroutine monitor commands yield back to the event
> > loop instead of running inside a nested event loop.
> >
> > Patch 2 introduces the new drain_call_rcu_co() API.
> >
> > Patch 3 converts qmp_device_add() into a coroutine monitor command and uses
> > drain_call_rcu_co().
> >
> > I'm sending this as an RFC because I don't have confirmation yet that the bugs
> > mentioned above are fixed by this patch series.
> >
> > Stefan Hajnoczi (3):
> > hmp: avoid the nested event loop in handle_hmp_command()
> > rcu: add drain_call_rcu_co() API
> > qmp: make qmp_device_add() a coroutine
> >
> > MAINTAINERS | 2 ++
> > docs/devel/rcu.txt | 21 ++++++++++++++++
> > qapi/qdev.json | 1 +
> > include/monitor/qdev.h | 3 ++-
> > include/qemu/rcu.h | 1 +
> > util/rcu-internal.h | 8 ++++++
> > monitor/hmp.c | 28 +++++++++++----------
> > monitor/qmp-cmds.c | 2 +-
> > softmmu/qdev-monitor.c | 34 +++++++++++++++++++++++---
> > util/rcu-co.c | 55 ++++++++++++++++++++++++++++++++++++++++++
> > util/rcu.c | 3 ++-
> > hmp-commands.hx | 1 +
> > util/meson.build | 2 +-
> > 13 files changed, 140 insertions(+), 21 deletions(-)
> > create mode 100644 util/rcu-internal.h
> > create mode 100644 util/rcu-co.c
> >
>
* Re: [RFC 1/3] hmp: avoid the nested event loop in handle_hmp_command()
2023-09-07 1:06 ` Dr. David Alan Gilbert
@ 2023-09-07 14:04 ` Stefan Hajnoczi
2023-09-07 14:07 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2023-09-07 14:04 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: qemu-devel, Eduardo Habkost, pbonzini, Markus Armbruster,
Eric Blake, kwolf, Maxim Levitsky, Daniel P. Berrangé
On Thu, Sep 07, 2023 at 01:06:39AM +0000, Dr. David Alan Gilbert wrote:
> * Stefan Hajnoczi (stefanha@redhat.com) wrote:
> > Coroutine HMP commands currently run to completion in a nested event
> > loop with the Big QEMU Lock (BQL) held. The call_rcu thread also uses
> > the BQL and cannot process work while the coroutine monitor command is
> > running. A deadlock occurs when monitor commands attempt to wait for
> > call_rcu work to finish.
>
> I hate to think if there's anywhere else that ends up doing that
> other than the monitors.
Luckily drain_call_rcu() has few callers: just
xen_block_device_destroy() and qmp_device_add(). We only need to worry
about their call stacks.
I haven't looked at the Xen code.
>
> But, not knowing the semantics of the rcu code, it looks kind of OK to
> me from the monitor.
>
> (Do you ever get anything like qemu quitting from one of the other
> monitors while this coroutine hasn't been run?)
Not sure what you mean?
Stefan
>
> Dave
>
> > This patch refactors the HMP monitor to use the existing event loop
> > instead of creating a nested event loop. This will allow the next
> > patches to rely on draining call_rcu work.
> >
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> > monitor/hmp.c | 28 +++++++++++++++-------------
> > 1 file changed, 15 insertions(+), 13 deletions(-)
> >
> > diff --git a/monitor/hmp.c b/monitor/hmp.c
> > index 69c1b7e98a..6cff2810aa 100644
> > --- a/monitor/hmp.c
> > +++ b/monitor/hmp.c
> > @@ -1111,15 +1111,17 @@ typedef struct HandleHmpCommandCo {
> > Monitor *mon;
> > const HMPCommand *cmd;
> > QDict *qdict;
> > - bool done;
> > } HandleHmpCommandCo;
> >
> > -static void handle_hmp_command_co(void *opaque)
> > +static void coroutine_fn handle_hmp_command_co(void *opaque)
> > {
> > HandleHmpCommandCo *data = opaque;
> > +
> > handle_hmp_command_exec(data->mon, data->cmd, data->qdict);
> > monitor_set_cur(qemu_coroutine_self(), NULL);
> > - data->done = true;
> > + qobject_unref(data->qdict);
> > + monitor_resume(data->mon);
> > + g_free(data);
> > }
> >
> > void handle_hmp_command(MonitorHMP *mon, const char *cmdline)
> > @@ -1157,20 +1159,20 @@ void handle_hmp_command(MonitorHMP *mon, const char *cmdline)
> > Monitor *old_mon = monitor_set_cur(qemu_coroutine_self(), &mon->common);
> > handle_hmp_command_exec(&mon->common, cmd, qdict);
> > monitor_set_cur(qemu_coroutine_self(), old_mon);
> > + qobject_unref(qdict);
> > } else {
> > - HandleHmpCommandCo data = {
> > - .mon = &mon->common,
> > - .cmd = cmd,
> > - .qdict = qdict,
> > - .done = false,
> > - };
> > - Coroutine *co = qemu_coroutine_create(handle_hmp_command_co, &data);
> > + HandleHmpCommandCo *data; /* freed by handle_hmp_command_co() */
> > +
> > + data = g_new(HandleHmpCommandCo, 1);
> > + data->mon = &mon->common;
> > + data->cmd = cmd;
> > + data->qdict = qdict; /* freed by handle_hmp_command_co() */
> > +
> > + Coroutine *co = qemu_coroutine_create(handle_hmp_command_co, data);
> > + monitor_suspend(&mon->common); /* resumed by handle_hmp_command_co() */
> > monitor_set_cur(co, &mon->common);
> > aio_co_enter(qemu_get_aio_context(), co);
> > - AIO_WAIT_WHILE_UNLOCKED(NULL, !data.done);
> > }
> > -
> > - qobject_unref(qdict);
> > }
> >
> > static void cmd_completion(MonitorHMP *mon, const char *name, const char *list)
> > --
> > 2.41.0
> >
> >
> --
> -----Open up your eyes, open up your mind, open up your code -------
> / Dr. David Alan Gilbert | Running GNU/Linux | Happy \
> \ dave @ treblig.org | | In Hex /
> \ _________________________|_____ http://www.treblig.org |_______/
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [RFC 1/3] hmp: avoid the nested event loop in handle_hmp_command()
2023-09-07 14:04 ` Stefan Hajnoczi
@ 2023-09-07 14:07 ` Dr. David Alan Gilbert
2023-09-07 15:34 ` Stefan Hajnoczi
0 siblings, 1 reply; 20+ messages in thread
From: Dr. David Alan Gilbert @ 2023-09-07 14:07 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Eduardo Habkost, pbonzini, Markus Armbruster,
Eric Blake, kwolf, Maxim Levitsky, Daniel P. Berrangé
* Stefan Hajnoczi (stefanha@redhat.com) wrote:
> On Thu, Sep 07, 2023 at 01:06:39AM +0000, Dr. David Alan Gilbert wrote:
> > (Do you ever get anything like qemu quitting from one of the other
> > monitors while this coroutine hasn't been run?)
>
> Not sure what you mean?
Imagine that just after you create your coroutine, a vCPU does a
shutdown and qemu is configured to quit, or on another monitor someone
does a quit; does your coroutine get executed or not?
Dave
* Re: [RFC 0/3] qmp: make qmp_device_add() a coroutine
2023-09-07 14:00 ` Stefan Hajnoczi
@ 2023-09-07 14:25 ` Paolo Bonzini
2023-09-07 15:29 ` Stefan Hajnoczi
2023-09-12 17:08 ` Kevin Wolf
0 siblings, 2 replies; 20+ messages in thread
From: Paolo Bonzini @ 2023-09-07 14:25 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Dr. David Alan Gilbert, Eduardo Habkost,
Markus Armbruster, Eric Blake, kwolf, Maxim Levitsky,
Daniel P. Berrangé
On Thu, Sep 7, 2023 at 4:00 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> While I agree that the issue would not happen if monitor commands only
> ran in the iohandler AioContext, I don't think we can change that.
> When Kevin implemented coroutine commands in commit 9ce44e2ce267 ("qmp:
> Move dispatcher to a coroutine"), he used qemu_get_aio_context()
> deliberately so that AIO_WAIT_WHILE() can make progress.
Ah, you are referring to
+ /*
+ * Move the coroutine from iohandler_ctx to qemu_aio_context for
+ * executing the command handler so that it can make progress if it
+ * involves an AIO_WAIT_WHILE().
+ */
+ aio_co_schedule(qemu_get_aio_context(), qmp_dispatcher_co);
+ qemu_coroutine_yield();
> I'm not clear on the exact scenario though, because coroutines shouldn't
> call AIO_WAIT_WHILE().
I think he meant "so that an AIO_WAIT_WHILE() invoked through a bottom
half will make progress on the coroutine as well".
However I am not sure the comment applies here, because
do_qmp_dispatch_bh() only applies to non-coroutine commands; that
commit allowed monitor commands to run in vCPU threads, where they
previously could not.
Thinking more about it, I don't like that the
if (!!(cmd->options & QCO_COROUTINE) == qemu_in_coroutine()) {
}
check is in qmp_dispatch() rather than monitor_qmp_dispatch().
Any caller of qmp_dispatch() knows if it is in a coroutine or not.
qemu-ga uses neither a coroutine dispatcher nor coroutine commands.
QEMU uses non-coroutine dispatch for out-of-band commands (and we can
forbid coroutine + allow-oob at the same time), and coroutine dispatch
for the others.
So, moving out of coroutine context (through a bottom half) should be
done by monitor_qmp_dispatch(), and likewise moving temporarily out of
the iohandler context in the case of coroutine commands. In the case
of !req_obj->req you don't need to do either of those. qmp_dispatch()
can still assert that the coroutine-ness of the command matches the
context in which qmp_dispatch() is called.
Once this is done, I think moving out of coroutine context can use a
BH that runs in the iohandler context.
Paolo
* Re: [RFC 0/3] qmp: make qmp_device_add() a coroutine
2023-09-07 14:25 ` Paolo Bonzini
@ 2023-09-07 15:29 ` Stefan Hajnoczi
2023-09-12 17:08 ` Kevin Wolf
1 sibling, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2023-09-07 15:29 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Stefan Hajnoczi, qemu-devel, Dr. David Alan Gilbert,
Eduardo Habkost, Markus Armbruster, Eric Blake, kwolf,
Maxim Levitsky, Daniel P. Berrangé
On Thu, 7 Sept 2023 at 10:26, Paolo Bonzini <pbonzini@redhat.com> wrote:
I'll wait for Kevin's input and will then revisit the patches based on
the conclusion we come to.
Stefan
* Re: [RFC 1/3] hmp: avoid the nested event loop in handle_hmp_command()
2023-09-07 14:07 ` Dr. David Alan Gilbert
@ 2023-09-07 15:34 ` Stefan Hajnoczi
2023-09-07 20:53 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 20+ messages in thread
From: Stefan Hajnoczi @ 2023-09-07 15:34 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: Stefan Hajnoczi, qemu-devel, Eduardo Habkost, pbonzini,
Markus Armbruster, Eric Blake, kwolf, Maxim Levitsky,
Daniel P. Berrangé
On Thu, 7 Sept 2023 at 10:07, Dr. David Alan Gilbert <dave@treblig.org> wrote:
>
>
> Imagine that just after you create your coroutine, a vCPU does a
> shutdown and qemu is configured to quit, or on another monitor someone
> does a quit; does your coroutine get executed or not?
I think the answer is that it depends.
A coroutine can run for a while and then yield while waiting for a
timer, BH, fd handler, etc. If the coroutine has yielded then I think
QEMU could terminate.
The behavior of entering a coroutine for the first time depends on the
API that is used (e.g. qemu_coroutine_enter()/aio_co_enter()/etc).
qemu_coroutine_enter() is immediate but aio_co_enter() contains
indirect code paths like scheduling a BH.
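Roughly, for illustration (untested; my_co_fn/data are placeholders):

```c
/* Untested illustration of the two first-entry paths. */
Coroutine *co = qemu_coroutine_create(my_co_fn, data);

/* Immediate: my_co_fn() runs right now, in the current thread. */
qemu_coroutine_enter(co);

/* Indirect: outside the AioContext's home thread this schedules the
 * coroutine (e.g. via a BH), so the first run happens later from the
 * event loop -- and might never happen if QEMU exits first. */
aio_co_enter(qemu_get_aio_context(), co);
```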
To summarize: ¯\_(ツ)_/¯
Stefan
* Re: [RFC 1/3] hmp: avoid the nested event loop in handle_hmp_command()
2023-09-07 15:34 ` Stefan Hajnoczi
@ 2023-09-07 20:53 ` Dr. David Alan Gilbert
2023-09-07 21:25 ` Stefan Hajnoczi
0 siblings, 1 reply; 20+ messages in thread
From: Dr. David Alan Gilbert @ 2023-09-07 20:53 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Stefan Hajnoczi, qemu-devel, Eduardo Habkost, pbonzini,
Markus Armbruster, Eric Blake, kwolf, Maxim Levitsky,
Daniel P. Berrangé
* Stefan Hajnoczi (stefanha@gmail.com) wrote:
> I think the answer is that it depends.
>
> A coroutine can run for a while and then yield while waiting for a
> timer, BH, fd handler, etc. If the coroutine has yielded then I think
> QEMU could terminate.
>
> The behavior of entering a coroutine for the first time depends on the
> API that is used (e.g. qemu_coroutine_enter()/aio_co_enter()/etc).
> qemu_coroutine_enter() is immediate but aio_co_enter() contains
> indirect code paths like scheduling a BH.
>
> To summarize: ¯\_(ツ)_/¯
That does mean you leave your g_new'd data and qdict allocated at
exit - meh
I'm not sure if it means you're making any other assumptions about what
happens if the coroutine gets run during the exit path; although I guess
there are plenty of other cases like that.
Dave
* Re: [RFC 1/3] hmp: avoid the nested event loop in handle_hmp_command()
2023-09-07 20:53 ` Dr. David Alan Gilbert
@ 2023-09-07 21:25 ` Stefan Hajnoczi
0 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2023-09-07 21:25 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: Stefan Hajnoczi, qemu-devel, Eduardo Habkost, pbonzini,
Markus Armbruster, Eric Blake, kwolf, Maxim Levitsky,
Daniel P. Berrangé
On Thu, 7 Sept 2023 at 16:53, Dr. David Alan Gilbert <dave@treblig.org> wrote:
>
> That does mean you leave your g_new'd data and qdict allocated at
> exit - meh
>
> I'm not sure if it means you're making any other assumptions about what
> happens if the coroutine gets run during the exit path; although I guess
> there are plenty of other cases like that.
Yes, I think QEMU has some resources (memory, file descriptors, etc)
that are not freed on exit.
Stefan
* Re: [RFC 2/3] rcu: add drain_call_rcu_co() API
2023-09-06 19:01 ` [RFC 2/3] rcu: add drain_call_rcu_co() API Stefan Hajnoczi
@ 2023-09-12 16:36 ` Kevin Wolf
2023-09-12 16:52 ` Stefan Hajnoczi
0 siblings, 1 reply; 20+ messages in thread
From: Kevin Wolf @ 2023-09-12 16:36 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Dr. David Alan Gilbert, Eduardo Habkost, pbonzini,
Markus Armbruster, Eric Blake, Maxim Levitsky,
Daniel P. Berrangé
On 06.09.2023 at 21:01, Stefan Hajnoczi wrote:
> call_drain_rcu() has limitations that make it unsuitable for use in
> qmp_device_add().
This sounds a bit vague, alluding only to some unnamed limitations.
I assume that you mean the two points you add to rcu.txt. If so, maybe
it would be better to add a reference to that in the commit message.
> Introduce a new coroutine version of drain_call_rcu()
> with the same functionality but that does not drop the BQL. The next
> patch will use it to fix qmp_device_add().
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
I don't understand the reasoning here. How does yielding from the
coroutine not effectively release the BQL, too? It's just that you won't
have explicit code here, but the mainloop will do it for you while
waiting for new events.
Is this about not dropping the BQL specifically in nested event loops,
but letting the coroutine wait until we return to the real main loop
where dropping the BQL is hopefully not a problem?
call_rcu_thread() still waits for the BQL to be dropped somewhere, so
from the perspective of the coroutine, it will definitely be dropped
during the yield.
So if this was your intention, the change probably makes sense, but the
description could be clearer. Took me a bit to understand what this is
really doing.
Kevin
* Re: [RFC 3/3] qmp: make qmp_device_add() a coroutine
2023-09-06 19:01 ` [RFC 3/3] qmp: make qmp_device_add() a coroutine Stefan Hajnoczi
@ 2023-09-12 16:47 ` Kevin Wolf
2023-09-12 17:04 ` Stefan Hajnoczi
0 siblings, 1 reply; 20+ messages in thread
From: Kevin Wolf @ 2023-09-12 16:47 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, Dr. David Alan Gilbert, Eduardo Habkost, pbonzini,
Markus Armbruster, Eric Blake, Maxim Levitsky,
Daniel P. Berrangé
Am 06.09.2023 um 21:01 hat Stefan Hajnoczi geschrieben:
> It is not safe to call drain_call_rcu() from qmp_device_add() because
> some call stacks are not prepared for drain_call_rcu() to drop the Big
> QEMU Lock (BQL).
>
> For example, device emulation code is protected by the BQL but when it
> calls aio_poll() -> ... -> qmp_device_add() -> drain_call_rcu() then the
> BQL is dropped. See bz#2215192 below for a concrete bug of this type.
>
> Another limitation of drain_call_rcu() is that it cannot be invoked
> within an RCU read-side critical section since the reclamation phase
> cannot complete until the end of the critical section. Unfortunately,
> call stacks have been seen where this happens (see bz#2214985 below).
>
> Switch to drain_call_rcu_co() to avoid these problems. This requires
> making qmp_device_add() a coroutine. qdev_device_add() is not designed
> to be called from coroutines, so it must be invoked from a BH and then
> switch back to the coroutine.
>
> Fixes: 7bed89958bfbf40df9ca681cefbdca63abdde39d ("device_core: use drain_call_rcu in in qmp_device_add")
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215192
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2214985
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Can you please include the relevant information directly in the commit
message instead of only referencing Bugzilla? Both bugs only contain
half of the story - I'm not even sure if the link with the stack trace
is publicly accessible - and then I think you got some information
only from reproducing it yourself, and this information is missing from
the bug reports. (The other question is how long the information will
still be available in Bugzilla.)
> qapi/qdev.json | 1 +
> include/monitor/qdev.h | 3 ++-
> monitor/qmp-cmds.c | 2 +-
> softmmu/qdev-monitor.c | 34 ++++++++++++++++++++++++++++++----
> hmp-commands.hx | 1 +
> 5 files changed, 35 insertions(+), 6 deletions(-)
>
> diff --git a/qapi/qdev.json b/qapi/qdev.json
> index 6bc5a733b8..78e9d7f7b8 100644
> --- a/qapi/qdev.json
> +++ b/qapi/qdev.json
> @@ -79,6 +79,7 @@
> ##
> { 'command': 'device_add',
> 'data': {'driver': 'str', '*bus': 'str', '*id': 'str'},
> + 'coroutine': true,
> 'gen': false, # so we can get the additional arguments
> 'features': ['json-cli', 'json-cli-hotplug'] }
>
> diff --git a/include/monitor/qdev.h b/include/monitor/qdev.h
> index 1d57bf6577..1fed9eb9ea 100644
> --- a/include/monitor/qdev.h
> +++ b/include/monitor/qdev.h
> @@ -5,7 +5,8 @@
>
> void hmp_info_qtree(Monitor *mon, const QDict *qdict);
> void hmp_info_qdm(Monitor *mon, const QDict *qdict);
> -void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp);
> +void coroutine_fn
> +qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp);
>
> int qdev_device_help(QemuOpts *opts);
> DeviceState *qdev_device_add(QemuOpts *opts, Error **errp);
> diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
> index b0f948d337..a7419226fe 100644
> --- a/monitor/qmp-cmds.c
> +++ b/monitor/qmp-cmds.c
> @@ -202,7 +202,7 @@ static void __attribute__((__constructor__)) monitor_init_qmp_commands(void)
> qmp_init_marshal(&qmp_commands);
>
> qmp_register_command(&qmp_commands, "device_add",
> - qmp_device_add, 0, 0);
> + qmp_device_add, QCO_COROUTINE, 0);
>
> QTAILQ_INIT(&qmp_cap_negotiation_commands);
> qmp_register_command(&qmp_cap_negotiation_commands, "qmp_capabilities",
> diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
> index 74f4e41338..85ae62f7cf 100644
> --- a/softmmu/qdev-monitor.c
> +++ b/softmmu/qdev-monitor.c
> @@ -839,8 +839,28 @@ void hmp_info_qdm(Monitor *mon, const QDict *qdict)
> qdev_print_devinfos(true);
> }
>
> -void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
> +typedef struct {
> + Coroutine *co;
> + QemuOpts *opts;
> + Error **errp;
> + DeviceState *dev;
> +} QmpDeviceAdd;
> +
> +static void qmp_device_add_bh(void *opaque)
> {
> + QmpDeviceAdd *data = opaque;
> +
> + data->dev = qdev_device_add(data->opts, data->errp);
> + aio_co_wake(data->co);
> +}
> +
> +void coroutine_fn
> +qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
> +{
> + QmpDeviceAdd data = {
> + .co = qemu_coroutine_self(),
> + .errp = errp,
> + };
> QemuOpts *opts;
> DeviceState *dev;
>
> @@ -852,7 +872,13 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
> qemu_opts_del(opts);
> return;
> }
> - dev = qdev_device_add(opts, errp);
> +
> + /* Perform qdev_device_add() call outside coroutine context */
> + data.opts = opts;
> + aio_bh_schedule_oneshot(qemu_coroutine_get_aio_context(data.co),
> + qmp_device_add_bh, &data);
> + qemu_coroutine_yield();
> + dev = data.dev;
I wonder if we should make no_co_wrapper etc. available outside of the
block layer, so we could just declare a qdev_co_device_add() and call it
here and the code would automatically be generated.
This doesn't work today because the script generates only a single
source file for all wrappers, which is linked with all of the tools. So
putting qdev functions there would make the build fail.
I had a similar case in the virtio_load() fix where I decided to write
the wrapper manually instead. But having two cases in such a short time
frame might be a sign that we actually have enough potential users that
making the generator work for it would be worth it.
Kevin
* Re: [RFC 2/3] rcu: add drain_call_rcu_co() API
2023-09-12 16:36 ` Kevin Wolf
@ 2023-09-12 16:52 ` Stefan Hajnoczi
0 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2023-09-12 16:52 UTC (permalink / raw)
To: Kevin Wolf
Cc: Stefan Hajnoczi, qemu-devel, Dr. David Alan Gilbert,
Eduardo Habkost, pbonzini, Markus Armbruster, Eric Blake,
Maxim Levitsky, Daniel P. Berrangé
On Tue, 12 Sept 2023 at 12:37, Kevin Wolf <kwolf@redhat.com> wrote:
>
> Am 06.09.2023 um 21:01 hat Stefan Hajnoczi geschrieben:
> > drain_call_rcu() has limitations that make it unsuitable for use in
> > qmp_device_add().
>
> This sounds a bit vague, only alluding to some unnamed limitations.
> I assume that you mean the two points you add to rcu.txt. If so, maybe
> it would be better to add a reference to that in the commit message.
Yes, exactly. I will add a reference to the commit message.
>
> > Introduce a new coroutine version of drain_call_rcu()
> > with the same functionality but that does not drop the BQL. The next
> > patch will use it to fix qmp_device_add().
> >
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>
> I don't understand the reasoning here. How does yielding from the
> coroutine not effectively release the BQL, too? It's just that you won't
> have explicit code here, but the mainloop will do it for you while
> waiting for new events.
>
> Is this about not dropping the BQL specifically in nested event loops,
> but letting the coroutine wait until we return to the real main loop
> where dropping the BQL is hopefully not a problem?
Yes.
Stefan
* Re: [RFC 3/3] qmp: make qmp_device_add() a coroutine
2023-09-12 16:47 ` Kevin Wolf
@ 2023-09-12 17:04 ` Stefan Hajnoczi
0 siblings, 0 replies; 20+ messages in thread
From: Stefan Hajnoczi @ 2023-09-12 17:04 UTC (permalink / raw)
To: Kevin Wolf
Cc: Stefan Hajnoczi, qemu-devel, Dr. David Alan Gilbert,
Eduardo Habkost, pbonzini, Markus Armbruster, Eric Blake,
Maxim Levitsky, Daniel P. Berrangé
On Tue, 12 Sept 2023 at 12:47, Kevin Wolf <kwolf@redhat.com> wrote:
>
> Am 06.09.2023 um 21:01 hat Stefan Hajnoczi geschrieben:
> > It is not safe to call drain_call_rcu() from qmp_device_add() because
> > some call stacks are not prepared for drain_call_rcu() to drop the Big
> > QEMU Lock (BQL).
> >
> > For example, device emulation code is protected by the BQL but when it
> > calls aio_poll() -> ... -> qmp_device_add() -> drain_call_rcu() then the
> > BQL is dropped. See bz#2215192 below for a concrete bug of this type.
> >
> > Another limitation of drain_call_rcu() is that it cannot be invoked
> > within an RCU read-side critical section since the reclamation phase
> > cannot complete until the end of the critical section. Unfortunately,
> > call stacks have been seen where this happens (see bz#2214985 below).
> >
> > Switch to drain_call_rcu_co() to avoid these problems. This requires
> > making qmp_device_add() a coroutine. qdev_device_add() is not designed
> > to be called from coroutines, so it must be invoked from a BH and then
> > switch back to the coroutine.
> >
> > Fixes: 7bed89958bfbf40df9ca681cefbdca63abdde39d ("device_core: use drain_call_rcu in in qmp_device_add")
> > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215192
> > Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2214985
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>
> Can you please include the relevant information directly in the commit
> message instead of only referencing Bugzilla? Both bugs only contain
> half of the story - I'm not even sure if the link with the stack trace
> is publicly accessible - and then I think you got some information
> only from reproducing it yourself, and this information is missing from
> the bug reports. (The other question is how long the information will
> still be available in Bugzilla.)
Yes, I'll include the details in the commit description.
>
> > qapi/qdev.json | 1 +
> > include/monitor/qdev.h | 3 ++-
> > monitor/qmp-cmds.c | 2 +-
> > softmmu/qdev-monitor.c | 34 ++++++++++++++++++++++++++++++----
> > hmp-commands.hx | 1 +
> > 5 files changed, 35 insertions(+), 6 deletions(-)
> >
> > diff --git a/qapi/qdev.json b/qapi/qdev.json
> > index 6bc5a733b8..78e9d7f7b8 100644
> > --- a/qapi/qdev.json
> > +++ b/qapi/qdev.json
> > @@ -79,6 +79,7 @@
> > ##
> > { 'command': 'device_add',
> > 'data': {'driver': 'str', '*bus': 'str', '*id': 'str'},
> > + 'coroutine': true,
> > 'gen': false, # so we can get the additional arguments
> > 'features': ['json-cli', 'json-cli-hotplug'] }
> >
> > diff --git a/include/monitor/qdev.h b/include/monitor/qdev.h
> > index 1d57bf6577..1fed9eb9ea 100644
> > --- a/include/monitor/qdev.h
> > +++ b/include/monitor/qdev.h
> > @@ -5,7 +5,8 @@
> >
> > void hmp_info_qtree(Monitor *mon, const QDict *qdict);
> > void hmp_info_qdm(Monitor *mon, const QDict *qdict);
> > -void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp);
> > +void coroutine_fn
> > +qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp);
> >
> > int qdev_device_help(QemuOpts *opts);
> > DeviceState *qdev_device_add(QemuOpts *opts, Error **errp);
> > diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
> > index b0f948d337..a7419226fe 100644
> > --- a/monitor/qmp-cmds.c
> > +++ b/monitor/qmp-cmds.c
> > @@ -202,7 +202,7 @@ static void __attribute__((__constructor__)) monitor_init_qmp_commands(void)
> > qmp_init_marshal(&qmp_commands);
> >
> > qmp_register_command(&qmp_commands, "device_add",
> > - qmp_device_add, 0, 0);
> > + qmp_device_add, QCO_COROUTINE, 0);
> >
> > QTAILQ_INIT(&qmp_cap_negotiation_commands);
> > qmp_register_command(&qmp_cap_negotiation_commands, "qmp_capabilities",
> > diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
> > index 74f4e41338..85ae62f7cf 100644
> > --- a/softmmu/qdev-monitor.c
> > +++ b/softmmu/qdev-monitor.c
> > @@ -839,8 +839,28 @@ void hmp_info_qdm(Monitor *mon, const QDict *qdict)
> > qdev_print_devinfos(true);
> > }
> >
> > -void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
> > +typedef struct {
> > + Coroutine *co;
> > + QemuOpts *opts;
> > + Error **errp;
> > + DeviceState *dev;
> > +} QmpDeviceAdd;
> > +
> > +static void qmp_device_add_bh(void *opaque)
> > {
> > + QmpDeviceAdd *data = opaque;
> > +
> > + data->dev = qdev_device_add(data->opts, data->errp);
> > + aio_co_wake(data->co);
> > +}
> > +
> > +void coroutine_fn
> > +qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
> > +{
> > + QmpDeviceAdd data = {
> > + .co = qemu_coroutine_self(),
> > + .errp = errp,
> > + };
> > QemuOpts *opts;
> > DeviceState *dev;
> >
> > @@ -852,7 +872,13 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error **errp)
> > qemu_opts_del(opts);
> > return;
> > }
> > - dev = qdev_device_add(opts, errp);
> > +
> > + /* Perform qdev_device_add() call outside coroutine context */
> > + data.opts = opts;
> > + aio_bh_schedule_oneshot(qemu_coroutine_get_aio_context(data.co),
> > + qmp_device_add_bh, &data);
> > + qemu_coroutine_yield();
> > + dev = data.dev;
>
> I wonder if we should make no_co_wrapper etc. available outside of the
> block layer, so we could just declare a qdev_co_device_add() and call it
> here and the code would automatically be generated.
>
> This doesn't work today because the script generates only a single
> source file for all wrappers, which is linked with all of the tools. So
> putting qdev functions there would make the build fail.
>
> I had a similar case in the virtio_load() fix where I decided to write
> the wrapper manually instead. But having two cases in such a short time
> frame might be a sign that we actually have enough potential users that
> making the generator work for it would be worth it.
In principle I'm happy to do that. Before I continue working on the
coroutine version of qmp_device_add(), please let us know your
thoughts about Paolo's idea.
If I understand correctly, Paolo's idea is to refactor the monitor
code so that non-coroutine monitor commands run in the iohandler
AioContext, thus avoiding the drain_call_rcu() vs nested event loops
issue. It would not be necessary to make qmp_device_add() a coroutine
anymore since drain_call_rcu() could be called safely.
Does that sound okay or are you aware of a case where this doesn't work?
Stefan
* Re: [RFC 0/3] qmp: make qmp_device_add() a coroutine
2023-09-07 14:25 ` Paolo Bonzini
2023-09-07 15:29 ` Stefan Hajnoczi
@ 2023-09-12 17:08 ` Kevin Wolf
2023-09-13 11:38 ` Paolo Bonzini
1 sibling, 1 reply; 20+ messages in thread
From: Kevin Wolf @ 2023-09-12 17:08 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Stefan Hajnoczi, qemu-devel, Dr. David Alan Gilbert,
Eduardo Habkost, Markus Armbruster, Eric Blake, Maxim Levitsky,
Daniel P. Berrangé
Am 07.09.2023 um 16:25 hat Paolo Bonzini geschrieben:
> On Thu, Sep 7, 2023 at 4:00 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > While I agree that the issue would not happen if monitor commands only
> > ran in the iohandler AioContext, I don't think we can change that.
> > When Kevin implemented coroutine commands in commit 9ce44e2ce267 ("qmp:
> > Move dispatcher to a coroutine"), he used qemu_get_aio_context()
> > deliberately so that AIO_WAIT_WHILE() can make progress.
>
> Ah, you are referring to
>
> + /*
> + * Move the coroutine from iohandler_ctx to qemu_aio_context for
> + * executing the command handler so that it can make progress if it
> + * involves an AIO_WAIT_WHILE().
> + */
> + aio_co_schedule(qemu_get_aio_context(), qmp_dispatcher_co);
> + qemu_coroutine_yield();
>
> > I'm not clear on the exact scenario though, because coroutines shouldn't
> > call AIO_WAIT_WHILE().
>
> I think he meant "so that an AIO_WAIT_WHILE() invoked through a bottom
> half will make progress on the coroutine as well".
It's been a while, but I think I may have meant an AIO_WAIT_WHILE() that
is executed by someone else and that depends on the coroutine. For
example, I imagine this is what I could have seen:
1. The QMP command handler does some I/O and yields for it (like
updating the qcow2 header for block_resize) with increased
bs->in_flight
2. Something else calls drain, which polls qemu_aio_context, but not
iohandler_ctx, until the request completes.
3. Nothing will ever resume the coroutine -> deadlock
> However I am not sure the comment applies here, because
> do_qmp_dispatch_bh() only applies to non-coroutine commands; that
> commit allowed monitor commands to run in vCPU threads when they
> previously weren't.
>
> Thinking more about it, I don't like that the
>
> if (!!(cmd->options & QCO_COROUTINE) == qemu_in_coroutine()) {
> }
>
> check is in qmp_dispatch() rather than monitor_qmp_dispatch().
>
> Any caller of qmp_dispatch() knows if it is in a coroutine or not.
> qemu-ga uses neither a coroutine dispatcher nor coroutine commands.
> QEMU uses non-coroutine dispatch for out-of-band commands (and we can
> forbid coroutine + allow-oob at the same time), and coroutine dispatch
> for the others.
>
> So, moving out of coroutine context (through a bottom half) should be
> done by monitor_qmp_dispatch(), and likewise moving temporarily out of
> the iohandler context in the case of coroutine commands. In the case
> of !req_obj->req you don't need to do either of those. qmp_dispatch()
> can still assert that the coroutine-ness of the command matches the
> context in which qmp_dispatch() is called.
>
> Once this is done, I think moving out of coroutine context can use a
> BH that runs in the iohandler context.
Non-coroutine handlers could probably stay in iohandler_ctx, but I don't
think we can avoid switching to a different context for coroutine handlers.
So maybe we can just move the rescheduling down to the coroutine case in
qmp_dispatch().
Kevin
* Re: [RFC 0/3] qmp: make qmp_device_add() a coroutine
2023-09-12 17:08 ` Kevin Wolf
@ 2023-09-13 11:38 ` Paolo Bonzini
0 siblings, 0 replies; 20+ messages in thread
From: Paolo Bonzini @ 2023-09-13 11:38 UTC (permalink / raw)
To: Kevin Wolf
Cc: Stefan Hajnoczi, qemu-devel, Dr. David Alan Gilbert,
Eduardo Habkost, Markus Armbruster, Eric Blake, Maxim Levitsky,
Daniel P. Berrangé
On Tue, Sep 12, 2023 at 7:08 PM Kevin Wolf <kwolf@redhat.com> wrote:
> > Any caller of qmp_dispatch() knows if it is in a coroutine or not.
> > qemu-ga uses neither a coroutine dispatcher nor coroutine commands.
> > QEMU uses non-coroutine dispatch for out-of-band commands (and we can
> > forbid coroutine + allow-oob at the same time), and coroutine dispatch
> > for the others.
> >
> > So, moving out of coroutine context (through a bottom half) should be
> > done by monitor_qmp_dispatch(), and likewise moving temporarily out of
> > the iohandler context in the case of coroutine commands. In the case
> > of !req_obj->req you don't need to do either of those. qmp_dispatch()
> > can still assert that the coroutine-ness of the command matches the
> > context in which qmp_dispatch() is called.
> >
> > Once this is done, I think moving out of coroutine context can use a
> > BH that runs in the iohandler context.
>
> Non-coroutine handlers could probably stay in iohandler_ctx, but I don't
> think we can avoid switching to a different context for coroutine handlers.
Agreed.
> So maybe we can just move the rescheduling down to the coroutine case in
> qmp_dispatch().
Not sure about qmp_dispatch (see above: any caller of the function
knows if it is in a coroutine or not, and qemu-ga need not know about
coroutines at all). But what you said also applies if the rescheduling
is only pushed to monitor_qmp_dispatch(), which would be my first
option.
Thanks!
Paolo
end of thread, other threads:[~2023-09-13 11:38 UTC | newest]
Thread overview: 20+ messages
2023-09-06 19:01 [RFC 0/3] qmp: make qmp_device_add() a coroutine Stefan Hajnoczi
2023-09-06 19:01 ` [RFC 1/3] hmp: avoid the nested event loop in handle_hmp_command() Stefan Hajnoczi
2023-09-07 1:06 ` Dr. David Alan Gilbert
2023-09-07 14:04 ` Stefan Hajnoczi
2023-09-07 14:07 ` Dr. David Alan Gilbert
2023-09-07 15:34 ` Stefan Hajnoczi
2023-09-07 20:53 ` Dr. David Alan Gilbert
2023-09-07 21:25 ` Stefan Hajnoczi
2023-09-06 19:01 ` [RFC 2/3] rcu: add drain_call_rcu_co() API Stefan Hajnoczi
2023-09-12 16:36 ` Kevin Wolf
2023-09-12 16:52 ` Stefan Hajnoczi
2023-09-06 19:01 ` [RFC 3/3] qmp: make qmp_device_add() a coroutine Stefan Hajnoczi
2023-09-12 16:47 ` Kevin Wolf
2023-09-12 17:04 ` Stefan Hajnoczi
2023-09-07 11:28 ` [RFC 0/3] " Paolo Bonzini
2023-09-07 14:00 ` Stefan Hajnoczi
2023-09-07 14:25 ` Paolo Bonzini
2023-09-07 15:29 ` Stefan Hajnoczi
2023-09-12 17:08 ` Kevin Wolf
2023-09-13 11:38 ` Paolo Bonzini