* [PATCH 0/2] accel/tcg: Fix monitor deadlock
@ 2021-10-15 16:12 Greg Kurz
2021-10-15 16:12 ` [PATCH 1/2] rcu: Introduce force_rcu notifier Greg Kurz
2021-10-15 16:12 ` [PATCH 2/2] accel/tcg: Register a " Greg Kurz
0 siblings, 2 replies; 5+ messages in thread
From: Greg Kurz @ 2021-10-15 16:12 UTC (permalink / raw)
To: qemu-devel
Cc: Eduardo Habkost, Richard Henderson, Greg Kurz, qemu-stable,
Paolo Bonzini
Commit 7bed89958bfb ("device_core: use drain_call_rcu in in qmp_device_add")
introduced a regression in QEMU 6.0: passing device_add without arguments
hangs the monitor. This was reported against qemu-system-mips64 with TCG,
but I could consistently reproduce it with other targets (x86 and ppc64).
See https://gitlab.com/qemu-project/qemu/-/issues/650 for details.
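For reference, the hang can be reproduced along these lines (hypothetical
command line; the issue report used qemu-system-mips64 with a specific
guest kernel, but any TCG guest stuck in a busy loop should do):

    $ qemu-system-x86_64 -accel tcg -kernel busy-loop-guest -monitor stdio
    (qemu) device_add
    <the monitor hangs here instead of reporting the missing arguments>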
The problem is that an emulated busy-looping vCPU can stay forever in
its RCU read-side critical section and prevent drain_call_rcu() from
returning.
This series fixes the issue by letting RCU kick vCPUs out of the read-side
critical section when drain_call_rcu() is in progress. This is achieved
through notifiers, as suggested by Paolo Bonzini.
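In a nutshell (illustrative sketch only, not actual QEMU code):

    /* vCPU thread */
    static void *vcpu_thread_fn(void *arg)
    {
        rcu_register_thread();
        rcu_read_lock();
        for (;;) {
            /* guest busy loop: no exception or interrupt breaks out */
        }
        rcu_read_unlock();   /* never reached: the grace period never ends */
    }

    /* monitor thread */
    static void handle_device_add(void)
    {
        drain_call_rcu();    /* waits for all readers to quiesce -> hangs */
    }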
Greg Kurz (2):
rcu: Introduce force_rcu notifier
accel/tcg: Register a force_rcu notifier
accel/tcg/tcg-accel-ops-mttcg.c | 3 ++-
accel/tcg/tcg-accel-ops-rr.c | 3 ++-
accel/tcg/tcg-accel-ops.c | 11 +++++++++++
accel/tcg/tcg-accel-ops.h | 2 ++
include/hw/core/cpu.h | 2 ++
include/qemu/rcu.h | 21 ++++++++++++++++++++-
util/rcu.c | 23 +++++++++++++++++++++--
7 files changed, 60 insertions(+), 5 deletions(-)
--
2.31.1
* [PATCH 1/2] rcu: Introduce force_rcu notifier
2021-10-15 16:12 [PATCH 0/2] accel/tcg: Fix monitor deadlock Greg Kurz
@ 2021-10-15 16:12 ` Greg Kurz
2021-10-18 9:17 ` Paolo Bonzini
2021-10-15 16:12 ` [PATCH 2/2] accel/tcg: Register a " Greg Kurz
1 sibling, 1 reply; 5+ messages in thread
From: Greg Kurz @ 2021-10-15 16:12 UTC (permalink / raw)
To: qemu-devel
Cc: Eduardo Habkost, Richard Henderson, Greg Kurz, qemu-stable,
Paolo Bonzini
The drain_call_rcu() function can be blocked for as long as an RCU reader
stays in a read-side critical section. This is typically what happens
when a TCG vCPU is executing a busy loop. It can deadlock the QEMU
monitor, as reported in https://gitlab.com/qemu-project/qemu/-/issues/650 .
This can be avoided by allowing drain_call_rcu() to enforce an RCU grace
period. Since each reader might need to take specific actions to end its
read-side critical section, do it with notifiers.
Prepare the ground for this by adding a NotifierList and using it in
wait_for_readers() when drain_call_rcu() is in progress. Readers can
now optionally specify, at thread registration time, a Notifier to be
called in this case. The current rcu_register_thread() API is preserved
for readers that don't need this. The notifier is removed automatically
when the thread unregisters.
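A reader that wants to be kicked would use the new API roughly like
this (minimal sketch; 'my_force_rcu' and the thread function are made
up for illustration):

    static Notifier force_rcu;

    /* Runs in the RCU thread, not in the reader thread! */
    static void my_force_rcu(Notifier *n, void *data)
    {
        /* take whatever action ends this reader's critical section */
    }

    static void *reader_thread_fn(void *arg)
    {
        force_rcu.notify = my_force_rcu;
        rcu_register_thread_with_force_rcu(&force_rcu);
        /* ... read-side work ... */
        rcu_unregister_thread();   /* also removes the notifier */
        return NULL;
    }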
This is largely based on a draft from Paolo Bonzini.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
---
include/qemu/rcu.h | 21 ++++++++++++++++++++-
util/rcu.c | 23 +++++++++++++++++++++--
2 files changed, 41 insertions(+), 3 deletions(-)
diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h
index 515d327cf11c..498e4e5e3479 100644
--- a/include/qemu/rcu.h
+++ b/include/qemu/rcu.h
@@ -27,6 +27,7 @@
#include "qemu/thread.h"
#include "qemu/queue.h"
#include "qemu/atomic.h"
+#include "qemu/notify.h"
#include "qemu/sys_membarrier.h"
#ifdef __cplusplus
@@ -66,6 +67,13 @@ struct rcu_reader_data {
/* Data used for registry, protected by rcu_registry_lock */
QLIST_ENTRY(rcu_reader_data) node;
+
+ /*
+ * Notifier used to force an RCU grace period. Accessed under
+ * rcu_registry_lock. Note that the notifier is called _outside_
+ * the thread!
+ */
+ Notifier *force_rcu;
};
extern __thread struct rcu_reader_data rcu_reader;
@@ -114,8 +122,19 @@ extern void synchronize_rcu(void);
/*
* Reader thread registration.
+ *
+ * The caller can specify an optional notifier if it wants RCU
+ * to enforce grace periods. This is needed by drain_call_rcu().
+ * Note that the notifier is executed in the context of the RCU
+ * thread.
*/
-extern void rcu_register_thread(void);
+extern void rcu_register_thread_with_force_rcu(Notifier *n);
+
+static inline void rcu_register_thread(void)
+{
+ rcu_register_thread_with_force_rcu(NULL);
+}
+
extern void rcu_unregister_thread(void);
/*
diff --git a/util/rcu.c b/util/rcu.c
index 13ac0f75cb2a..da3506917fa8 100644
--- a/util/rcu.c
+++ b/util/rcu.c
@@ -46,9 +46,17 @@
unsigned long rcu_gp_ctr = RCU_GP_LOCKED;
QemuEvent rcu_gp_event;
+static int in_drain_call_rcu;
static QemuMutex rcu_registry_lock;
static QemuMutex rcu_sync_lock;
+/*
+ * NotifierList used to force an RCU grace period. Accessed under
+ * rcu_registry_lock.
+ */
+static NotifierList force_rcu_notifiers =
+ NOTIFIER_LIST_INITIALIZER(force_rcu_notifiers);
+
/*
* Check whether a quiescent state was crossed between the beginning of
* update_counter_and_wait and now.
@@ -107,6 +115,8 @@ static void wait_for_readers(void)
* get some extra futex wakeups.
*/
qatomic_set(&index->waiting, false);
+ } else if (qatomic_read(&in_drain_call_rcu)) {
+ notifier_list_notify(&force_rcu_notifiers, NULL);
}
}
@@ -293,7 +303,6 @@ void call_rcu1(struct rcu_head *node, void (*func)(struct rcu_head *node))
qemu_event_set(&rcu_call_ready_event);
}
-
struct rcu_drain {
struct rcu_head rcu;
QemuEvent drain_complete_event;
@@ -339,8 +348,10 @@ void drain_call_rcu(void)
* assumed.
*/
+ qatomic_inc(&in_drain_call_rcu);
call_rcu1(&rcu_drain.rcu, drain_rcu_callback);
qemu_event_wait(&rcu_drain.drain_complete_event);
+ qatomic_dec(&in_drain_call_rcu);
if (locked) {
qemu_mutex_lock_iothread();
@@ -348,10 +359,14 @@ void drain_call_rcu(void)
}
-void rcu_register_thread(void)
+void rcu_register_thread_with_force_rcu(Notifier *n)
{
assert(rcu_reader.ctr == 0);
qemu_mutex_lock(&rcu_registry_lock);
+ if (n) {
+ rcu_reader.force_rcu = n;
+ notifier_list_add(&force_rcu_notifiers, rcu_reader.force_rcu);
+ }
QLIST_INSERT_HEAD(&registry, &rcu_reader, node);
qemu_mutex_unlock(&rcu_registry_lock);
}
@@ -360,6 +375,10 @@ void rcu_unregister_thread(void)
{
qemu_mutex_lock(&rcu_registry_lock);
QLIST_REMOVE(&rcu_reader, node);
+ if (rcu_reader.force_rcu) {
+ notifier_remove(rcu_reader.force_rcu);
+ rcu_reader.force_rcu = NULL;
+ }
qemu_mutex_unlock(&rcu_registry_lock);
}
--
2.31.1
* [PATCH 2/2] accel/tcg: Register a force_rcu notifier
2021-10-15 16:12 [PATCH 0/2] accel/tcg: Fix monitor deadlock Greg Kurz
2021-10-15 16:12 ` [PATCH 1/2] rcu: Introduce force_rcu notifier Greg Kurz
@ 2021-10-15 16:12 ` Greg Kurz
2021-10-18 9:18 ` Paolo Bonzini
1 sibling, 1 reply; 5+ messages in thread
From: Greg Kurz @ 2021-10-15 16:12 UTC (permalink / raw)
To: qemu-devel
Cc: Eduardo Habkost, Richard Henderson, Greg Kurz, qemu-stable,
Paolo Bonzini
A TCG vCPU doing a busy loop systematically hangs the QEMU monitor
if the user passes 'device_add' without arguments. This is because
drain_call_rcu(), which is called from qmp_device_add(), cannot return
if readers don't exit their read-side critical sections. That is typically
what busy-looping TCG vCPUs do, both in MTTCG and RR modes:
int cpu_exec(CPUState *cpu)
{
[...]
rcu_read_lock();
[...]
while (!cpu_handle_exception(cpu, &ret)) {
// Busy loop keeps vCPU here
}
[...]
rcu_read_unlock();
return ret;
}
Have all vCPUs register a force_rcu notifier that will kick them out
of the loop using async_run_on_cpu(). The notifier implementation is
shared by MTTCG and RR since both are affected.
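For context, queuing even a no-op work item is enough because
async_run_on_cpu() kicks the vCPU, which makes cpu_exec() return and
thereby ends the read-side critical section. Roughly (paraphrased, not
verbatim QEMU code):

    /* async_run_on_cpu() enqueues the item, then kicks the vCPU; for
     * TCG the kick ends up in cpu_exit(), which sets cpu->exit_request
     * so translated code stops at the next TB boundary. */

    /* vCPU thread outer loop (MTTCG and RR both look roughly like this): */
    for (;;) {
        tcg_cpus_exec(cpu);       /* cpu_exec(): rcu_read_lock/unlock inside */
        qemu_wait_io_event(cpu);  /* runs the queued work items */
    }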
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: 7bed89958bfb ("device_core: use drain_call_rcu in in qmp_device_add")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/650
Signed-off-by: Greg Kurz <groug@kaod.org>
---
accel/tcg/tcg-accel-ops-mttcg.c | 3 ++-
accel/tcg/tcg-accel-ops-rr.c | 3 ++-
accel/tcg/tcg-accel-ops.c | 11 +++++++++++
accel/tcg/tcg-accel-ops.h | 2 ++
include/hw/core/cpu.h | 2 ++
5 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/accel/tcg/tcg-accel-ops-mttcg.c b/accel/tcg/tcg-accel-ops-mttcg.c
index 847d2079d21f..114c10cb22e2 100644
--- a/accel/tcg/tcg-accel-ops-mttcg.c
+++ b/accel/tcg/tcg-accel-ops-mttcg.c
@@ -48,7 +48,8 @@ static void *mttcg_cpu_thread_fn(void *arg)
assert(tcg_enabled());
g_assert(!icount_enabled());
- rcu_register_thread();
+ cpu->force_rcu.notify = tcg_cpus_force_rcu;
+ rcu_register_thread_with_force_rcu(&cpu->force_rcu);
tcg_register_thread();
qemu_mutex_lock_iothread();
diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
index a5fd26190e20..90cc02221faa 100644
--- a/accel/tcg/tcg-accel-ops-rr.c
+++ b/accel/tcg/tcg-accel-ops-rr.c
@@ -146,7 +146,8 @@ static void *rr_cpu_thread_fn(void *arg)
CPUState *cpu = arg;
assert(tcg_enabled());
- rcu_register_thread();
+ cpu->force_rcu.notify = tcg_cpus_force_rcu;
+ rcu_register_thread_with_force_rcu(&cpu->force_rcu);
tcg_register_thread();
qemu_mutex_lock_iothread();
diff --git a/accel/tcg/tcg-accel-ops.c b/accel/tcg/tcg-accel-ops.c
index 1a8e8390bd60..58a88a34adaf 100644
--- a/accel/tcg/tcg-accel-ops.c
+++ b/accel/tcg/tcg-accel-ops.c
@@ -91,6 +91,17 @@ void tcg_handle_interrupt(CPUState *cpu, int mask)
}
}
+static void do_nothing(CPUState *cpu, run_on_cpu_data d)
+{
+}
+
+void tcg_cpus_force_rcu(Notifier *notify, void *data)
+{
+ CPUState *cpu = container_of(notify, CPUState, force_rcu);
+
+ async_run_on_cpu(cpu, do_nothing, RUN_ON_CPU_NULL);
+}
+
static void tcg_accel_ops_init(AccelOpsClass *ops)
{
if (qemu_tcg_mttcg_enabled()) {
diff --git a/accel/tcg/tcg-accel-ops.h b/accel/tcg/tcg-accel-ops.h
index 6a5fcef88980..8742041c8aea 100644
--- a/accel/tcg/tcg-accel-ops.h
+++ b/accel/tcg/tcg-accel-ops.h
@@ -18,5 +18,7 @@ void tcg_cpus_destroy(CPUState *cpu);
int tcg_cpus_exec(CPUState *cpu);
void tcg_handle_interrupt(CPUState *cpu, int mask);
void tcg_cpu_init_cflags(CPUState *cpu, bool parallel);
+/* Common force_rcu notifier for MTTCG and RR */
+void tcg_cpus_force_rcu(Notifier *notify, void *data);
#endif /* TCG_CPUS_H */
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index b7d5bc1200cd..b0047f4b3a97 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -418,6 +418,8 @@ struct CPUState {
/* track IOMMUs whose translations we've cached in the TCG TLB */
GArray *iommu_notifiers;
+
+ Notifier force_rcu;
};
typedef QTAILQ_HEAD(CPUTailQ, CPUState) CPUTailQ;
--
2.31.1
* Re: [PATCH 1/2] rcu: Introduce force_rcu notifier
2021-10-15 16:12 ` [PATCH 1/2] rcu: Introduce force_rcu notifier Greg Kurz
@ 2021-10-18 9:17 ` Paolo Bonzini
0 siblings, 0 replies; 5+ messages in thread
From: Paolo Bonzini @ 2021-10-18 9:17 UTC (permalink / raw)
To: Greg Kurz, qemu-devel; +Cc: Richard Henderson, qemu-stable, Eduardo Habkost
On 15/10/21 18:12, Greg Kurz wrote:
> +/*
> + * NotifierList used to force an RCU grace period. Accessed under
> + * rcu_registry_lock.
> + */
> +static NotifierList force_rcu_notifiers =
> + NOTIFIER_LIST_INITIALIZER(force_rcu_notifiers);
> +
> /*
> * Check whether a quiescent state was crossed between the beginning of
> * update_counter_and_wait and now.
> @@ -107,6 +115,8 @@ static void wait_for_readers(void)
> * get some extra futex wakeups.
> */
> qatomic_set(&index->waiting, false);
> + } else if (qatomic_read(&in_drain_call_rcu)) {
> + notifier_list_notify(&force_rcu_notifiers, NULL);
> }
> }
>
You can put the notifier in struct rcu_reader_data; this way it doesn't
call all the notifiers, only those that are necessary to make progress.
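A sketch of what that could look like (hypothetical, not the code as
posted): each reader carries its own NotifierList, and wait_for_readers()
only notifies the readers that are still blocking the grace period:

    struct rcu_reader_data {
        /* ... existing fields ... */
        NotifierList force_rcu;   /* per-reader, replacing the global list */
    };

    /* in wait_for_readers(): notify only readers still in the registry */
    QLIST_FOREACH(index, &registry, node) {
        if (qatomic_read(&in_drain_call_rcu)) {
            notifier_list_notify(&index->force_rcu, NULL);
        }
    }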
While at it, I have a slight preference for a separate
rcu_add_force_rcu_notifier API, but I can be convinced otherwise. :)
Paolo
* Re: [PATCH 2/2] accel/tcg: Register a force_rcu notifier
2021-10-15 16:12 ` [PATCH 2/2] accel/tcg: Register a " Greg Kurz
@ 2021-10-18 9:18 ` Paolo Bonzini
0 siblings, 0 replies; 5+ messages in thread
From: Paolo Bonzini @ 2021-10-18 9:18 UTC (permalink / raw)
To: Greg Kurz, qemu-devel; +Cc: Richard Henderson, qemu-stable, Eduardo Habkost
On 15/10/21 18:12, Greg Kurz wrote:
> +void tcg_cpus_force_rcu(Notifier *notify, void *data)
> +{
> + CPUState *cpu = container_of(notify, CPUState, force_rcu);
> +
Perhaps add a comment: /* Called with rcu_registry_lock held, using
async_run_on_cpu ensures that there are no deadlocks. */
Paolo
> + async_run_on_cpu(cpu, do_nothing, RUN_ON_CPU_NULL);
> +}
> +