* [PATCH 0/2] accel/tcg: Fix monitor deadlock
@ 2021-10-15 16:12 Greg Kurz
2021-10-15 16:12 ` [PATCH 1/2] rcu: Introduce force_rcu notifier Greg Kurz
2021-10-15 16:12 ` [PATCH 2/2] accel/tcg: Register a " Greg Kurz
0 siblings, 2 replies; 5+ messages in thread
From: Greg Kurz @ 2021-10-15 16:12 UTC (permalink / raw)
To: qemu-devel
Cc: Eduardo Habkost, Richard Henderson, Greg Kurz, qemu-stable,
Paolo Bonzini
Commit 7bed89958bfb ("device_core: use drain_call_rcu in in qmp_device_add")
introduced a regression in QEMU 6.0: passing device_add without arguments
hangs the monitor. This was reported against qemu-system-mips64 with TCG,
but I could consistently reproduce it with other targets (x86 and ppc64).
See https://gitlab.com/qemu-project/qemu/-/issues/650 for details.
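For reference, the hang can be reproduced along these lines (hypothetical
command line; the issue report used qemu-system-mips64 with a specific
guest kernel, but any TCG guest stuck in a busy loop should do):

    $ qemu-system-x86_64 -accel tcg -kernel busy-loop-guest -monitor stdio
    (qemu) device_add
    <the monitor hangs here instead of reporting the missing arguments>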
The problem is that an emulated busy-looping vCPU can stay forever in
its RCU read-side critical section and prevent drain_call_rcu() from
returning.
This series fixes the issue by letting RCU kick vCPUs out of the read-side
critical section when drain_call_rcu() is in progress. This is achieved
through notifiers, as suggested by Paolo Bonzini.
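In a nutshell (illustrative sketch only, not actual QEMU code):

    /* vCPU thread */
    static void *vcpu_thread_fn(void *arg)
    {
        rcu_register_thread();
        rcu_read_lock();
        for (;;) {
            /* guest busy loop: no exception or interrupt breaks out */
        }
        rcu_read_unlock();   /* never reached: the grace period never ends */
    }

    /* monitor thread */
    static void handle_device_add(void)
    {
        drain_call_rcu();    /* waits for all readers to quiesce -> hangs */
    }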
Greg Kurz (2):
rcu: Introduce force_rcu notifier
accel/tcg: Register a force_rcu notifier
accel/tcg/tcg-accel-ops-mttcg.c | 3 ++-
accel/tcg/tcg-accel-ops-rr.c | 3 ++-
accel/tcg/tcg-accel-ops.c | 11 +++++++++++
accel/tcg/tcg-accel-ops.h | 2 ++
include/hw/core/cpu.h | 2 ++
include/qemu/rcu.h | 21 ++++++++++++++++++++-
util/rcu.c | 23 +++++++++++++++++++++--
7 files changed, 60 insertions(+), 5 deletions(-)
--
2.31.1
* [PATCH 1/2] rcu: Introduce force_rcu notifier
2021-10-15 16:12 [PATCH 0/2] accel/tcg: Fix monitor deadlock Greg Kurz
@ 2021-10-15 16:12 ` Greg Kurz
2021-10-18 9:17 ` Paolo Bonzini
2021-10-15 16:12 ` [PATCH 2/2] accel/tcg: Register a " Greg Kurz
1 sibling, 1 reply; 5+ messages in thread
From: Greg Kurz @ 2021-10-15 16:12 UTC (permalink / raw)
To: qemu-devel
Cc: Eduardo Habkost, Richard Henderson, Greg Kurz, qemu-stable,
Paolo Bonzini
The drain_call_rcu() function can be blocked for as long as an RCU reader
stays in a read-side critical section. This is typically what happens
when a TCG vCPU is executing a busy loop. It can deadlock the QEMU
monitor, as reported in https://gitlab.com/qemu-project/qemu/-/issues/650 .
This can be avoided by allowing drain_call_rcu() to enforce an RCU grace
period. Since each reader might need to take specific actions to end its
read-side critical section, do it with notifiers.
Prepare the ground for this by adding a NotifierList and using it in
wait_for_readers() when drain_call_rcu() is in progress. Readers can
now optionally specify, at thread registration time, a Notifier to be
called in this case. The current rcu_register_thread() API is preserved
for readers that don't need this. The notifier is removed automatically
when the thread unregisters.
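A reader that wants to be kicked would use the new API roughly like
this (minimal sketch; 'my_force_rcu' and the thread function are made
up for illustration):

    static Notifier force_rcu;

    /* Runs in the RCU thread, not in the reader thread! */
    static void my_force_rcu(Notifier *n, void *data)
    {
        /* take whatever action ends this reader's critical section */
    }

    static void *reader_thread_fn(void *arg)
    {
        force_rcu.notify = my_force_rcu;
        rcu_register_thread_with_force_rcu(&force_rcu);
        /* ... read-side work ... */
        rcu_unregister_thread();   /* also removes the notifier */
        return NULL;
    }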
This is largely based on a draft from Paolo Bonzini.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kurz <groug@kaod.org>
---
include/qemu/rcu.h | 21 ++++++++++++++++++++-
util/rcu.c | 23 +++++++++++++++++++++--
2 files changed, 41 insertions(+), 3 deletions(-)
diff --git a/include/qemu/rcu.h b/include/qemu/rcu.h
index 515d327cf11c..498e4e5e3479 100644
--- a/include/qemu/rcu.h
+++ b/include/qemu/rcu.h
@@ -27,6 +27,7 @@
#include "qemu/thread.h"
#include "qemu/queue.h"
#include "qemu/atomic.h"
+#include "qemu/notify.h"
#include "qemu/sys_membarrier.h"
#ifdef __cplusplus
@@ -66,6 +67,13 @@ struct rcu_reader_data {
/* Data used for registry, protected by rcu_registry_lock */
QLIST_ENTRY(rcu_reader_data) node;
+
+ /*
+ * Notifier used to force an RCU grace period. Accessed under
+ * rcu_registry_lock. Note that the notifier is called _outside_
+ * the thread!
+ */
+ Notifier *force_rcu;
};
extern __thread struct rcu_reader_data rcu_reader;
@@ -114,8 +122,19 @@ extern void synchronize_rcu(void);
/*
* Reader thread registration.
+ *
+ * The caller can specify an optional notifier if it wants RCU
+ * to enforce grace periods. This is needed by drain_call_rcu().
+ * Note that the notifier is executed in the context of the RCU
+ * thread.
*/
-extern void rcu_register_thread(void);
+extern void rcu_register_thread_with_force_rcu(Notifier *n);
+
+static inline void rcu_register_thread(void)
+{
+ rcu_register_thread_with_force_rcu(NULL);
+}
+
extern void rcu_unregister_thread(void);
/*
diff --git a/util/rcu.c b/util/rcu.c
index 13ac0f75cb2a..da3506917fa8 100644
--- a/util/rcu.c
+++ b/util/rcu.c
@@ -46,9 +46,17 @@
unsigned long rcu_gp_ctr = RCU_GP_LOCKED;
QemuEvent rcu_gp_event;
+static int in_drain_call_rcu;
static QemuMutex rcu_registry_lock;
static QemuMutex rcu_sync_lock;
+/*
+ * NotifierList used to force an RCU grace period. Accessed under
+ * rcu_registry_lock.
+ */
+static NotifierList force_rcu_notifiers =
+ NOTIFIER_LIST_INITIALIZER(force_rcu_notifiers);
+
/*
* Check whether a quiescent state was crossed between the beginning of
* update_counter_and_wait and now.
@@ -107,6 +115,8 @@ static void wait_for_readers(void)
* get some extra futex wakeups.
*/
qatomic_set(&index->waiting, false);
+ } else if (qatomic_read(&in_drain_call_rcu)) {
+ notifier_list_notify(&force_rcu_notifiers, NULL);
}
}
@@ -293,7 +303,6 @@ void call_rcu1(struct rcu_head *node, void (*func)(struct rcu_head *node))
qemu_event_set(&rcu_call_ready_event);
}
-
struct rcu_drain {
struct rcu_head rcu;
QemuEvent drain_complete_event;
@@ -339,8 +348,10 @@ void drain_call_rcu(void)
* assumed.
*/
+ qatomic_inc(&in_drain_call_rcu);
call_rcu1(&rcu_drain.rcu, drain_rcu_callback);
qemu_event_wait(&rcu_drain.drain_complete_event);
+ qatomic_dec(&in_drain_call_rcu);
if (locked) {
qemu_mutex_lock_iothread();
@@ -348,10 +359,14 @@ void drain_call_rcu(void)
}
-void rcu_register_thread(void)
+void rcu_register_thread_with_force_rcu(Notifier *n)
{
assert(rcu_reader.ctr == 0);
qemu_mutex_lock(&rcu_registry_lock);
+ if (n) {
+ rcu_reader.force_rcu = n;
+ notifier_list_add(&force_rcu_notifiers, rcu_reader.force_rcu);
+ }
QLIST_INSERT_HEAD(&registry, &rcu_reader, node);
qemu_mutex_unlock(&rcu_registry_lock);
}
@@ -360,6 +375,10 @@ void rcu_unregister_thread(void)
{
qemu_mutex_lock(&rcu_registry_lock);
QLIST_REMOVE(&rcu_reader, node);
+ if (rcu_reader.force_rcu) {
+ notifier_remove(rcu_reader.force_rcu);
+ rcu_reader.force_rcu = NULL;
+ }
qemu_mutex_unlock(&rcu_registry_lock);
}
--
2.31.1
* [PATCH 2/2] accel/tcg: Register a force_rcu notifier
2021-10-15 16:12 [PATCH 0/2] accel/tcg: Fix monitor deadlock Greg Kurz
2021-10-15 16:12 ` [PATCH 1/2] rcu: Introduce force_rcu notifier Greg Kurz
@ 2021-10-15 16:12 ` Greg Kurz
2021-10-18 9:18 ` Paolo Bonzini
1 sibling, 1 reply; 5+ messages in thread
From: Greg Kurz @ 2021-10-15 16:12 UTC (permalink / raw)
To: qemu-devel
Cc: Eduardo Habkost, Richard Henderson, Greg Kurz, qemu-stable,
Paolo Bonzini
A TCG vCPU doing a busy loop systematically hangs the QEMU monitor
if the user passes 'device_add' without arguments. This is because
drain_call_rcu(), which is called from qmp_device_add(), cannot return
if readers don't exit their read-side critical sections. That is typically
what busy-looping TCG vCPUs do, both in MTTCG and RR modes:
int cpu_exec(CPUState *cpu)
{
[...]
rcu_read_lock();
[...]
while (!cpu_handle_exception(cpu, &ret)) {
// Busy loop keeps vCPU here
}
[...]
rcu_read_unlock();
return ret;
}
Have all vCPUs register a force_rcu notifier that will kick them out
of the loop using async_run_on_cpu(). The notifier implementation is
shared by MTTCG and RR since both are affected.
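For context, queuing even a no-op work item is enough because
async_run_on_cpu() kicks the vCPU, which makes cpu_exec() return and
thereby ends the read-side critical section. Roughly (paraphrased, not
verbatim QEMU code):

    /* async_run_on_cpu() enqueues the item, then kicks the vCPU; for
     * TCG the kick ends up in cpu_exit(), which sets cpu->exit_request
     * so translated code stops at the next TB boundary. */

    /* vCPU thread outer loop (MTTCG and RR both look roughly like this): */
    for (;;) {
        tcg_cpus_exec(cpu);       /* cpu_exec(): rcu_read_lock/unlock inside */
        qemu_wait_io_event(cpu);  /* runs the queued work items */
    }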
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: 7bed89958bfb ("device_core: use drain_call_rcu in in qmp_device_add")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/650
Signed-off-by: Greg Kurz <groug@kaod.org>
---
accel/tcg/tcg-accel-ops-mttcg.c | 3 ++-
accel/tcg/tcg-accel-ops-rr.c | 3 ++-
accel/tcg/tcg-accel-ops.c | 11 +++++++++++
accel/tcg/tcg-accel-ops.h | 2 ++
include/hw/core/cpu.h | 2 ++
5 files changed, 19 insertions(+), 2 deletions(-)
diff --git a/accel/tcg/tcg-accel-ops-mttcg.c b/accel/tcg/tcg-accel-ops-mttcg.c
index 847d2079d21f..114c10cb22e2 100644
--- a/accel/tcg/tcg-accel-ops-mttcg.c
+++ b/accel/tcg/tcg-accel-ops-mttcg.c
@@ -48,7 +48,8 @@ static void *mttcg_cpu_thread_fn(void *arg)
assert(tcg_enabled());
g_assert(!icount_enabled());
- rcu_register_thread();
+ cpu->force_rcu.notify = tcg_cpus_force_rcu;
+ rcu_register_thread_with_force_rcu(&cpu->force_rcu);
tcg_register_thread();
qemu_mutex_lock_iothread();
diff --git a/accel/tcg/tcg-accel-ops-rr.c b/accel/tcg/tcg-accel-ops-rr.c
index a5fd26190e20..90cc02221faa 100644
--- a/accel/tcg/tcg-accel-ops-rr.c
+++ b/accel/tcg/tcg-accel-ops-rr.c
@@ -146,7 +146,8 @@ static void *rr_cpu_thread_fn(void *arg)
CPUState *cpu = arg;
assert(tcg_enabled());
- rcu_register_thread();
+ cpu->force_rcu.notify = tcg_cpus_force_rcu;
+ rcu_register_thread_with_force_rcu(&cpu->force_rcu);
tcg_register_thread();
qemu_mutex_lock_iothread();
diff --git a/accel/tcg/tcg-accel-ops.c b/accel/tcg/tcg-accel-ops.c
index 1a8e8390bd60..58a88a34adaf 100644
--- a/accel/tcg/tcg-accel-ops.c
+++ b/accel/tcg/tcg-accel-ops.c
@@ -91,6 +91,17 @@ void tcg_handle_interrupt(CPUState *cpu, int mask)
}
}
+static void do_nothing(CPUState *cpu, run_on_cpu_data d)
+{
+}
+
+void tcg_cpus_force_rcu(Notifier *notify, void *data)
+{
+ CPUState *cpu = container_of(notify, CPUState, force_rcu);
+
+ async_run_on_cpu(cpu, do_nothing, RUN_ON_CPU_NULL);
+}
+
static void tcg_accel_ops_init(AccelOpsClass *ops)
{
if (qemu_tcg_mttcg_enabled()) {
diff --git a/accel/tcg/tcg-accel-ops.h b/accel/tcg/tcg-accel-ops.h
index 6a5fcef88980..8742041c8aea 100644
--- a/accel/tcg/tcg-accel-ops.h
+++ b/accel/tcg/tcg-accel-ops.h
@@ -18,5 +18,7 @@ void tcg_cpus_destroy(CPUState *cpu);
int tcg_cpus_exec(CPUState *cpu);
void tcg_handle_interrupt(CPUState *cpu, int mask);
void tcg_cpu_init_cflags(CPUState *cpu, bool parallel);
+/* Common force_rcu notifier for MTTCG and RR */
+void tcg_cpus_force_rcu(Notifier *notify, void *data);
#endif /* TCG_CPUS_H */
diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index b7d5bc1200cd..b0047f4b3a97 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -418,6 +418,8 @@ struct CPUState {
/* track IOMMUs whose translations we've cached in the TCG TLB */
GArray *iommu_notifiers;
+
+ Notifier force_rcu;
};
typedef QTAILQ_HEAD(CPUTailQ, CPUState) CPUTailQ;
--
2.31.1
* Re: [PATCH 1/2] rcu: Introduce force_rcu notifier
2021-10-15 16:12 ` [PATCH 1/2] rcu: Introduce force_rcu notifier Greg Kurz
@ 2021-10-18 9:17 ` Paolo Bonzini
0 siblings, 0 replies; 5+ messages in thread
From: Paolo Bonzini @ 2021-10-18 9:17 UTC (permalink / raw)
To: Greg Kurz, qemu-devel; +Cc: Richard Henderson, qemu-stable, Eduardo Habkost
On 15/10/21 18:12, Greg Kurz wrote:
> +/*
> + * NotifierList used to force an RCU grace period. Accessed under
> + * rcu_registry_lock.
> + */
> +static NotifierList force_rcu_notifiers =
> + NOTIFIER_LIST_INITIALIZER(force_rcu_notifiers);
> +
> /*
> * Check whether a quiescent state was crossed between the beginning of
> * update_counter_and_wait and now.
> @@ -107,6 +115,8 @@ static void wait_for_readers(void)
> * get some extra futex wakeups.
> */
> qatomic_set(&index->waiting, false);
> + } else if (qatomic_read(&in_drain_call_rcu)) {
> + notifier_list_notify(&force_rcu_notifiers, NULL);
> }
> }
>
You can put the notifier in struct rcu_reader_data; this way it doesn't
call all the notifiers, only those that are necessary to make progress.
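A sketch of what that could look like (hypothetical, not the code as
posted): each reader carries its own NotifierList, and wait_for_readers()
only notifies the readers that are still blocking the grace period:

    struct rcu_reader_data {
        /* ... existing fields ... */
        NotifierList force_rcu;   /* per-reader, replacing the global list */
    };

    /* in wait_for_readers(): notify only readers still in the registry */
    QLIST_FOREACH(index, &registry, node) {
        if (qatomic_read(&in_drain_call_rcu)) {
            notifier_list_notify(&index->force_rcu, NULL);
        }
    }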
While at it, I have a slight preference for a separate
rcu_add_force_rcu_notifier API, but I can be convinced otherwise. :)
Paolo
* Re: [PATCH 2/2] accel/tcg: Register a force_rcu notifier
2021-10-15 16:12 ` [PATCH 2/2] accel/tcg: Register a " Greg Kurz
@ 2021-10-18 9:18 ` Paolo Bonzini
0 siblings, 0 replies; 5+ messages in thread
From: Paolo Bonzini @ 2021-10-18 9:18 UTC (permalink / raw)
To: Greg Kurz, qemu-devel; +Cc: Richard Henderson, qemu-stable, Eduardo Habkost
On 15/10/21 18:12, Greg Kurz wrote:
> +void tcg_cpus_force_rcu(Notifier *notify, void *data)
> +{
> + CPUState *cpu = container_of(notify, CPUState, force_rcu);
> +
Perhaps add a comment: /* Called with rcu_registry_lock held, using
async_run_on_cpu ensures that there are no deadlocks. */
Paolo
> + async_run_on_cpu(cpu, do_nothing, RUN_ON_CPU_NULL);
> +}
> +