qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [RFC 0/8] cpu-exec: Safe work in quiescent state
@ 2016-06-19 22:28 Sergey Fedorov
  2016-06-19 22:28 ` [Qemu-devel] [RFC 1/8] cpus: pass CPUState to run_on_cpu helpers Sergey Fedorov
                   ` (7 more replies)
  0 siblings, 8 replies; 30+ messages in thread
From: Sergey Fedorov @ 2016-06-19 22:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: patches, Sergey Fedorov, MTTCG Devel, KONRAD Frédéric,
	Alvise Rigo, Emilio G. Cota, Alex Bennée, Paolo Bonzini,
	Richard Henderson, Peter Maydell

From: Sergey Fedorov <serge.fdrv@gmail.com>

Hi,

This RFC series is a follow-up for a discussion on the subject [1].

Basically, this series is intended to show a possible way to perform
operations in a quiescent state, so that we can discuss whether it is a
sane way to do this. The goal is to implement a mechanism which can be
used for safe translation buffer flush in multi-threaded user-mode
emulation (and later in MTTCG) and merge it into mainline in v2.7.

I tried to keep this series as small as possible so that it would be
easier to focus on the main idea. Thus the bsd-user part was simply
skipped here. Please note that this is just a "proof of concept" series
and needs to be polished and refined.

Patch 1 is just a useful tweak from Alex's MTTCG tree; please don't
comment on it here if possible.

Patches 2 through 5 are arrangements for patch 6, which adds support
for CPU work in linux-user. This wouldn't make much sense without
patch 7, which is the subject matter of this series. Although there is
nothing special to do to ensure a quiescent state in the single-threaded
round-robin CPU loop of current system-mode emulation, patch 7 also
shows how this would look in MTTCG. The last patch actually employs the
new mechanism, making translation buffer flush thread-safe.
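
To make the intent concrete, here is a minimal standalone C model of
the "safe work in quiescent state" idea (all names below are
hypothetical illustrations, not the interfaces introduced by this
series): work is queued while CPUs may still be executing, and only
runs once the running-CPU count drops to zero.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*safe_work_func)(void *data);

typedef struct SafeWorkItem {
    safe_work_func func;
    void *data;
    struct SafeWorkItem *next;
} SafeWorkItem;

static SafeWorkItem *safe_work_head;
static int running_cpus;

int tb_flushes; /* instrumentation for the example */

void fake_tb_flush(void *data)
{
    (void)data;
    tb_flushes++;
}

/* Queue work to run at the next quiescent point. */
void queue_safe_work(SafeWorkItem *wi, safe_work_func func, void *data)
{
    wi->func = func;
    wi->data = data;
    wi->next = safe_work_head;
    safe_work_head = wi;
}

void model_cpu_exec_start(void)
{
    running_cpus++;
}

void model_cpu_exec_end(void)
{
    if (--running_cpus == 0) {
        /* Quiescent: no CPU is in its execution loop, run the work. */
        while (safe_work_head) {
            SafeWorkItem *wi = safe_work_head;
            safe_work_head = wi->next;
            wi->func(wi->data);
        }
    }
}
```

In the real series the counter and the queue are of course protected by
locks and condition variables; this model only shows the ordering
guarantee: fake_tb_flush() never runs while running_cpus > 0.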

Again for brevity, the considerations on the expensiveness of dynamic
work item allocation [2] were not taken into account. I'll just mention
here that the desired effect can be achieved either by using dynamic
arrays for CPU work queues or by making queue_work_on_cpu() from
patch 2 a public interface, thus allowing the use of preallocated work
items.
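
For reference, the preallocation idea could look roughly like this (a
sketch with hypothetical names, assuming queue_work_on_cpu() or an
equivalent were public): callers draw work items from a fixed pool
instead of g_malloc'ing one per request.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*work_func)(void *data);

typedef struct WorkItem {
    work_func func;
    void *data;
    int in_use;
} WorkItem;

#define WORK_POOL_SIZE 4

static WorkItem work_pool[WORK_POOL_SIZE];

int pool_runs; /* instrumentation for the example */

void count_run(void *data)
{
    (void)data;
    pool_runs++;
}

/* Grab a preallocated item; NULL means the caller must fall back
 * to dynamic allocation (or wait for an item to be released). */
WorkItem *work_item_get(work_func func, void *data)
{
    int i;

    for (i = 0; i < WORK_POOL_SIZE; i++) {
        if (!work_pool[i].in_use) {
            work_pool[i].in_use = 1;
            work_pool[i].func = func;
            work_pool[i].data = data;
            return &work_pool[i];
        }
    }
    return NULL;
}

/* Run an item and return it to the pool: no malloc/free per request. */
void work_item_run(WorkItem *wi)
{
    wi->func(wi->data);
    wi->in_use = 0;
}
```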

I would like your comments in order to produce something upstreamable
quickly!

This series is available at a public git repository:

    https://github.com/sergefdrv/qemu.git safe-cpu-work

Kind regards,
Sergey

[1] http://thread.gmane.org/gmane.comp.emulators.qemu/417599
[2] http://thread.gmane.org/gmane.comp.emulators.qemu/407030/focus=407039

Alex Bennée (1):
  cpus: pass CPUState to run_on_cpu helpers

Sergey Fedorov (7):
  cpus: Move common code out of {async_,}run_on_cpu()
  cpus: Add 'qemu_work_cond' usage wrappers
  linux-user: Rework exclusive operation mechanism
  linux-user: Add qemu_cpu_is_self() and qemu_cpu_kick()
  linux-user: Support CPU work queue
  cpu-exec-common: Introduce async_safe_run_on_cpu()
  tcg: Make tb_flush() thread safe

 cpu-exec-common.c       | 126 ++++++++++++++++++++++++++++++++++++++++++++++++
 cpus.c                  |  98 +++++++------------------------------
 hw/i386/kvm/apic.c      |   3 +-
 hw/i386/kvmvapic.c      |   8 +--
 hw/ppc/ppce500_spin.c   |   3 +-
 hw/ppc/spapr.c          |   6 +--
 hw/ppc/spapr_hcall.c    |  12 ++---
 include/exec/exec-all.h |   6 +++
 include/qom/cpu.h       |  22 +++++++--
 kvm-all.c               |  20 +++-----
 linux-user/main.c       |  47 +++++++++++++-----
 target-i386/helper.c    |   3 +-
 target-i386/kvm.c       |   6 +--
 target-s390x/cpu.c      |   4 +-
 target-s390x/cpu.h      |   7 +--
 translate-all.c         |  12 +++--
 16 files changed, 238 insertions(+), 145 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Qemu-devel] [RFC 1/8] cpus: pass CPUState to run_on_cpu helpers
  2016-06-19 22:28 [Qemu-devel] [RFC 0/8] cpu-exec: Safe work in quiescent state Sergey Fedorov
@ 2016-06-19 22:28 ` Sergey Fedorov
  2016-06-20  1:23   ` David Gibson
  2016-06-20 13:02   ` Alex Bennée
  2016-06-19 22:28 ` [Qemu-devel] [RFC 2/8] cpus: Move common code out of {async_, }run_on_cpu() Sergey Fedorov
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 30+ messages in thread
From: Sergey Fedorov @ 2016-06-19 22:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: patches, Alex Bennée, Sergey Fedorov, Paolo Bonzini,
	Peter Crosthwaite, Richard Henderson, Michael S. Tsirkin,
	Eduardo Habkost, David Gibson, Alexander Graf, Marcelo Tosatti,
	qemu-ppc, kvm

From: Alex Bennée <alex.bennee@linaro.org>

CPUState is a fairly common pointer to pass to these helpers. This
means that if you need other arguments for the async_run_on_cpu case,
you end up having to do a g_malloc to stuff additional data into the
routine. For the current users this isn't a massive deal, but for MTTCG
it gets cumbersome when the only other parameter is often an address.

This adds the typedef run_on_cpu_func for helper functions, which take
an explicit CPUState * as the first parameter. All the users of
run_on_cpu and async_run_on_cpu have had their helpers updated to use
CPUState where available.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
---
 cpus.c                | 15 +++++++--------
 hw/i386/kvm/apic.c    |  3 +--
 hw/i386/kvmvapic.c    |  8 ++++----
 hw/ppc/ppce500_spin.c |  3 +--
 hw/ppc/spapr.c        |  6 ++----
 hw/ppc/spapr_hcall.c  | 12 +++++-------
 include/qom/cpu.h     |  8 +++++---
 kvm-all.c             | 20 +++++++-------------
 target-i386/helper.c  |  3 +--
 target-i386/kvm.c     |  6 ++----
 target-s390x/cpu.c    |  4 ++--
 target-s390x/cpu.h    |  7 ++-----
 12 files changed, 39 insertions(+), 56 deletions(-)

diff --git a/cpus.c b/cpus.c
index 84c3520d446f..049c2d04e150 100644
--- a/cpus.c
+++ b/cpus.c
@@ -551,9 +551,8 @@ static const VMStateDescription vmstate_timers = {
     }
 };
 
-static void cpu_throttle_thread(void *opaque)
+static void cpu_throttle_thread(CPUState *cpu, void *opaque)
 {
-    CPUState *cpu = opaque;
     double pct;
     double throttle_ratio;
     long sleeptime_ns;
@@ -583,7 +582,7 @@ static void cpu_throttle_timer_tick(void *opaque)
     }
     CPU_FOREACH(cpu) {
         if (!atomic_xchg(&cpu->throttle_thread_scheduled, 1)) {
-            async_run_on_cpu(cpu, cpu_throttle_thread, cpu);
+            async_run_on_cpu(cpu, cpu_throttle_thread, NULL);
         }
     }
 
@@ -911,12 +910,12 @@ void qemu_init_cpu_loop(void)
     qemu_thread_get_self(&io_thread);
 }
 
-void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
+void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
 {
     struct qemu_work_item wi;
 
     if (qemu_cpu_is_self(cpu)) {
-        func(data);
+        func(cpu, data);
         return;
     }
 
@@ -944,12 +943,12 @@ void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
     }
 }
 
-void async_run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
+void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
 {
     struct qemu_work_item *wi;
 
     if (qemu_cpu_is_self(cpu)) {
-        func(data);
+        func(cpu, data);
         return;
     }
 
@@ -1000,7 +999,7 @@ static void flush_queued_work(CPUState *cpu)
             cpu->queued_work_last = NULL;
         }
         qemu_mutex_unlock(&cpu->work_mutex);
-        wi->func(wi->data);
+        wi->func(cpu, wi->data);
         qemu_mutex_lock(&cpu->work_mutex);
         if (wi->free) {
             g_free(wi);
diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
index c5983c79be47..9b66e741d4b4 100644
--- a/hw/i386/kvm/apic.c
+++ b/hw/i386/kvm/apic.c
@@ -125,10 +125,9 @@ static void kvm_apic_vapic_base_update(APICCommonState *s)
     }
 }
 
-static void do_inject_external_nmi(void *data)
+static void do_inject_external_nmi(CPUState *cpu, void *data)
 {
     APICCommonState *s = data;
-    CPUState *cpu = CPU(s->cpu);
     uint32_t lvt;
     int ret;
 
diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
index 3bf1ddd97612..8063d695312e 100644
--- a/hw/i386/kvmvapic.c
+++ b/hw/i386/kvmvapic.c
@@ -483,7 +483,7 @@ typedef struct VAPICEnableTPRReporting {
     bool enable;
 } VAPICEnableTPRReporting;
 
-static void vapic_do_enable_tpr_reporting(void *data)
+static void vapic_do_enable_tpr_reporting(CPUState *cpu, void *data)
 {
     VAPICEnableTPRReporting *info = data;
 
@@ -734,15 +734,15 @@ static void vapic_realize(DeviceState *dev, Error **errp)
     nb_option_roms++;
 }
 
-static void do_vapic_enable(void *data)
+static void do_vapic_enable(CPUState *cpu, void *data)
 {
     VAPICROMState *s = data;
-    X86CPU *cpu = X86_CPU(first_cpu);
+    X86CPU *x86_cpu = X86_CPU(cpu);
 
     static const uint8_t enabled = 1;
     cpu_physical_memory_write(s->vapic_paddr + offsetof(VAPICState, enabled),
                               &enabled, sizeof(enabled));
-    apic_enable_vapic(cpu->apic_state, s->vapic_paddr);
+    apic_enable_vapic(x86_cpu->apic_state, s->vapic_paddr);
     s->state = VAPIC_ACTIVE;
 }
 
diff --git a/hw/ppc/ppce500_spin.c b/hw/ppc/ppce500_spin.c
index 76bd78bfd7a6..e2aeef3701ec 100644
--- a/hw/ppc/ppce500_spin.c
+++ b/hw/ppc/ppce500_spin.c
@@ -94,10 +94,9 @@ static void mmubooke_create_initial_mapping(CPUPPCState *env,
     env->tlb_dirty = true;
 }
 
-static void spin_kick(void *data)
+static void spin_kick(CPUState *cpu, void *data)
 {
     SpinKick *kick = data;
-    CPUState *cpu = CPU(kick->cpu);
     CPUPPCState *env = &kick->cpu->env;
     SpinInfo *curspin = kick->spin;
     hwaddr map_size = 64 * 1024 * 1024;
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 778fa255a946..a427492c0310 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2159,10 +2159,8 @@ static void spapr_machine_finalizefn(Object *obj)
     g_free(spapr->kvm_type);
 }
 
-static void ppc_cpu_do_nmi_on_cpu(void *arg)
+static void ppc_cpu_do_nmi_on_cpu(CPUState *cs, void *arg)
 {
-    CPUState *cs = arg;
-
     cpu_synchronize_state(cs);
     ppc_cpu_do_system_reset(cs);
 }
@@ -2172,7 +2170,7 @@ static void spapr_nmi(NMIState *n, int cpu_index, Error **errp)
     CPUState *cs;
 
     CPU_FOREACH(cs) {
-        async_run_on_cpu(cs, ppc_cpu_do_nmi_on_cpu, cs);
+        async_run_on_cpu(cs, ppc_cpu_do_nmi_on_cpu, NULL);
     }
 }
 
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 2ba5cbdb194a..22d57469b8a5 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -13,19 +13,18 @@
 #include "kvm_ppc.h"
 
 struct SPRSyncState {
-    CPUState *cs;
     int spr;
     target_ulong value;
     target_ulong mask;
 };
 
-static void do_spr_sync(void *arg)
+static void do_spr_sync(CPUState *cs, void *arg)
 {
     struct SPRSyncState *s = arg;
-    PowerPCCPU *cpu = POWERPC_CPU(s->cs);
+    PowerPCCPU *cpu = POWERPC_CPU(cs);
     CPUPPCState *env = &cpu->env;
 
-    cpu_synchronize_state(s->cs);
+    cpu_synchronize_state(cs);
     env->spr[s->spr] &= ~s->mask;
     env->spr[s->spr] |= s->value;
 }
@@ -34,7 +33,6 @@ static void set_spr(CPUState *cs, int spr, target_ulong value,
                     target_ulong mask)
 {
     struct SPRSyncState s = {
-        .cs = cs,
         .spr = spr,
         .value = value,
         .mask = mask
@@ -908,11 +906,11 @@ typedef struct {
     Error *err;
 } SetCompatState;
 
-static void do_set_compat(void *arg)
+static void do_set_compat(CPUState *cs, void *arg)
 {
     SetCompatState *s = arg;
 
-    cpu_synchronize_state(CPU(s->cpu));
+    cpu_synchronize_state(cs);
     ppc_set_compat(s->cpu, s->cpu_version, &s->err);
 }
 
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 32f3af3e1c63..4e688f645b4a 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -223,9 +223,11 @@ struct kvm_run;
 #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)
 
 /* work queue */
+typedef void (*run_on_cpu_func)(CPUState *cpu, void *data);
+
 struct qemu_work_item {
     struct qemu_work_item *next;
-    void (*func)(void *data);
+    run_on_cpu_func func;
     void *data;
     int done;
     bool free;
@@ -610,7 +612,7 @@ bool cpu_is_stopped(CPUState *cpu);
  *
  * Schedules the function @func for execution on the vCPU @cpu.
  */
-void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data);
+void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
 
 /**
  * async_run_on_cpu:
@@ -620,7 +622,7 @@ void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data);
  *
  * Schedules the function @func for execution on the vCPU @cpu asynchronously.
  */
-void async_run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data);
+void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
 
 /**
  * qemu_get_cpu:
diff --git a/kvm-all.c b/kvm-all.c
index a88f917fda69..a2ee20cfb7f3 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1828,10 +1828,8 @@ void kvm_flush_coalesced_mmio_buffer(void)
     s->coalesced_flush_in_progress = false;
 }
 
-static void do_kvm_cpu_synchronize_state(void *arg)
+static void do_kvm_cpu_synchronize_state(CPUState *cpu, void *arg)
 {
-    CPUState *cpu = arg;
-
     if (!cpu->kvm_vcpu_dirty) {
         kvm_arch_get_registers(cpu);
         cpu->kvm_vcpu_dirty = true;
@@ -1841,34 +1839,30 @@ static void do_kvm_cpu_synchronize_state(void *arg)
 void kvm_cpu_synchronize_state(CPUState *cpu)
 {
     if (!cpu->kvm_vcpu_dirty) {
-        run_on_cpu(cpu, do_kvm_cpu_synchronize_state, cpu);
+        run_on_cpu(cpu, do_kvm_cpu_synchronize_state, NULL);
     }
 }
 
-static void do_kvm_cpu_synchronize_post_reset(void *arg)
+static void do_kvm_cpu_synchronize_post_reset(CPUState *cpu, void *arg)
 {
-    CPUState *cpu = arg;
-
     kvm_arch_put_registers(cpu, KVM_PUT_RESET_STATE);
     cpu->kvm_vcpu_dirty = false;
 }
 
 void kvm_cpu_synchronize_post_reset(CPUState *cpu)
 {
-    run_on_cpu(cpu, do_kvm_cpu_synchronize_post_reset, cpu);
+    run_on_cpu(cpu, do_kvm_cpu_synchronize_post_reset, NULL);
 }
 
-static void do_kvm_cpu_synchronize_post_init(void *arg)
+static void do_kvm_cpu_synchronize_post_init(CPUState *cpu, void *arg)
 {
-    CPUState *cpu = arg;
-
     kvm_arch_put_registers(cpu, KVM_PUT_FULL_STATE);
     cpu->kvm_vcpu_dirty = false;
 }
 
 void kvm_cpu_synchronize_post_init(CPUState *cpu)
 {
-    run_on_cpu(cpu, do_kvm_cpu_synchronize_post_init, cpu);
+    run_on_cpu(cpu, do_kvm_cpu_synchronize_post_init, NULL);
 }
 
 int kvm_cpu_exec(CPUState *cpu)
@@ -2210,7 +2204,7 @@ struct kvm_set_guest_debug_data {
     int err;
 };
 
-static void kvm_invoke_set_guest_debug(void *data)
+static void kvm_invoke_set_guest_debug(CPUState *unused_cpu, void *data)
 {
     struct kvm_set_guest_debug_data *dbg_data = data;
 
diff --git a/target-i386/helper.c b/target-i386/helper.c
index 1c250b82452d..2e438fc1dd8f 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -1122,11 +1122,10 @@ typedef struct MCEInjectionParams {
     int flags;
 } MCEInjectionParams;
 
-static void do_inject_x86_mce(void *data)
+static void do_inject_x86_mce(CPUState *cpu, void *data)
 {
     MCEInjectionParams *params = data;
     CPUX86State *cenv = &params->cpu->env;
-    CPUState *cpu = CPU(params->cpu);
     uint64_t *banks = cenv->mce_banks + 4 * params->bank;
 
     cpu_synchronize_state(cpu);
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index ff92b1d118e1..6a239cd16c0f 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -151,10 +151,8 @@ static int kvm_get_tsc(CPUState *cs)
     return 0;
 }
 
-static inline void do_kvm_synchronize_tsc(void *arg)
+static inline void do_kvm_synchronize_tsc(CPUState *cpu, void *arg)
 {
-    CPUState *cpu = arg;
-
     kvm_get_tsc(cpu);
 }
 
@@ -164,7 +162,7 @@ void kvm_synchronize_all_tsc(void)
 
     if (kvm_enabled()) {
         CPU_FOREACH(cpu) {
-            run_on_cpu(cpu, do_kvm_synchronize_tsc, cpu);
+            run_on_cpu(cpu, do_kvm_synchronize_tsc, NULL);
         }
     }
 }
diff --git a/target-s390x/cpu.c b/target-s390x/cpu.c
index e43e2d61550b..4f09c647b7a8 100644
--- a/target-s390x/cpu.c
+++ b/target-s390x/cpu.c
@@ -188,7 +188,7 @@ static void s390_cpu_machine_reset_cb(void *opaque)
 {
     S390CPU *cpu = opaque;
 
-    run_on_cpu(CPU(cpu), s390_do_cpu_full_reset, CPU(cpu));
+    run_on_cpu(CPU(cpu), s390_do_cpu_full_reset, NULL);
 }
 #endif
 
@@ -238,7 +238,7 @@ static void s390_cpu_realizefn(DeviceState *dev, Error **errp)
     s390_cpu_gdb_init(cs);
     qemu_init_vcpu(cs);
 #if !defined(CONFIG_USER_ONLY)
-    run_on_cpu(cs, s390_do_cpu_full_reset, cs);
+    run_on_cpu(cs, s390_do_cpu_full_reset, NULL);
 #else
     cpu_reset(cs);
 #endif
diff --git a/target-s390x/cpu.h b/target-s390x/cpu.h
index bd6b2e57ef6c..3ae5d2d7de3f 100644
--- a/target-s390x/cpu.h
+++ b/target-s390x/cpu.h
@@ -499,17 +499,14 @@ static inline hwaddr decode_basedisp_s(CPUS390XState *env, uint32_t ipb,
 #define decode_basedisp_rs decode_basedisp_s
 
 /* helper functions for run_on_cpu() */
-static inline void s390_do_cpu_reset(void *arg)
+static inline void s390_do_cpu_reset(CPUState *cs, void *arg)
 {
-    CPUState *cs = arg;
     S390CPUClass *scc = S390_CPU_GET_CLASS(cs);
 
     scc->cpu_reset(cs);
 }
-static inline void s390_do_cpu_full_reset(void *arg)
+static inline void s390_do_cpu_full_reset(CPUState *cs, void *arg)
 {
-    CPUState *cs = arg;
-
     cpu_reset(cs);
 }
 
-- 
1.9.1


* [Qemu-devel] [RFC 2/8] cpus: Move common code out of {async_, }run_on_cpu()
  2016-06-19 22:28 [Qemu-devel] [RFC 0/8] cpu-exec: Safe work in quiescent state Sergey Fedorov
  2016-06-19 22:28 ` [Qemu-devel] [RFC 1/8] cpus: pass CPUState to run_on_cpu helpers Sergey Fedorov
@ 2016-06-19 22:28 ` Sergey Fedorov
  2016-06-27  8:54   ` Alex Bennée
  2016-06-19 22:28 ` [Qemu-devel] [RFC 3/8] cpus: Add 'qemu_work_cond' usage wrappers Sergey Fedorov
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Sergey Fedorov @ 2016-06-19 22:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: patches, Sergey Fedorov, Sergey Fedorov, Paolo Bonzini,
	Peter Crosthwaite, Richard Henderson

From: Sergey Fedorov <serge.fdrv@gmail.com>

Move the code common between run_on_cpu() and async_run_on_cpu() into a
new function queue_work_on_cpu().

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
---
 cpus.c | 42 ++++++++++++++++++------------------------
 1 file changed, 18 insertions(+), 24 deletions(-)

diff --git a/cpus.c b/cpus.c
index 049c2d04e150..04687c85bcd4 100644
--- a/cpus.c
+++ b/cpus.c
@@ -910,6 +910,22 @@ void qemu_init_cpu_loop(void)
     qemu_thread_get_self(&io_thread);
 }
 
+static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
+{
+    qemu_mutex_lock(&cpu->work_mutex);
+    if (cpu->queued_work_first == NULL) {
+        cpu->queued_work_first = wi;
+    } else {
+        cpu->queued_work_last->next = wi;
+    }
+    cpu->queued_work_last = wi;
+    wi->next = NULL;
+    wi->done = false;
+    qemu_mutex_unlock(&cpu->work_mutex);
+
+    qemu_cpu_kick(cpu);
+}
+
 void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
 {
     struct qemu_work_item wi;
@@ -923,18 +939,7 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
     wi.data = data;
     wi.free = false;
 
-    qemu_mutex_lock(&cpu->work_mutex);
-    if (cpu->queued_work_first == NULL) {
-        cpu->queued_work_first = &wi;
-    } else {
-        cpu->queued_work_last->next = &wi;
-    }
-    cpu->queued_work_last = &wi;
-    wi.next = NULL;
-    wi.done = false;
-    qemu_mutex_unlock(&cpu->work_mutex);
-
-    qemu_cpu_kick(cpu);
+    queue_work_on_cpu(cpu, &wi);
     while (!atomic_mb_read(&wi.done)) {
         CPUState *self_cpu = current_cpu;
 
@@ -957,18 +962,7 @@ void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
     wi->data = data;
     wi->free = true;
 
-    qemu_mutex_lock(&cpu->work_mutex);
-    if (cpu->queued_work_first == NULL) {
-        cpu->queued_work_first = wi;
-    } else {
-        cpu->queued_work_last->next = wi;
-    }
-    cpu->queued_work_last = wi;
-    wi->next = NULL;
-    wi->done = false;
-    qemu_mutex_unlock(&cpu->work_mutex);
-
-    qemu_cpu_kick(cpu);
+    queue_work_on_cpu(cpu, wi);
 }
 
 static void qemu_kvm_destroy_vcpu(CPUState *cpu)
-- 
1.9.1


* [Qemu-devel] [RFC 3/8] cpus: Add 'qemu_work_cond' usage wrappers
  2016-06-19 22:28 [Qemu-devel] [RFC 0/8] cpu-exec: Safe work in quiescent state Sergey Fedorov
  2016-06-19 22:28 ` [Qemu-devel] [RFC 1/8] cpus: pass CPUState to run_on_cpu helpers Sergey Fedorov
  2016-06-19 22:28 ` [Qemu-devel] [RFC 2/8] cpus: Move common code out of {async_, }run_on_cpu() Sergey Fedorov
@ 2016-06-19 22:28 ` Sergey Fedorov
  2016-06-19 22:28 ` [Qemu-devel] [RFC 4/8] linux-user: Rework exclusive operation mechanism Sergey Fedorov
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 30+ messages in thread
From: Sergey Fedorov @ 2016-06-19 22:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: patches, Sergey Fedorov, Sergey Fedorov, Paolo Bonzini,
	Peter Crosthwaite, Richard Henderson

From: Sergey Fedorov <serge.fdrv@gmail.com>

Introducing these wrappers is a step towards CPU work support in
user-mode emulation.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
---
 cpus.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/cpus.c b/cpus.c
index 04687c85bcd4..f123eb707cc6 100644
--- a/cpus.c
+++ b/cpus.c
@@ -910,6 +910,16 @@ void qemu_init_cpu_loop(void)
     qemu_thread_get_self(&io_thread);
 }
 
+static void wait_cpu_work(void)
+{
+    qemu_cond_wait(&qemu_work_cond, &qemu_global_mutex);
+}
+
+static void signal_cpu_work(void)
+{
+    qemu_cond_broadcast(&qemu_work_cond);
+}
+
 static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
 {
     qemu_mutex_lock(&cpu->work_mutex);
@@ -943,7 +953,7 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
     while (!atomic_mb_read(&wi.done)) {
         CPUState *self_cpu = current_cpu;
 
-        qemu_cond_wait(&qemu_work_cond, &qemu_global_mutex);
+        wait_cpu_work();
         current_cpu = self_cpu;
     }
 }
@@ -1002,7 +1012,7 @@ static void flush_queued_work(CPUState *cpu)
         }
     }
     qemu_mutex_unlock(&cpu->work_mutex);
-    qemu_cond_broadcast(&qemu_work_cond);
+    signal_cpu_work();
 }
 
 static void qemu_wait_io_event_common(CPUState *cpu)
-- 
1.9.1


* [Qemu-devel] [RFC 4/8] linux-user: Rework exclusive operation mechanism
  2016-06-19 22:28 [Qemu-devel] [RFC 0/8] cpu-exec: Safe work in quiescent state Sergey Fedorov
                   ` (2 preceding siblings ...)
  2016-06-19 22:28 ` [Qemu-devel] [RFC 3/8] cpus: Add 'qemu_work_cond' usage wrappers Sergey Fedorov
@ 2016-06-19 22:28 ` Sergey Fedorov
  2016-06-27  9:02   ` Alex Bennée
  2016-06-19 22:28 ` [Qemu-devel] [RFC 5/8] linux-user: Add qemu_cpu_is_self() and qemu_cpu_kick() Sergey Fedorov
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Sergey Fedorov @ 2016-06-19 22:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: patches, Sergey Fedorov, Sergey Fedorov, Riku Voipio

From: Sergey Fedorov <serge.fdrv@gmail.com>

A single variable 'pending_cpus' was used both for counting currently
running CPUs and for signalling a pending exclusive operation request.

To prepare for supporting operations which require a quiescent state,
like translation buffer flush, it is useful to keep a counter of
currently running CPUs always up to date.

Use a separate variable 'tcg_pending_cpus' to count currently running
CPUs and a separate variable 'exclusive_pending' to indicate that an
exclusive operation is pending.
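
Stripped of the pthread locking, the bookkeeping split can be modeled
like this (a single-threaded sketch with model_ prefixes; the real code
in this patch additionally kicks running CPUs and waits on condition
variables instead of returning immediately):

```c
#include <assert.h>
#include <stdbool.h>

bool exclusive_pending;   /* an exclusive operation was requested */
int tcg_pending_cpus;     /* always-accurate count of running CPUs */

void model_cpu_exec_start(void)
{
    /* The real code first waits in exclusive_idle() while
     * exclusive_pending is set. */
    tcg_pending_cpus++;
}

void model_cpu_exec_end(void)
{
    tcg_pending_cpus--;
    /* The real code broadcasts exclusive_cond when the counter
     * reaches zero. */
}

/* Returns true when the exclusive section may proceed immediately;
 * the real start_exclusive() instead blocks until the counter drops. */
bool model_start_exclusive(void)
{
    exclusive_pending = true;
    return tcg_pending_cpus == 0;
}

void model_end_exclusive(void)
{
    exclusive_pending = false;
}
```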

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
---
 linux-user/main.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index b9a4e0ea45ac..485336f78b8f 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -111,7 +111,8 @@ static pthread_mutex_t cpu_list_mutex = PTHREAD_MUTEX_INITIALIZER;
 static pthread_mutex_t exclusive_lock = PTHREAD_MUTEX_INITIALIZER;
 static pthread_cond_t exclusive_cond = PTHREAD_COND_INITIALIZER;
 static pthread_cond_t exclusive_resume = PTHREAD_COND_INITIALIZER;
-static int pending_cpus;
+static bool exclusive_pending;
+static int tcg_pending_cpus;
 
 /* Make sure everything is in a consistent state for calling fork().  */
 void fork_start(void)
@@ -133,7 +134,8 @@ void fork_end(int child)
                 QTAILQ_REMOVE(&cpus, cpu, node);
             }
         }
-        pending_cpus = 0;
+        tcg_pending_cpus = 0;
+        exclusive_pending = false;
         pthread_mutex_init(&exclusive_lock, NULL);
         pthread_mutex_init(&cpu_list_mutex, NULL);
         pthread_cond_init(&exclusive_cond, NULL);
@@ -150,7 +152,7 @@ void fork_end(int child)
    must be held.  */
 static inline void exclusive_idle(void)
 {
-    while (pending_cpus) {
+    while (exclusive_pending) {
         pthread_cond_wait(&exclusive_resume, &exclusive_lock);
     }
 }
@@ -164,15 +166,14 @@ static inline void start_exclusive(void)
     pthread_mutex_lock(&exclusive_lock);
     exclusive_idle();
 
-    pending_cpus = 1;
+    exclusive_pending = true;
     /* Make all other cpus stop executing.  */
     CPU_FOREACH(other_cpu) {
         if (other_cpu->running) {
-            pending_cpus++;
             cpu_exit(other_cpu);
         }
     }
-    if (pending_cpus > 1) {
+    while (tcg_pending_cpus) {
         pthread_cond_wait(&exclusive_cond, &exclusive_lock);
     }
 }
@@ -180,7 +181,7 @@ static inline void start_exclusive(void)
 /* Finish an exclusive operation.  */
 static inline void __attribute__((unused)) end_exclusive(void)
 {
-    pending_cpus = 0;
+    exclusive_pending = false;
     pthread_cond_broadcast(&exclusive_resume);
     pthread_mutex_unlock(&exclusive_lock);
 }
@@ -191,6 +192,7 @@ static inline void cpu_exec_start(CPUState *cpu)
     pthread_mutex_lock(&exclusive_lock);
     exclusive_idle();
     cpu->running = true;
+    tcg_pending_cpus++;
     pthread_mutex_unlock(&exclusive_lock);
 }
 
@@ -199,11 +201,9 @@ static inline void cpu_exec_end(CPUState *cpu)
 {
     pthread_mutex_lock(&exclusive_lock);
     cpu->running = false;
-    if (pending_cpus > 1) {
-        pending_cpus--;
-        if (pending_cpus == 1) {
-            pthread_cond_signal(&exclusive_cond);
-        }
+    tcg_pending_cpus--;
+    if (!tcg_pending_cpus) {
+        pthread_cond_broadcast(&exclusive_cond);
     }
     exclusive_idle();
     pthread_mutex_unlock(&exclusive_lock);
-- 
1.9.1


* [Qemu-devel] [RFC 5/8] linux-user: Add qemu_cpu_is_self() and qemu_cpu_kick()
  2016-06-19 22:28 [Qemu-devel] [RFC 0/8] cpu-exec: Safe work in quiescent state Sergey Fedorov
                   ` (3 preceding siblings ...)
  2016-06-19 22:28 ` [Qemu-devel] [RFC 4/8] linux-user: Rework exclusive operation mechanism Sergey Fedorov
@ 2016-06-19 22:28 ` Sergey Fedorov
  2016-06-19 22:28 ` [Qemu-devel] [RFC 6/8] linux-user: Support CPU work queue Sergey Fedorov
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 30+ messages in thread
From: Sergey Fedorov @ 2016-06-19 22:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: patches, Sergey Fedorov, Sergey Fedorov, Riku Voipio

From: Sergey Fedorov <serge.fdrv@gmail.com>

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
---
 linux-user/main.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/linux-user/main.c b/linux-user/main.c
index 485336f78b8f..0093a8008c8e 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -3775,6 +3775,16 @@ void cpu_loop(CPUTLGState *env)
 
 THREAD CPUState *thread_cpu;
 
+bool qemu_cpu_is_self(CPUState *cpu)
+{
+    return thread_cpu == cpu;
+}
+
+void qemu_cpu_kick(CPUState *cpu)
+{
+    cpu_exit(cpu);
+}
+
 void task_settid(TaskState *ts)
 {
     if (ts->ts_tid == 0) {
-- 
1.9.1


* [Qemu-devel] [RFC 6/8] linux-user: Support CPU work queue
  2016-06-19 22:28 [Qemu-devel] [RFC 0/8] cpu-exec: Safe work in quiescent state Sergey Fedorov
                   ` (4 preceding siblings ...)
  2016-06-19 22:28 ` [Qemu-devel] [RFC 5/8] linux-user: Add qemu_cpu_is_self() and qemu_cpu_kick() Sergey Fedorov
@ 2016-06-19 22:28 ` Sergey Fedorov
  2016-06-27  9:31   ` Alex Bennée
  2016-06-29 16:17   ` Alex Bennée
  2016-06-19 22:28 ` [Qemu-devel] [RFC 7/8] cpu-exec-common: Introduce async_safe_run_on_cpu() Sergey Fedorov
  2016-06-19 22:28 ` [Qemu-devel] [RFC 8/8] tcg: Make tb_flush() thread safe Sergey Fedorov
  7 siblings, 2 replies; 30+ messages in thread
From: Sergey Fedorov @ 2016-06-19 22:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: patches, Sergey Fedorov, Sergey Fedorov, Paolo Bonzini,
	Peter Crosthwaite, Richard Henderson, Riku Voipio

From: Sergey Fedorov <serge.fdrv@gmail.com>

Make the core CPU work functions common between system and user-mode
emulation. User-mode does not have the BQL, so flush_queued_work() is
protected by 'exclusive_lock' instead.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
---
 cpu-exec-common.c       | 83 ++++++++++++++++++++++++++++++++++++++++++++++
 cpus.c                  | 87 ++-----------------------------------------------
 include/exec/exec-all.h |  4 +++
 linux-user/main.c       | 13 ++++++++
 4 files changed, 102 insertions(+), 85 deletions(-)

diff --git a/cpu-exec-common.c b/cpu-exec-common.c
index 0cb4ae60eff9..8184e0662cbd 100644
--- a/cpu-exec-common.c
+++ b/cpu-exec-common.c
@@ -77,3 +77,86 @@ void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
     }
     siglongjmp(cpu->jmp_env, 1);
 }
+
+static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
+{
+    qemu_mutex_lock(&cpu->work_mutex);
+    if (cpu->queued_work_first == NULL) {
+        cpu->queued_work_first = wi;
+    } else {
+        cpu->queued_work_last->next = wi;
+    }
+    cpu->queued_work_last = wi;
+    wi->next = NULL;
+    wi->done = false;
+    qemu_mutex_unlock(&cpu->work_mutex);
+
+    qemu_cpu_kick(cpu);
+}
+
+void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
+{
+    struct qemu_work_item wi;
+
+    if (qemu_cpu_is_self(cpu)) {
+        func(cpu, data);
+        return;
+    }
+
+    wi.func = func;
+    wi.data = data;
+    wi.free = false;
+
+    queue_work_on_cpu(cpu, &wi);
+    while (!atomic_mb_read(&wi.done)) {
+        CPUState *self_cpu = current_cpu;
+
+        wait_cpu_work();
+        current_cpu = self_cpu;
+    }
+}
+
+void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
+{
+    struct qemu_work_item *wi;
+
+    if (qemu_cpu_is_self(cpu)) {
+        func(cpu, data);
+        return;
+    }
+
+    wi = g_malloc0(sizeof(struct qemu_work_item));
+    wi->func = func;
+    wi->data = data;
+    wi->free = true;
+
+    queue_work_on_cpu(cpu, wi);
+}
+
+void flush_queued_work(CPUState *cpu)
+{
+    struct qemu_work_item *wi;
+
+    if (cpu->queued_work_first == NULL) {
+        return;
+    }
+
+    qemu_mutex_lock(&cpu->work_mutex);
+    while (cpu->queued_work_first != NULL) {
+        wi = cpu->queued_work_first;
+        cpu->queued_work_first = wi->next;
+        if (!cpu->queued_work_first) {
+            cpu->queued_work_last = NULL;
+        }
+        qemu_mutex_unlock(&cpu->work_mutex);
+        wi->func(cpu, wi->data);
+        qemu_mutex_lock(&cpu->work_mutex);
+        if (wi->free) {
+            g_free(wi);
+        } else {
+            atomic_mb_set(&wi->done, true);
+        }
+    }
+    qemu_mutex_unlock(&cpu->work_mutex);
+    signal_cpu_work();
+}
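The queue manipulation moved into cpu-exec-common.c above can be modelled in isolation. Below is a simplified, single-threaded sketch with illustrative names (WorkItem, queue_work, flush_queued are not the QEMU API); the mutex and the wait/signal hand-off of the real code are deliberately omitted:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified model of the per-CPU work list: a singly linked FIFO
 * tracked by first/last pointers, as in queue_work_on_cpu(). */
typedef struct WorkItem {
    struct WorkItem *next;
    void (*func)(void *data);
    void *data;
    bool free;          /* heap-allocated item: drain frees it */
    bool done;          /* stack-allocated item: drain marks it done */
} WorkItem;

typedef struct {
    WorkItem *first;
    WorkItem *last;
} WorkQueue;

static void queue_work(WorkQueue *q, WorkItem *wi)
{
    if (q->first == NULL) {
        q->first = wi;
    } else {
        q->last->next = wi;
    }
    q->last = wi;
    wi->next = NULL;
    wi->done = false;
}

/* Mirrors flush_queued_work(): pop items in FIFO order, run them,
 * then either free them or flag completion for a waiting scheduler. */
static void flush_queued(WorkQueue *q)
{
    while (q->first != NULL) {
        WorkItem *wi = q->first;
        q->first = wi->next;
        if (!q->first) {
            q->last = NULL;
        }
        wi->func(wi->data);
        if (wi->free) {
            free(wi);
        } else {
            wi->done = true;    /* atomic_mb_set() in the real code */
        }
    }
}
```

In the real code, run_on_cpu() allocates its item on the stack with free = false and loops on wi.done under wait_cpu_work(), while async_run_on_cpu() heap-allocates with free = true and lets the draining CPU reclaim the item.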
diff --git a/cpus.c b/cpus.c
index f123eb707cc6..98f60f6f98f5 100644
--- a/cpus.c
+++ b/cpus.c
@@ -910,71 +910,16 @@ void qemu_init_cpu_loop(void)
     qemu_thread_get_self(&io_thread);
 }
 
-static void wait_cpu_work(void)
+void wait_cpu_work(void)
 {
     qemu_cond_wait(&qemu_work_cond, &qemu_global_mutex);
 }
 
-static void signal_cpu_work(void)
+void signal_cpu_work(void)
 {
     qemu_cond_broadcast(&qemu_work_cond);
 }
 
-static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
-{
-    qemu_mutex_lock(&cpu->work_mutex);
-    if (cpu->queued_work_first == NULL) {
-        cpu->queued_work_first = wi;
-    } else {
-        cpu->queued_work_last->next = wi;
-    }
-    cpu->queued_work_last = wi;
-    wi->next = NULL;
-    wi->done = false;
-    qemu_mutex_unlock(&cpu->work_mutex);
-
-    qemu_cpu_kick(cpu);
-}
-
-void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
-{
-    struct qemu_work_item wi;
-
-    if (qemu_cpu_is_self(cpu)) {
-        func(cpu, data);
-        return;
-    }
-
-    wi.func = func;
-    wi.data = data;
-    wi.free = false;
-
-    queue_work_on_cpu(cpu, &wi);
-    while (!atomic_mb_read(&wi.done)) {
-        CPUState *self_cpu = current_cpu;
-
-        wait_cpu_work();
-        current_cpu = self_cpu;
-    }
-}
-
-void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
-{
-    struct qemu_work_item *wi;
-
-    if (qemu_cpu_is_self(cpu)) {
-        func(cpu, data);
-        return;
-    }
-
-    wi = g_malloc0(sizeof(struct qemu_work_item));
-    wi->func = func;
-    wi->data = data;
-    wi->free = true;
-
-    queue_work_on_cpu(cpu, wi);
-}
-
 static void qemu_kvm_destroy_vcpu(CPUState *cpu)
 {
     if (kvm_destroy_vcpu(cpu) < 0) {
@@ -987,34 +932,6 @@ static void qemu_tcg_destroy_vcpu(CPUState *cpu)
 {
 }
 
-static void flush_queued_work(CPUState *cpu)
-{
-    struct qemu_work_item *wi;
-
-    if (cpu->queued_work_first == NULL) {
-        return;
-    }
-
-    qemu_mutex_lock(&cpu->work_mutex);
-    while (cpu->queued_work_first != NULL) {
-        wi = cpu->queued_work_first;
-        cpu->queued_work_first = wi->next;
-        if (!cpu->queued_work_first) {
-            cpu->queued_work_last = NULL;
-        }
-        qemu_mutex_unlock(&cpu->work_mutex);
-        wi->func(cpu, wi->data);
-        qemu_mutex_lock(&cpu->work_mutex);
-        if (wi->free) {
-            g_free(wi);
-        } else {
-            atomic_mb_set(&wi->done, true);
-        }
-    }
-    qemu_mutex_unlock(&cpu->work_mutex);
-    signal_cpu_work();
-}
-
 static void qemu_wait_io_event_common(CPUState *cpu)
 {
     if (cpu->stop) {
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index c1f59fa59d2c..23b4b50e0a45 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -407,4 +407,8 @@ extern int singlestep;
 extern CPUState *tcg_current_cpu;
 extern bool exit_request;
 
+void wait_cpu_work(void);
+void signal_cpu_work(void);
+void flush_queued_work(CPUState *cpu);
+
 #endif
diff --git a/linux-user/main.c b/linux-user/main.c
index 0093a8008c8e..5a68651159c2 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -111,6 +111,7 @@ static pthread_mutex_t cpu_list_mutex = PTHREAD_MUTEX_INITIALIZER;
 static pthread_mutex_t exclusive_lock = PTHREAD_MUTEX_INITIALIZER;
 static pthread_cond_t exclusive_cond = PTHREAD_COND_INITIALIZER;
 static pthread_cond_t exclusive_resume = PTHREAD_COND_INITIALIZER;
+static pthread_cond_t work_cond = PTHREAD_COND_INITIALIZER;
 static bool exclusive_pending;
 static int tcg_pending_cpus;
 
@@ -140,6 +141,7 @@ void fork_end(int child)
         pthread_mutex_init(&cpu_list_mutex, NULL);
         pthread_cond_init(&exclusive_cond, NULL);
         pthread_cond_init(&exclusive_resume, NULL);
+        pthread_cond_init(&work_cond, NULL);
         qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
         gdbserver_fork(thread_cpu);
     } else {
@@ -148,6 +150,16 @@ void fork_end(int child)
     }
 }
 
+void wait_cpu_work(void)
+{
+    pthread_cond_wait(&work_cond, &exclusive_lock);
+}
+
+void signal_cpu_work(void)
+{
+    pthread_cond_broadcast(&work_cond);
+}
+
 /* Wait for pending exclusive operations to complete.  The exclusive lock
    must be held.  */
 static inline void exclusive_idle(void)
@@ -206,6 +218,7 @@ static inline void cpu_exec_end(CPUState *cpu)
         pthread_cond_broadcast(&exclusive_cond);
     }
     exclusive_idle();
+    flush_queued_work(cpu);
     pthread_mutex_unlock(&exclusive_lock);
 }
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [RFC 7/8] cpu-exec-common: Introduce async_safe_run_on_cpu()
  2016-06-19 22:28 [Qemu-devel] [RFC 0/8] cpu-exec: Safe work in quiescent state Sergey Fedorov
                   ` (5 preceding siblings ...)
  2016-06-19 22:28 ` [Qemu-devel] [RFC 6/8] linux-user: Support CPU work queue Sergey Fedorov
@ 2016-06-19 22:28 ` Sergey Fedorov
  2016-06-27  9:36   ` Alex Bennée
  2016-07-01 16:29   ` Alvise Rigo
  2016-06-19 22:28 ` [Qemu-devel] [RFC 8/8] tcg: Make tb_flush() thread safe Sergey Fedorov
  7 siblings, 2 replies; 30+ messages in thread
From: Sergey Fedorov @ 2016-06-19 22:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: patches, Sergey Fedorov, Sergey Fedorov, Paolo Bonzini,
	Peter Crosthwaite, Richard Henderson, Riku Voipio

From: Sergey Fedorov <serge.fdrv@gmail.com>

This patch is based on the ideas found in work of KONRAD Frederic [1],
Alex Bennée [2], and Alvise Rigo [3].

This mechanism allows an operation to be performed safely in a
quiescent state. Quiescent state means: (1) no vCPU is running and
(2) the BQL in system-mode emulation, or 'exclusive_lock' in user-mode
emulation, is held while performing the operation. This functionality
is required e.g. to flush the translation buffer safely in
multi-threaded user-mode emulation.

The existing CPU work queue is used to schedule such safe operations. A
new 'safe' flag is added to struct qemu_work_item to designate the
special requirements of safe work. An operation in a quiescent state
can be scheduled with the async_safe_run_on_cpu() function, which is
the same as async_run_on_cpu() except that it marks the queued work
item with the 'safe' flag set to true. With this flag set,
queue_work_on_cpu() atomically increments the 'safe_work_pending'
global counter and kicks all CPUs, rather than just the target CPU as
for normal CPU work. This forces the other CPUs to exit their
execution loops and wait in wait_safe_cpu_work() for the safe work to
finish. When a CPU drains its work queue and encounters a work item
marked as safe, it first waits for the other CPUs to exit their
execution loops, then calls the work item function, and finally
decrements the 'safe_work_pending' counter, signalling the other CPUs
so that they can continue execution as soon as all pending safe work
items have been processed. The 'tcg_pending_cpus' counter, protected
by 'exclusive_lock' in user-mode emulation and by 'qemu_global_mutex'
in system-mode emulation, is used to determine whether any CPU is
running and to wait for it to exit the execution loop. Fairness across
all the CPU work queues is ensured by draining all pending safe work
items before any CPU can run.

[1] http://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg01128.html
[2] http://lists.nongnu.org/archive/html/qemu-devel/2016-04/msg02531.html
[3] http://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg04792.html
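Stripped of threads, locks and condition variables, the hand-shake described above reduces to two counters. The following toy sequential model is illustrative only (the real code blocks on a condition variable in wait_cpu_work() rather than polling, and the helpers here are not the QEMU functions of the same shape):

```c
#include <assert.h>

/* Toy model of the quiescent-state hand-shake: safe work may only run
 * once no vCPU is inside its execution loop (tcg_pending_cpus == 0). */
static int tcg_pending_cpus;    /* vCPUs currently in their exec loop */
static int safe_work_pending;   /* queued safe work items */
static int ran_at_pending = -1; /* tcg_pending_cpus observed when work ran */

static void do_safe_work(void)
{
    ran_at_pending = tcg_pending_cpus;
}

static void schedule_safe_work(void)
{
    safe_work_pending++;        /* atomic_inc() + kick every CPU in the patch */
}

static void cpu_exec_start(void)
{
    tcg_pending_cpus++;
}

static void cpu_exec_end(void)
{
    --tcg_pending_cpus;         /* last CPU out unblocks the safe work */
}

/* Stands in for the safe-work branch of flush_queued_work(); the real
 * code waits in wait_cpu_work() while tcg_pending_cpus != 0. */
static void try_flush_safe_work(void)
{
    if (tcg_pending_cpus == 0 && safe_work_pending > 0) {
        do_safe_work();
        safe_work_pending--;    /* atomic_dec() + signal_cpu_work() */
    }
}
```

The other CPUs meanwhile sit in wait_safe_cpu_work() until safe_work_pending drops back to zero, which is what makes the queues fair: no CPU re-enters its execution loop while safe work is outstanding.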

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
---
 cpu-exec-common.c       | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 cpus.c                  | 16 ++++++++++++++++
 include/exec/exec-all.h |  2 ++
 include/qom/cpu.h       | 14 ++++++++++++++
 linux-user/main.c       |  2 +-
 5 files changed, 77 insertions(+), 2 deletions(-)

diff --git a/cpu-exec-common.c b/cpu-exec-common.c
index 8184e0662cbd..3056324738f8 100644
--- a/cpu-exec-common.c
+++ b/cpu-exec-common.c
@@ -25,6 +25,7 @@
 
 bool exit_request;
 CPUState *tcg_current_cpu;
+int tcg_pending_cpus;
 
 /* exit the current TB, but without causing any exception to be raised */
 void cpu_loop_exit_noexc(CPUState *cpu)
@@ -78,6 +79,15 @@ void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
     siglongjmp(cpu->jmp_env, 1);
 }
 
+static int safe_work_pending;
+
+void wait_safe_cpu_work(void)
+{
+    while (atomic_mb_read(&safe_work_pending) > 0) {
+        wait_cpu_work();
+    }
+}
+
 static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
 {
     qemu_mutex_lock(&cpu->work_mutex);
@@ -89,9 +99,18 @@ static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
     cpu->queued_work_last = wi;
     wi->next = NULL;
     wi->done = false;
+    if (wi->safe) {
+        atomic_inc(&safe_work_pending);
+    }
     qemu_mutex_unlock(&cpu->work_mutex);
 
-    qemu_cpu_kick(cpu);
+    if (!wi->safe) {
+        qemu_cpu_kick(cpu);
+    } else {
+        CPU_FOREACH(cpu) {
+            qemu_cpu_kick(cpu);
+        }
+    }
 }
 
 void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
@@ -106,6 +125,7 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
     wi.func = func;
     wi.data = data;
     wi.free = false;
+    wi.safe = false;
 
     queue_work_on_cpu(cpu, &wi);
     while (!atomic_mb_read(&wi.done)) {
@@ -129,6 +149,20 @@ void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
     wi->func = func;
     wi->data = data;
     wi->free = true;
+    wi->safe = false;
+
+    queue_work_on_cpu(cpu, wi);
+}
+
+void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
+{
+    struct qemu_work_item *wi;
+
+    wi = g_malloc0(sizeof(struct qemu_work_item));
+    wi->func = func;
+    wi->data = data;
+    wi->free = true;
+    wi->safe = true;
 
     queue_work_on_cpu(cpu, wi);
 }
@@ -148,9 +182,18 @@ void flush_queued_work(CPUState *cpu)
         if (!cpu->queued_work_first) {
             cpu->queued_work_last = NULL;
         }
+        if (wi->safe) {
+            while (tcg_pending_cpus) {
+                wait_cpu_work();
+            }
+        }
         qemu_mutex_unlock(&cpu->work_mutex);
         wi->func(cpu, wi->data);
         qemu_mutex_lock(&cpu->work_mutex);
+        if (wi->safe) {
+            atomic_dec(&safe_work_pending);
+            signal_cpu_work();
+        }
         if (wi->free) {
             g_free(wi);
         } else {
diff --git a/cpus.c b/cpus.c
index 98f60f6f98f5..bb6bd8615cfc 100644
--- a/cpus.c
+++ b/cpus.c
@@ -932,6 +932,18 @@ static void qemu_tcg_destroy_vcpu(CPUState *cpu)
 {
 }
 
+static void tcg_cpu_exec_start(CPUState *cpu)
+{
+    tcg_pending_cpus++;
+}
+
+static void tcg_cpu_exec_end(CPUState *cpu)
+{
+    if (--tcg_pending_cpus) {
+        signal_cpu_work();
+    }
+}
+
 static void qemu_wait_io_event_common(CPUState *cpu)
 {
     if (cpu->stop) {
@@ -956,6 +968,8 @@ static void qemu_tcg_wait_io_event(CPUState *cpu)
     CPU_FOREACH(cpu) {
         qemu_wait_io_event_common(cpu);
     }
+
+    wait_safe_cpu_work();
 }
 
 static void qemu_kvm_wait_io_event(CPUState *cpu)
@@ -1491,7 +1505,9 @@ static void tcg_exec_all(void)
                           (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
 
         if (cpu_can_run(cpu)) {
+            tcg_cpu_exec_start(cpu);
             r = tcg_cpu_exec(cpu);
+            tcg_cpu_exec_end(cpu);
             if (r == EXCP_DEBUG) {
                 cpu_handle_guest_debug(cpu);
                 break;
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 23b4b50e0a45..3bc44ed81473 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -405,10 +405,12 @@ extern int singlestep;
 
 /* cpu-exec.c, accessed with atomic_mb_read/atomic_mb_set */
 extern CPUState *tcg_current_cpu;
+extern int tcg_pending_cpus;
 extern bool exit_request;
 
 void wait_cpu_work(void);
 void signal_cpu_work(void);
 void flush_queued_work(CPUState *cpu);
+void wait_safe_cpu_work(void);
 
 #endif
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 4e688f645b4a..5128fcc1745a 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -231,6 +231,7 @@ struct qemu_work_item {
     void *data;
     int done;
     bool free;
+    bool safe;
 };
 
 /**
@@ -625,6 +626,19 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
 void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
 
 /**
+ * async_safe_run_on_cpu:
+ * @cpu: The vCPU to run on.
+ * @func: The function to be executed.
+ * @data: Data to pass to the function.
+ *
+ * Schedules the function @func for execution on the vCPU @cpu asynchronously
+ * and in quiescent state. Quiescent state means: (1) all other vCPUs are
+ * halted and (2) #qemu_global_mutex (a.k.a. BQL) in system-mode or
+ * #exclusive_lock in user-mode emulation is held while @func is executing.
+ */
+void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
+
+/**
  * qemu_get_cpu:
  * @index: The CPUState@cpu_index value of the CPU to obtain.
  *
diff --git a/linux-user/main.c b/linux-user/main.c
index 5a68651159c2..6da3bb32186b 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -113,7 +113,6 @@ static pthread_cond_t exclusive_cond = PTHREAD_COND_INITIALIZER;
 static pthread_cond_t exclusive_resume = PTHREAD_COND_INITIALIZER;
 static pthread_cond_t work_cond = PTHREAD_COND_INITIALIZER;
 static bool exclusive_pending;
-static int tcg_pending_cpus;
 
 /* Make sure everything is in a consistent state for calling fork().  */
 void fork_start(void)
@@ -219,6 +218,7 @@ static inline void cpu_exec_end(CPUState *cpu)
     }
     exclusive_idle();
     flush_queued_work(cpu);
+    wait_safe_cpu_work();
     pthread_mutex_unlock(&exclusive_lock);
 }
 
-- 
1.9.1


* [Qemu-devel] [RFC 8/8] tcg: Make tb_flush() thread safe
  2016-06-19 22:28 [Qemu-devel] [RFC 0/8] cpu-exec: Safe work in quiescent state Sergey Fedorov
                   ` (6 preceding siblings ...)
  2016-06-19 22:28 ` [Qemu-devel] [RFC 7/8] cpu-exec-common: Introduce async_safe_run_on_cpu() Sergey Fedorov
@ 2016-06-19 22:28 ` Sergey Fedorov
  2016-06-28 16:18   ` Alex Bennée
  7 siblings, 1 reply; 30+ messages in thread
From: Sergey Fedorov @ 2016-06-19 22:28 UTC (permalink / raw)
  To: qemu-devel
  Cc: patches, Sergey Fedorov, Sergey Fedorov, Paolo Bonzini,
	Peter Crosthwaite, Richard Henderson

From: Sergey Fedorov <serge.fdrv@gmail.com>

Use async_safe_run_on_cpu() to make tb_flush() thread safe.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
---
 translate-all.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index 3f402dfe04f5..09b1d0b0efc3 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -832,7 +832,7 @@ static void page_flush_tb(void)
 
 /* flush all the translation blocks */
 /* XXX: tb_flush is currently not thread safe */
-void tb_flush(CPUState *cpu)
+static void do_tb_flush(CPUState *cpu, void *data)
 {
 #if defined(DEBUG_FLUSH)
     printf("qemu: flush code_size=%ld nb_tbs=%d avg_tb_size=%ld\n",
@@ -861,6 +861,11 @@ void tb_flush(CPUState *cpu)
     tcg_ctx.tb_ctx.tb_flush_count++;
 }
 
+void tb_flush(CPUState *cpu)
+{
+    async_safe_run_on_cpu(cpu, do_tb_flush, NULL);
+}
+
 #ifdef DEBUG_TB_CHECK
 
 static void
@@ -1163,9 +1168,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
  buffer_overflow:
         /* flush must be done */
         tb_flush(cpu);
-        /* cannot fail at this point */
-        tb = tb_alloc(pc);
-        assert(tb != NULL);
+        mmap_unlock();
+        cpu_loop_exit(cpu);
     }
 
     gen_code_buf = tcg_ctx.code_gen_ptr;
-- 
1.9.1


* Re: [Qemu-devel] [RFC 1/8] cpus: pass CPUState to run_on_cpu helpers
  2016-06-19 22:28 ` [Qemu-devel] [RFC 1/8] cpus: pass CPUState to run_on_cpu helpers Sergey Fedorov
@ 2016-06-20  1:23   ` David Gibson
  2016-06-20 13:02   ` Alex Bennée
  1 sibling, 0 replies; 30+ messages in thread
From: David Gibson @ 2016-06-20  1:23 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: qemu-devel, patches, Alex Bennée, Paolo Bonzini,
	Peter Crosthwaite, Richard Henderson, Michael S. Tsirkin,
	Eduardo Habkost, Alexander Graf, Marcelo Tosatti, qemu-ppc, kvm


On Mon, Jun 20, 2016 at 01:28:26AM +0300, Sergey Fedorov wrote:
> From: Alex Bennée <alex.bennee@linaro.org>
> 
> CPUState is a fairly common pointer to pass to these helpers. This means
> if you need other arguments for the async_run_on_cpu case you end up
> having to do a g_malloc to stuff additional data into the routine. For
> the current users this isn't a massive deal but for MTTCG this gets
> cumbersome when the only other parameter is often an address.
> 
> This adds the typedef run_on_cpu_func for helper functions which has an
> explicit CPUState * passed as the first parameter. All the users of
> run_on_cpu and async_run_on_cpu have had their helpers updated to use
> CPUState where available.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>

[snip]

> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 778fa255a946..a427492c0310 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2159,10 +2159,8 @@ static void spapr_machine_finalizefn(Object *obj)
>      g_free(spapr->kvm_type);
>  }
>  
> -static void ppc_cpu_do_nmi_on_cpu(void *arg)
> +static void ppc_cpu_do_nmi_on_cpu(CPUState *cs, void *arg)
>  {
> -    CPUState *cs = arg;
> -
>      cpu_synchronize_state(cs);
>      ppc_cpu_do_system_reset(cs);
>  }
> @@ -2172,7 +2170,7 @@ static void spapr_nmi(NMIState *n, int cpu_index, Error **errp)
>      CPUState *cs;
>  
>      CPU_FOREACH(cs) {
> -        async_run_on_cpu(cs, ppc_cpu_do_nmi_on_cpu, cs);
> +        async_run_on_cpu(cs, ppc_cpu_do_nmi_on_cpu, NULL);
>      }
>  }
>  
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 2ba5cbdb194a..22d57469b8a5 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -13,19 +13,18 @@
>  #include "kvm_ppc.h"
>  
>  struct SPRSyncState {
> -    CPUState *cs;
>      int spr;
>      target_ulong value;
>      target_ulong mask;
>  };
>  
> -static void do_spr_sync(void *arg)
> +static void do_spr_sync(CPUState *cs, void *arg)
>  {
>      struct SPRSyncState *s = arg;
> -    PowerPCCPU *cpu = POWERPC_CPU(s->cs);
> +    PowerPCCPU *cpu = POWERPC_CPU(cs);
>      CPUPPCState *env = &cpu->env;
>  
> -    cpu_synchronize_state(s->cs);
> +    cpu_synchronize_state(cs);
>      env->spr[s->spr] &= ~s->mask;
>      env->spr[s->spr] |= s->value;
>  }
> @@ -34,7 +33,6 @@ static void set_spr(CPUState *cs, int spr, target_ulong value,
>                      target_ulong mask)
>  {
>      struct SPRSyncState s = {
> -        .cs = cs,
>          .spr = spr,
>          .value = value,
>          .mask = mask
> @@ -908,11 +906,11 @@ typedef struct {
>      Error *err;
>  } SetCompatState;

You've missed the opportunity here to remove the cpu field from
SetCompatState, as you did with SPRSyncState.

>  
> -static void do_set_compat(void *arg)
> +static void do_set_compat(CPUState *cs, void *arg)
>  {
>      SetCompatState *s = arg;
>  
> -    cpu_synchronize_state(CPU(s->cpu));
> +    cpu_synchronize_state(cs);
>      ppc_set_compat(s->cpu, s->cpu_version, &s->err);
>  }

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [RFC 1/8] cpus: pass CPUState to run_on_cpu helpers
  2016-06-19 22:28 ` [Qemu-devel] [RFC 1/8] cpus: pass CPUState to run_on_cpu helpers Sergey Fedorov
  2016-06-20  1:23   ` David Gibson
@ 2016-06-20 13:02   ` Alex Bennée
  2016-06-20 13:07     ` Sergey Fedorov
  1 sibling, 1 reply; 30+ messages in thread
From: Alex Bennée @ 2016-06-20 13:02 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: qemu-devel, patches, Paolo Bonzini, Peter Crosthwaite,
	Richard Henderson, Michael S. Tsirkin, Eduardo Habkost,
	David Gibson, Alexander Graf, Marcelo Tosatti, qemu-ppc, kvm


Sergey Fedorov <sergey.fedorov@linaro.org> writes:

> From: Alex Bennée <alex.bennee@linaro.org>
>
> CPUState is a fairly common pointer to pass to these helpers. This means
> if you need other arguments for the async_run_on_cpu case you end up
> having to do a g_malloc to stuff additional data into the routine. For
> the current users this isn't a massive deal but for MTTCG this gets
> cumbersome when the only other parameter is often an address.
>
> This adds the typedef run_on_cpu_func for helper functions which has an
> explicit CPUState * passed as the first parameter. All the users of
> run_on_cpu and async_run_on_cpu have had their helpers updated to use
> CPUState where available.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>

Is this as-is, or did you apply the fixes from the review comments?

> ---
>  cpus.c                | 15 +++++++--------
>  hw/i386/kvm/apic.c    |  3 +--
>  hw/i386/kvmvapic.c    |  8 ++++----
>  hw/ppc/ppce500_spin.c |  3 +--
>  hw/ppc/spapr.c        |  6 ++----
>  hw/ppc/spapr_hcall.c  | 12 +++++-------
>  include/qom/cpu.h     |  8 +++++---
>  kvm-all.c             | 20 +++++++-------------
>  target-i386/helper.c  |  3 +--
>  target-i386/kvm.c     |  6 ++----
>  target-s390x/cpu.c    |  4 ++--
>  target-s390x/cpu.h    |  7 ++-----
>  12 files changed, 39 insertions(+), 56 deletions(-)
>
> diff --git a/cpus.c b/cpus.c
> index 84c3520d446f..049c2d04e150 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -551,9 +551,8 @@ static const VMStateDescription vmstate_timers = {
>      }
>  };
>
> -static void cpu_throttle_thread(void *opaque)
> +static void cpu_throttle_thread(CPUState *cpu, void *opaque)
>  {
> -    CPUState *cpu = opaque;
>      double pct;
>      double throttle_ratio;
>      long sleeptime_ns;
> @@ -583,7 +582,7 @@ static void cpu_throttle_timer_tick(void *opaque)
>      }
>      CPU_FOREACH(cpu) {
>          if (!atomic_xchg(&cpu->throttle_thread_scheduled, 1)) {
> -            async_run_on_cpu(cpu, cpu_throttle_thread, cpu);
> +            async_run_on_cpu(cpu, cpu_throttle_thread, NULL);
>          }
>      }
>
> @@ -911,12 +910,12 @@ void qemu_init_cpu_loop(void)
>      qemu_thread_get_self(&io_thread);
>  }
>
> -void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
> +void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
>  {
>      struct qemu_work_item wi;
>
>      if (qemu_cpu_is_self(cpu)) {
> -        func(data);
> +        func(cpu, data);
>          return;
>      }
>
> @@ -944,12 +943,12 @@ void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
>      }
>  }
>
> -void async_run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
> +void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
>  {
>      struct qemu_work_item *wi;
>
>      if (qemu_cpu_is_self(cpu)) {
> -        func(data);
> +        func(cpu, data);
>          return;
>      }
>
> @@ -1000,7 +999,7 @@ static void flush_queued_work(CPUState *cpu)
>              cpu->queued_work_last = NULL;
>          }
>          qemu_mutex_unlock(&cpu->work_mutex);
> -        wi->func(wi->data);
> +        wi->func(cpu, wi->data);
>          qemu_mutex_lock(&cpu->work_mutex);
>          if (wi->free) {
>              g_free(wi);
> diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
> index c5983c79be47..9b66e741d4b4 100644
> --- a/hw/i386/kvm/apic.c
> +++ b/hw/i386/kvm/apic.c
> @@ -125,10 +125,9 @@ static void kvm_apic_vapic_base_update(APICCommonState *s)
>      }
>  }
>
> -static void do_inject_external_nmi(void *data)
> +static void do_inject_external_nmi(CPUState *cpu, void *data)
>  {
>      APICCommonState *s = data;
> -    CPUState *cpu = CPU(s->cpu);
>      uint32_t lvt;
>      int ret;
>
> diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
> index 3bf1ddd97612..8063d695312e 100644
> --- a/hw/i386/kvmvapic.c
> +++ b/hw/i386/kvmvapic.c
> @@ -483,7 +483,7 @@ typedef struct VAPICEnableTPRReporting {
>      bool enable;
>  } VAPICEnableTPRReporting;
>
> -static void vapic_do_enable_tpr_reporting(void *data)
> +static void vapic_do_enable_tpr_reporting(CPUState *cpu, void *data)
>  {
>      VAPICEnableTPRReporting *info = data;
>
> @@ -734,15 +734,15 @@ static void vapic_realize(DeviceState *dev, Error **errp)
>      nb_option_roms++;
>  }
>
> -static void do_vapic_enable(void *data)
> +static void do_vapic_enable(CPUState *cpu, void *data)
>  {
>      VAPICROMState *s = data;
> -    X86CPU *cpu = X86_CPU(first_cpu);
> +    X86CPU *x86_cpu = X86_CPU(cpu);
>
>      static const uint8_t enabled = 1;
>      cpu_physical_memory_write(s->vapic_paddr + offsetof(VAPICState, enabled),
>                                &enabled, sizeof(enabled));
> -    apic_enable_vapic(cpu->apic_state, s->vapic_paddr);
> +    apic_enable_vapic(x86_cpu->apic_state, s->vapic_paddr);
>      s->state = VAPIC_ACTIVE;
>  }
>
> diff --git a/hw/ppc/ppce500_spin.c b/hw/ppc/ppce500_spin.c
> index 76bd78bfd7a6..e2aeef3701ec 100644
> --- a/hw/ppc/ppce500_spin.c
> +++ b/hw/ppc/ppce500_spin.c
> @@ -94,10 +94,9 @@ static void mmubooke_create_initial_mapping(CPUPPCState *env,
>      env->tlb_dirty = true;
>  }
>
> -static void spin_kick(void *data)
> +static void spin_kick(CPUState *cpu, void *data)
>  {
>      SpinKick *kick = data;
> -    CPUState *cpu = CPU(kick->cpu);
>      CPUPPCState *env = &kick->cpu->env;
>      SpinInfo *curspin = kick->spin;
>      hwaddr map_size = 64 * 1024 * 1024;
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 778fa255a946..a427492c0310 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2159,10 +2159,8 @@ static void spapr_machine_finalizefn(Object *obj)
>      g_free(spapr->kvm_type);
>  }
>
> -static void ppc_cpu_do_nmi_on_cpu(void *arg)
> +static void ppc_cpu_do_nmi_on_cpu(CPUState *cs, void *arg)
>  {
> -    CPUState *cs = arg;
> -
>      cpu_synchronize_state(cs);
>      ppc_cpu_do_system_reset(cs);
>  }
> @@ -2172,7 +2170,7 @@ static void spapr_nmi(NMIState *n, int cpu_index, Error **errp)
>      CPUState *cs;
>
>      CPU_FOREACH(cs) {
> -        async_run_on_cpu(cs, ppc_cpu_do_nmi_on_cpu, cs);
> +        async_run_on_cpu(cs, ppc_cpu_do_nmi_on_cpu, NULL);
>      }
>  }
>
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 2ba5cbdb194a..22d57469b8a5 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -13,19 +13,18 @@
>  #include "kvm_ppc.h"
>
>  struct SPRSyncState {
> -    CPUState *cs;
>      int spr;
>      target_ulong value;
>      target_ulong mask;
>  };
>
> -static void do_spr_sync(void *arg)
> +static void do_spr_sync(CPUState *cs, void *arg)
>  {
>      struct SPRSyncState *s = arg;
> -    PowerPCCPU *cpu = POWERPC_CPU(s->cs);
> +    PowerPCCPU *cpu = POWERPC_CPU(cs);
>      CPUPPCState *env = &cpu->env;
>
> -    cpu_synchronize_state(s->cs);
> +    cpu_synchronize_state(cs);
>      env->spr[s->spr] &= ~s->mask;
>      env->spr[s->spr] |= s->value;
>  }
> @@ -34,7 +33,6 @@ static void set_spr(CPUState *cs, int spr, target_ulong value,
>                      target_ulong mask)
>  {
>      struct SPRSyncState s = {
> -        .cs = cs,
>          .spr = spr,
>          .value = value,
>          .mask = mask
> @@ -908,11 +906,11 @@ typedef struct {
>      Error *err;
>  } SetCompatState;
>
> -static void do_set_compat(void *arg)
> +static void do_set_compat(CPUState *cs, void *arg)
>  {
>      SetCompatState *s = arg;
>
> -    cpu_synchronize_state(CPU(s->cpu));
> +    cpu_synchronize_state(cs);
>      ppc_set_compat(s->cpu, s->cpu_version, &s->err);
>  }
>
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 32f3af3e1c63..4e688f645b4a 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -223,9 +223,11 @@ struct kvm_run;
>  #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)
>
>  /* work queue */
> +typedef void (*run_on_cpu_func)(CPUState *cpu, void *data);
> +
>  struct qemu_work_item {
>      struct qemu_work_item *next;
> -    void (*func)(void *data);
> +    run_on_cpu_func func;
>      void *data;
>      int done;
>      bool free;
> @@ -610,7 +612,7 @@ bool cpu_is_stopped(CPUState *cpu);
>   *
>   * Schedules the function @func for execution on the vCPU @cpu.
>   */
> -void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data);
> +void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
>
>  /**
>   * async_run_on_cpu:
> @@ -620,7 +622,7 @@ void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data);
>   *
>   * Schedules the function @func for execution on the vCPU @cpu asynchronously.
>   */
> -void async_run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data);
> +void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
>
>  /**
>   * qemu_get_cpu:
> diff --git a/kvm-all.c b/kvm-all.c
> index a88f917fda69..a2ee20cfb7f3 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -1828,10 +1828,8 @@ void kvm_flush_coalesced_mmio_buffer(void)
>      s->coalesced_flush_in_progress = false;
>  }
>
> -static void do_kvm_cpu_synchronize_state(void *arg)
> +static void do_kvm_cpu_synchronize_state(CPUState *cpu, void *arg)
>  {
> -    CPUState *cpu = arg;
> -
>      if (!cpu->kvm_vcpu_dirty) {
>          kvm_arch_get_registers(cpu);
>          cpu->kvm_vcpu_dirty = true;
> @@ -1841,34 +1839,30 @@ static void do_kvm_cpu_synchronize_state(void *arg)
>  void kvm_cpu_synchronize_state(CPUState *cpu)
>  {
>      if (!cpu->kvm_vcpu_dirty) {
> -        run_on_cpu(cpu, do_kvm_cpu_synchronize_state, cpu);
> +        run_on_cpu(cpu, do_kvm_cpu_synchronize_state, NULL);
>      }
>  }
>
> -static void do_kvm_cpu_synchronize_post_reset(void *arg)
> +static void do_kvm_cpu_synchronize_post_reset(CPUState *cpu, void *arg)
>  {
> -    CPUState *cpu = arg;
> -
>      kvm_arch_put_registers(cpu, KVM_PUT_RESET_STATE);
>      cpu->kvm_vcpu_dirty = false;
>  }
>
>  void kvm_cpu_synchronize_post_reset(CPUState *cpu)
>  {
> -    run_on_cpu(cpu, do_kvm_cpu_synchronize_post_reset, cpu);
> +    run_on_cpu(cpu, do_kvm_cpu_synchronize_post_reset, NULL);
>  }
>
> -static void do_kvm_cpu_synchronize_post_init(void *arg)
> +static void do_kvm_cpu_synchronize_post_init(CPUState *cpu, void *arg)
>  {
> -    CPUState *cpu = arg;
> -
>      kvm_arch_put_registers(cpu, KVM_PUT_FULL_STATE);
>      cpu->kvm_vcpu_dirty = false;
>  }
>
>  void kvm_cpu_synchronize_post_init(CPUState *cpu)
>  {
> -    run_on_cpu(cpu, do_kvm_cpu_synchronize_post_init, cpu);
> +    run_on_cpu(cpu, do_kvm_cpu_synchronize_post_init, NULL);
>  }
>
>  int kvm_cpu_exec(CPUState *cpu)
> @@ -2210,7 +2204,7 @@ struct kvm_set_guest_debug_data {
>      int err;
>  };
>
> -static void kvm_invoke_set_guest_debug(void *data)
> +static void kvm_invoke_set_guest_debug(CPUState *unused_cpu, void *data)
>  {
>      struct kvm_set_guest_debug_data *dbg_data = data;
>
> diff --git a/target-i386/helper.c b/target-i386/helper.c
> index 1c250b82452d..2e438fc1dd8f 100644
> --- a/target-i386/helper.c
> +++ b/target-i386/helper.c
> @@ -1122,11 +1122,10 @@ typedef struct MCEInjectionParams {
>      int flags;
>  } MCEInjectionParams;
>
> -static void do_inject_x86_mce(void *data)
> +static void do_inject_x86_mce(CPUState *cpu, void *data)
>  {
>      MCEInjectionParams *params = data;
>      CPUX86State *cenv = &params->cpu->env;
> -    CPUState *cpu = CPU(params->cpu);
>      uint64_t *banks = cenv->mce_banks + 4 * params->bank;
>
>      cpu_synchronize_state(cpu);
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index ff92b1d118e1..6a239cd16c0f 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -151,10 +151,8 @@ static int kvm_get_tsc(CPUState *cs)
>      return 0;
>  }
>
> -static inline void do_kvm_synchronize_tsc(void *arg)
> +static inline void do_kvm_synchronize_tsc(CPUState *cpu, void *arg)
>  {
> -    CPUState *cpu = arg;
> -
>      kvm_get_tsc(cpu);
>  }
>
> @@ -164,7 +162,7 @@ void kvm_synchronize_all_tsc(void)
>
>      if (kvm_enabled()) {
>          CPU_FOREACH(cpu) {
> -            run_on_cpu(cpu, do_kvm_synchronize_tsc, cpu);
> +            run_on_cpu(cpu, do_kvm_synchronize_tsc, NULL);
>          }
>      }
>  }
> diff --git a/target-s390x/cpu.c b/target-s390x/cpu.c
> index e43e2d61550b..4f09c647b7a8 100644
> --- a/target-s390x/cpu.c
> +++ b/target-s390x/cpu.c
> @@ -188,7 +188,7 @@ static void s390_cpu_machine_reset_cb(void *opaque)
>  {
>      S390CPU *cpu = opaque;
>
> -    run_on_cpu(CPU(cpu), s390_do_cpu_full_reset, CPU(cpu));
> +    run_on_cpu(CPU(cpu), s390_do_cpu_full_reset, NULL);
>  }
>  #endif
>
> @@ -238,7 +238,7 @@ static void s390_cpu_realizefn(DeviceState *dev, Error **errp)
>      s390_cpu_gdb_init(cs);
>      qemu_init_vcpu(cs);
>  #if !defined(CONFIG_USER_ONLY)
> -    run_on_cpu(cs, s390_do_cpu_full_reset, cs);
> +    run_on_cpu(cs, s390_do_cpu_full_reset, NULL);
>  #else
>      cpu_reset(cs);
>  #endif
> diff --git a/target-s390x/cpu.h b/target-s390x/cpu.h
> index bd6b2e57ef6c..3ae5d2d7de3f 100644
> --- a/target-s390x/cpu.h
> +++ b/target-s390x/cpu.h
> @@ -499,17 +499,14 @@ static inline hwaddr decode_basedisp_s(CPUS390XState *env, uint32_t ipb,
>  #define decode_basedisp_rs decode_basedisp_s
>
>  /* helper functions for run_on_cpu() */
> -static inline void s390_do_cpu_reset(void *arg)
> +static inline void s390_do_cpu_reset(CPUState *cs, void *arg)
>  {
> -    CPUState *cs = arg;
>      S390CPUClass *scc = S390_CPU_GET_CLASS(cs);
>
>      scc->cpu_reset(cs);
>  }
> -static inline void s390_do_cpu_full_reset(void *arg)
> +static inline void s390_do_cpu_full_reset(CPUState *cs, void *arg)
>  {
> -    CPUState *cs = arg;
> -
>      cpu_reset(cs);
>  }


--
Alex Bennée

^ permalink raw reply	[flat|nested] 30+ messages in thread
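[As a reference for the signature change reviewed in the patch above, here is a minimal self-contained sketch. The types are simplified stand-ins, not QEMU's real structures: passing CPUState explicitly frees the opaque pointer to carry a real payload instead of smuggling the CPU pointer through it.]

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for QEMU's CPUState (assumption, not the real struct). */
typedef struct CPUState {
    int kvm_vcpu_dirty;
} CPUState;

/* New-style helper signature introduced by the patch. */
typedef void (*run_on_cpu_func)(CPUState *cpu, void *data);

/* With CPUState passed explicitly, 'data' can carry a real payload
 * (e.g. an address) rather than just the CPU pointer. */
static void set_dirty(CPUState *cpu, void *data)
{
    cpu->kvm_vcpu_dirty = *(int *)data;
}

/* Minimal synchronous dispatcher standing in for run_on_cpu(). */
static void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
{
    func(cpu, data);
}

static int demo(void)
{
    CPUState cpu = { 0 };
    int val = 1;

    run_on_cpu(&cpu, set_dirty, &val);
    return cpu.kvm_vcpu_dirty;
}
```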

* Re: [Qemu-devel] [RFC 1/8] cpus: pass CPUState to run_on_cpu helpers
  2016-06-20 13:02   ` Alex Bennée
@ 2016-06-20 13:07     ` Sergey Fedorov
  0 siblings, 0 replies; 30+ messages in thread
From: Sergey Fedorov @ 2016-06-20 13:07 UTC (permalink / raw)
  To: Alex Bennée
  Cc: qemu-devel, patches, Paolo Bonzini, Peter Crosthwaite,
	Richard Henderson, Michael S. Tsirkin, Eduardo Habkost,
	David Gibson, Alexander Graf, Marcelo Tosatti, qemu-ppc, kvm

On 20/06/16 16:02, Alex Bennée wrote:
> Sergey Fedorov <sergey.fedorov@linaro.org> writes:
>
>> > From: Alex Bennée <alex.bennee@linaro.org>
>> >
>> > CPUState is a fairly common pointer to pass to these helpers. This means
>> > if you need other arguments for the async_run_on_cpu case you end up
>> > having to do a g_malloc to stuff additional data into the routine. For
>> > the current users this isn't a massive deal but for MTTCG this gets
>> > cumbersome when the only other parameter is often an address.
>> >
>> > This adds the typedef run_on_cpu_func for helper functions which has an
>> > explicit CPUState * passed as the first parameter. All the users of
>> > run_on_cpu and async_run_on_cpu have had their helpers updated to use
>> > CPUState where available.
>> >
>> > Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> > Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
> Is this as is or did you apply the fixes from the review comments?
>
It's "as is" so far. I could take charge of this patch in some later
re-spin.

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 2/8] cpus: Move common code out of {async_, }run_on_cpu()
  2016-06-19 22:28 ` [Qemu-devel] [RFC 2/8] cpus: Move common code out of {async_, }run_on_cpu() Sergey Fedorov
@ 2016-06-27  8:54   ` Alex Bennée
  0 siblings, 0 replies; 30+ messages in thread
From: Alex Bennée @ 2016-06-27  8:54 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: qemu-devel, Peter Crosthwaite, patches, Paolo Bonzini,
	Sergey Fedorov, Richard Henderson


Sergey Fedorov <sergey.fedorov@linaro.org> writes:

> From: Sergey Fedorov <serge.fdrv@gmail.com>
>
> Move the code common between run_on_cpu() and async_run_on_cpu() into a
> new function queue_work_on_cpu().
>
> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

> ---
>  cpus.c | 42 ++++++++++++++++++------------------------
>  1 file changed, 18 insertions(+), 24 deletions(-)
>
> diff --git a/cpus.c b/cpus.c
> index 049c2d04e150..04687c85bcd4 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -910,6 +910,22 @@ void qemu_init_cpu_loop(void)
>      qemu_thread_get_self(&io_thread);
>  }
>
> +static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
> +{
> +    qemu_mutex_lock(&cpu->work_mutex);
> +    if (cpu->queued_work_first == NULL) {
> +        cpu->queued_work_first = wi;
> +    } else {
> +        cpu->queued_work_last->next = wi;
> +    }
> +    cpu->queued_work_last = wi;
> +    wi->next = NULL;
> +    wi->done = false;
> +    qemu_mutex_unlock(&cpu->work_mutex);
> +
> +    qemu_cpu_kick(cpu);
> +}
> +
>  void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
>  {
>      struct qemu_work_item wi;
> @@ -923,18 +939,7 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
>      wi.data = data;
>      wi.free = false;
>
> -    qemu_mutex_lock(&cpu->work_mutex);
> -    if (cpu->queued_work_first == NULL) {
> -        cpu->queued_work_first = &wi;
> -    } else {
> -        cpu->queued_work_last->next = &wi;
> -    }
> -    cpu->queued_work_last = &wi;
> -    wi.next = NULL;
> -    wi.done = false;
> -    qemu_mutex_unlock(&cpu->work_mutex);
> -
> -    qemu_cpu_kick(cpu);
> +    queue_work_on_cpu(cpu, &wi);
>      while (!atomic_mb_read(&wi.done)) {
>          CPUState *self_cpu = current_cpu;
>
> @@ -957,18 +962,7 @@ void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
>      wi->data = data;
>      wi->free = true;
>
> -    qemu_mutex_lock(&cpu->work_mutex);
> -    if (cpu->queued_work_first == NULL) {
> -        cpu->queued_work_first = wi;
> -    } else {
> -        cpu->queued_work_last->next = wi;
> -    }
> -    cpu->queued_work_last = wi;
> -    wi->next = NULL;
> -    wi->done = false;
> -    qemu_mutex_unlock(&cpu->work_mutex);
> -
> -    qemu_cpu_kick(cpu);
> +    queue_work_on_cpu(cpu, wi);
>  }
>
>  static void qemu_kvm_destroy_vcpu(CPUState *cpu)


--
Alex Bennée

^ permalink raw reply	[flat|nested] 30+ messages in thread
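[The first/last pointer queue that RFC 2 factors into queue_work_on_cpu() can be modelled in isolation. This sketch keeps the same insertion logic but omits the work_mutex locking and the CPU kick:]

```c
#include <assert.h>
#include <stddef.h>

/* Simplified work item and queue, mirroring the first/last pointer
 * scheme of queue_work_on_cpu() (locking omitted for brevity). */
struct work_item {
    struct work_item *next;
};

struct work_queue {
    struct work_item *first;
    struct work_item *last;
};

static void queue_work(struct work_queue *q, struct work_item *wi)
{
    if (q->first == NULL) {
        q->first = wi;          /* empty queue: item becomes the head */
    } else {
        q->last->next = wi;     /* otherwise append after the tail */
    }
    q->last = wi;
    wi->next = NULL;
}

static int queue_len(const struct work_queue *q)
{
    int n = 0;
    for (const struct work_item *wi = q->first; wi; wi = wi->next) {
        n++;
    }
    return n;
}
```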

* Re: [Qemu-devel] [RFC 4/8] linux-user: Rework exclusive operation mechanism
  2016-06-19 22:28 ` [Qemu-devel] [RFC 4/8] linux-user: Rework exclusive operation mechanism Sergey Fedorov
@ 2016-06-27  9:02   ` Alex Bennée
  2016-06-29 14:57     ` Sergey Fedorov
  0 siblings, 1 reply; 30+ messages in thread
From: Alex Bennée @ 2016-06-27  9:02 UTC (permalink / raw)
  To: Sergey Fedorov; +Cc: qemu-devel, Sergey Fedorov, Riku Voipio, patches


Sergey Fedorov <sergey.fedorov@linaro.org> writes:

> From: Sergey Fedorov <serge.fdrv@gmail.com>
>
> A single variable 'pending_cpus' was used for both counting currently
> running CPUs and for signalling the pending exclusive operation request.
>
> To prepare for supporting operations which require a quiescent state,
> like translation buffer flush, it is useful to keep a counter of
> currently running CPUs always up to date.
>
> Use a separate variable 'tcg_pending_cpus' to count currently
> running CPUs and a separate variable 'exclusive_pending' to indicate
> that there's an exclusive operation pending.
>
> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
> ---
>  linux-user/main.c | 24 ++++++++++++------------
>  1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/linux-user/main.c b/linux-user/main.c
> index b9a4e0ea45ac..485336f78b8f 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -111,7 +111,8 @@ static pthread_mutex_t cpu_list_mutex = PTHREAD_MUTEX_INITIALIZER;
>  static pthread_mutex_t exclusive_lock = PTHREAD_MUTEX_INITIALIZER;
>  static pthread_cond_t exclusive_cond = PTHREAD_COND_INITIALIZER;
>  static pthread_cond_t exclusive_resume = PTHREAD_COND_INITIALIZER;
> -static int pending_cpus;
> +static bool exclusive_pending;
> +static int tcg_pending_cpus;

I'm not sure you need to re-name to tcg_pending_cpus as TCG is implied
for linux-user. Also they are not really CPUs (although we are using the
CPU structure for each running thread). I'm not sure if there is a
neater way to make the distinction clear.

>
>  /* Make sure everything is in a consistent state for calling fork().  */
>  void fork_start(void)
> @@ -133,7 +134,8 @@ void fork_end(int child)
>                  QTAILQ_REMOVE(&cpus, cpu, node);
>              }
>          }
> -        pending_cpus = 0;
> +        tcg_pending_cpus = 0;
> +        exclusive_pending = false;
>          pthread_mutex_init(&exclusive_lock, NULL);
>          pthread_mutex_init(&cpu_list_mutex, NULL);
>          pthread_cond_init(&exclusive_cond, NULL);
> @@ -150,7 +152,7 @@ void fork_end(int child)
>     must be held.  */
>  static inline void exclusive_idle(void)
>  {
> -    while (pending_cpus) {
> +    while (exclusive_pending) {
>          pthread_cond_wait(&exclusive_resume, &exclusive_lock);
>      }
>  }
> @@ -164,15 +166,14 @@ static inline void start_exclusive(void)
>      pthread_mutex_lock(&exclusive_lock);
>      exclusive_idle();
>
> -    pending_cpus = 1;
> +    exclusive_pending = true;
>      /* Make all other cpus stop executing.  */
>      CPU_FOREACH(other_cpu) {
>          if (other_cpu->running) {
> -            pending_cpus++;
>              cpu_exit(other_cpu);
>          }
>      }
> -    if (pending_cpus > 1) {
> +    while (tcg_pending_cpus) {
>          pthread_cond_wait(&exclusive_cond, &exclusive_lock);
>      }
>  }
> @@ -180,7 +181,7 @@ static inline void start_exclusive(void)
>  /* Finish an exclusive operation.  */
>  static inline void __attribute__((unused)) end_exclusive(void)
>  {
> -    pending_cpus = 0;
> +    exclusive_pending = false;
>      pthread_cond_broadcast(&exclusive_resume);
>      pthread_mutex_unlock(&exclusive_lock);
>  }
> @@ -191,6 +192,7 @@ static inline void cpu_exec_start(CPUState *cpu)
>      pthread_mutex_lock(&exclusive_lock);
>      exclusive_idle();
>      cpu->running = true;
> +    tcg_pending_cpus++;

These aren't TLS variables so shouldn't we be ensuring all access is atomic?

>      pthread_mutex_unlock(&exclusive_lock);
>  }
>
> @@ -199,11 +201,9 @@ static inline void cpu_exec_end(CPUState *cpu)
>  {
>      pthread_mutex_lock(&exclusive_lock);
>      cpu->running = false;
> -    if (pending_cpus > 1) {
> -        pending_cpus--;
> -        if (pending_cpus == 1) {
> -            pthread_cond_signal(&exclusive_cond);
> -        }
> +    tcg_pending_cpus--;
> +    if (!tcg_pending_cpus) {
> +        pthread_cond_broadcast(&exclusive_cond);
>      }

Couldn't two threads race to -1 here?

>      exclusive_idle();
>      pthread_mutex_unlock(&exclusive_lock);


--
Alex Bennée

^ permalink raw reply	[flat|nested] 30+ messages in thread
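[The counting scheme the patch moves to, a shared running counter with a broadcast when the last thread leaves, condenses to the following sketch. The names mirror the patch, but the 'exclusive_lock' protection that the review comments ask about is deliberately elided:]

```c
#include <assert.h>

/* Condensed model of the reworked exclusive-operation counters.
 * In the real code both variables are guarded by 'exclusive_lock'. */
static int tcg_pending_cpus;   /* threads currently executing TCG code */
static int broadcasts;         /* stand-in for pthread_cond_broadcast() */

static void cpu_exec_start(void)
{
    tcg_pending_cpus++;
}

static void cpu_exec_end(void)
{
    tcg_pending_cpus--;
    if (!tcg_pending_cpus) {
        broadcasts++;   /* wake any waiter in start_exclusive() */
    }
}
```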

* Re: [Qemu-devel] [RFC 6/8] linux-user: Support CPU work queue
  2016-06-19 22:28 ` [Qemu-devel] [RFC 6/8] linux-user: Support CPU work queue Sergey Fedorov
@ 2016-06-27  9:31   ` Alex Bennée
  2016-06-29 14:59     ` Sergey Fedorov
  2016-06-29 16:17   ` Alex Bennée
  1 sibling, 1 reply; 30+ messages in thread
From: Alex Bennée @ 2016-06-27  9:31 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: qemu-devel, Riku Voipio, Peter Crosthwaite, patches,
	Paolo Bonzini, Sergey Fedorov, Richard Henderson


Sergey Fedorov <sergey.fedorov@linaro.org> writes:

> From: Sergey Fedorov <serge.fdrv@gmail.com>
>
> Make CPU work core functions common between system and user-mode
> emulation. User-mode does not have BQL, so flush_queued_work() is
> protected by 'exclusive_lock'.
>
> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
> ---
>  cpu-exec-common.c       | 83 ++++++++++++++++++++++++++++++++++++++++++++++
>  cpus.c                  | 87 ++-----------------------------------------------
>  include/exec/exec-all.h |  4 +++
>  linux-user/main.c       | 13 ++++++++
>  4 files changed, 102 insertions(+), 85 deletions(-)
>
> diff --git a/cpu-exec-common.c b/cpu-exec-common.c
> index 0cb4ae60eff9..8184e0662cbd 100644
> --- a/cpu-exec-common.c
> +++ b/cpu-exec-common.c
> @@ -77,3 +77,86 @@ void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
>      }
>      siglongjmp(cpu->jmp_env, 1);
>  }
> +
> +static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
> +{
> +    qemu_mutex_lock(&cpu->work_mutex);
> +    if (cpu->queued_work_first == NULL) {
> +        cpu->queued_work_first = wi;
> +    } else {
> +        cpu->queued_work_last->next = wi;
> +    }
> +    cpu->queued_work_last = wi;
> +    wi->next = NULL;
> +    wi->done = false;
> +    qemu_mutex_unlock(&cpu->work_mutex);
> +
> +    qemu_cpu_kick(cpu);
> +}
> +
> +void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
> +{
> +    struct qemu_work_item wi;
> +
> +    if (qemu_cpu_is_self(cpu)) {
> +        func(cpu, data);
> +        return;
> +    }
> +
> +    wi.func = func;
> +    wi.data = data;
> +    wi.free = false;
> +
> +    queue_work_on_cpu(cpu, &wi);
> +    while (!atomic_mb_read(&wi.done)) {
> +        CPUState *self_cpu = current_cpu;
> +
> +        wait_cpu_work();
> +        current_cpu = self_cpu;
> +    }
> +}
> +
> +void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
> +{
> +    struct qemu_work_item *wi;
> +
> +    if (qemu_cpu_is_self(cpu)) {
> +        func(cpu, data);
> +        return;
> +    }
> +
> +    wi = g_malloc0(sizeof(struct qemu_work_item));
> +    wi->func = func;
> +    wi->data = data;
> +    wi->free = true;
> +
> +    queue_work_on_cpu(cpu, wi);
> +}
> +
> +void flush_queued_work(CPUState *cpu)
> +{
> +    struct qemu_work_item *wi;
> +
> +    if (cpu->queued_work_first == NULL) {
> +        return;
> +    }
> +
> +    qemu_mutex_lock(&cpu->work_mutex);
> +    while (cpu->queued_work_first != NULL) {
> +        wi = cpu->queued_work_first;
> +        cpu->queued_work_first = wi->next;
> +        if (!cpu->queued_work_first) {
> +            cpu->queued_work_last = NULL;
> +        }
> +        qemu_mutex_unlock(&cpu->work_mutex);
> +        wi->func(cpu, wi->data);
> +        qemu_mutex_lock(&cpu->work_mutex);
> +        if (wi->free) {
> +            g_free(wi);
> +        } else {
> +            atomic_mb_set(&wi->done, true);
> +        }
> +    }
> +    qemu_mutex_unlock(&cpu->work_mutex);
> +    signal_cpu_work();
> +}
> diff --git a/cpus.c b/cpus.c
> index f123eb707cc6..98f60f6f98f5 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -910,71 +910,16 @@ void qemu_init_cpu_loop(void)
>      qemu_thread_get_self(&io_thread);
>  }
>
> -static void wait_cpu_work(void)
> +void wait_cpu_work(void)
>  {
>      qemu_cond_wait(&qemu_work_cond, &qemu_global_mutex);
>  }
>
> -static void signal_cpu_work(void)
> +void signal_cpu_work(void)
>  {
>      qemu_cond_broadcast(&qemu_work_cond);
>  }
>
> -static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
> -{
> -    qemu_mutex_lock(&cpu->work_mutex);
> -    if (cpu->queued_work_first == NULL) {
> -        cpu->queued_work_first = wi;
> -    } else {
> -        cpu->queued_work_last->next = wi;
> -    }
> -    cpu->queued_work_last = wi;
> -    wi->next = NULL;
> -    wi->done = false;
> -    qemu_mutex_unlock(&cpu->work_mutex);
> -
> -    qemu_cpu_kick(cpu);
> -}
> -
> -void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
> -{
> -    struct qemu_work_item wi;
> -
> -    if (qemu_cpu_is_self(cpu)) {
> -        func(cpu, data);
> -        return;
> -    }
> -
> -    wi.func = func;
> -    wi.data = data;
> -    wi.free = false;
> -
> -    queue_work_on_cpu(cpu, &wi);
> -    while (!atomic_mb_read(&wi.done)) {
> -        CPUState *self_cpu = current_cpu;
> -
> -        wait_cpu_work();
> -        current_cpu = self_cpu;
> -    }
> -}
> -
> -void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
> -{
> -    struct qemu_work_item *wi;
> -
> -    if (qemu_cpu_is_self(cpu)) {
> -        func(cpu, data);
> -        return;
> -    }
> -
> -    wi = g_malloc0(sizeof(struct qemu_work_item));
> -    wi->func = func;
> -    wi->data = data;
> -    wi->free = true;
> -
> -    queue_work_on_cpu(cpu, wi);
> -}
> -
>  static void qemu_kvm_destroy_vcpu(CPUState *cpu)
>  {
>      if (kvm_destroy_vcpu(cpu) < 0) {
> @@ -987,34 +932,6 @@ static void qemu_tcg_destroy_vcpu(CPUState *cpu)
>  {
>  }
>
> -static void flush_queued_work(CPUState *cpu)
> -{
> -    struct qemu_work_item *wi;
> -
> -    if (cpu->queued_work_first == NULL) {
> -        return;
> -    }
> -
> -    qemu_mutex_lock(&cpu->work_mutex);
> -    while (cpu->queued_work_first != NULL) {
> -        wi = cpu->queued_work_first;
> -        cpu->queued_work_first = wi->next;
> -        if (!cpu->queued_work_first) {
> -            cpu->queued_work_last = NULL;
> -        }
> -        qemu_mutex_unlock(&cpu->work_mutex);
> -        wi->func(cpu, wi->data);
> -        qemu_mutex_lock(&cpu->work_mutex);
> -        if (wi->free) {
> -            g_free(wi);
> -        } else {
> -            atomic_mb_set(&wi->done, true);
> -        }
> -    }
> -    qemu_mutex_unlock(&cpu->work_mutex);
> -    signal_cpu_work();
> -}
> -
>  static void qemu_wait_io_event_common(CPUState *cpu)
>  {
>      if (cpu->stop) {
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index c1f59fa59d2c..23b4b50e0a45 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -407,4 +407,8 @@ extern int singlestep;
>  extern CPUState *tcg_current_cpu;
>  extern bool exit_request;
>
> +void wait_cpu_work(void);
> +void signal_cpu_work(void);
> +void flush_queued_work(CPUState *cpu);
> +

Now these are public APIs (and have multiple implementations) some doc
comments would be useful here.

>  #endif
> diff --git a/linux-user/main.c b/linux-user/main.c
> index 0093a8008c8e..5a68651159c2 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -111,6 +111,7 @@ static pthread_mutex_t cpu_list_mutex = PTHREAD_MUTEX_INITIALIZER;
>  static pthread_mutex_t exclusive_lock = PTHREAD_MUTEX_INITIALIZER;
>  static pthread_cond_t exclusive_cond = PTHREAD_COND_INITIALIZER;
>  static pthread_cond_t exclusive_resume = PTHREAD_COND_INITIALIZER;
> +static pthread_cond_t work_cond = PTHREAD_COND_INITIALIZER;
>  static bool exclusive_pending;
>  static int tcg_pending_cpus;
>
> @@ -140,6 +141,7 @@ void fork_end(int child)
>          pthread_mutex_init(&cpu_list_mutex, NULL);
>          pthread_cond_init(&exclusive_cond, NULL);
>          pthread_cond_init(&exclusive_resume, NULL);
> +        pthread_cond_init(&work_cond, NULL);
>          qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
>          gdbserver_fork(thread_cpu);
>      } else {
> @@ -148,6 +150,16 @@ void fork_end(int child)
>      }
>  }
>
> +void wait_cpu_work(void)
> +{
> +    pthread_cond_wait(&work_cond, &exclusive_lock);
> +}
> +
> +void signal_cpu_work(void)
> +{
> +    pthread_cond_broadcast(&work_cond);
> +}
> +
>  /* Wait for pending exclusive operations to complete.  The exclusive lock
>     must be held.  */
>  static inline void exclusive_idle(void)
> @@ -206,6 +218,7 @@ static inline void cpu_exec_end(CPUState *cpu)
>          pthread_cond_broadcast(&exclusive_cond);
>      }
>      exclusive_idle();
> +    flush_queued_work(cpu);
>      pthread_mutex_unlock(&exclusive_lock);
>  }


--
Alex Bennée

^ permalink raw reply	[flat|nested] 30+ messages in thread
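[The wait_cpu_work()/signal_cpu_work() pair added for linux-user above is a standard condition-variable handshake. A minimal stand-alone version, with a plain mutex in place of 'exclusive_lock', looks like this, including the predicate re-check that guards against spurious wakeups:]

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_cond = PTHREAD_COND_INITIALIZER;
static int done;

/* Caller must hold 'lock', matching exclusive_lock in linux-user. */
static void wait_cpu_work(void)
{
    pthread_cond_wait(&work_cond, &lock);
}

static void signal_cpu_work(void)
{
    pthread_cond_broadcast(&work_cond);
}

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    done = 1;
    signal_cpu_work();
    pthread_mutex_unlock(&lock);
    return NULL;
}

static int run_demo(void)
{
    pthread_t t;

    pthread_mutex_lock(&lock);
    pthread_create(&t, NULL, worker, NULL);
    while (!done) {         /* re-check the predicate: spurious wakeups */
        wait_cpu_work();    /* atomically drops and reacquires 'lock' */
    }
    pthread_mutex_unlock(&lock);
    pthread_join(t, NULL);
    return done;
}
```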

* Re: [Qemu-devel] [RFC 7/8] cpu-exec-common: Introduce async_safe_run_on_cpu()
  2016-06-19 22:28 ` [Qemu-devel] [RFC 7/8] cpu-exec-common: Introduce async_safe_run_on_cpu() Sergey Fedorov
@ 2016-06-27  9:36   ` Alex Bennée
  2016-06-29 15:03     ` Sergey Fedorov
  2016-07-01 16:29   ` Alvise Rigo
  1 sibling, 1 reply; 30+ messages in thread
From: Alex Bennée @ 2016-06-27  9:36 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: qemu-devel, Riku Voipio, Peter Crosthwaite, patches,
	Paolo Bonzini, Sergey Fedorov, Richard Henderson


Sergey Fedorov <sergey.fedorov@linaro.org> writes:

> From: Sergey Fedorov <serge.fdrv@gmail.com>
>
> This patch is based on the ideas found in work of KONRAD Frederic [1],
> Alex Bennée [2], and Alvise Rigo [3].
>
> This mechanism allows an operation to be performed safely in a
> quiescent state. Quiescent state means: (1) no vCPU is running and (2)
> the BQL in system-mode or 'exclusive_lock' in user-mode emulation is
> held while performing the operation. This functionality is required
> e.g. for performing a translation buffer flush safely in
> multi-threaded user-mode emulation.
>
> The existing CPU work queue is used to schedule such safe operations. A
> new 'safe' flag is added to struct qemu_work_item to designate the
> special requirements of the safe work. An operation in a quiescent
> state can be scheduled using the async_safe_run_on_cpu() function,
> which is actually the same as async_run_on_cpu() except that it marks
> the queued work item with the 'safe' flag set to true. With this flag
> set, queue_work_on_cpu() atomically increments the 'safe_work_pending'
> global counter and kicks all the CPUs instead of just the target CPU
> as in the case of normal CPU work. This forces other CPUs to exit
> their execution loops and wait in the wait_safe_cpu_work() function
> for the safe work to finish. When a CPU drains its work queue and
> encounters a work item marked as safe, it first waits for the other
> CPUs to exit their execution loops, then calls the work item function,
> and finally decrements the 'safe_work_pending' counter, signalling the
> other CPUs to continue execution as soon as all pending safe work
> items have been processed. The 'tcg_pending_cpus' counter, protected
> by 'exclusive_lock' in user-mode or by 'qemu_global_mutex' in
> system-mode emulation, is used to determine whether any CPU is running
> and to wait for it to exit the execution loop. The fairness of all the
> CPU work queues is ensured by draining all the pending safe work items
> before any CPU can run.
>
> [1] http://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg01128.html
> [2] http://lists.nongnu.org/archive/html/qemu-devel/2016-04/msg02531.html
> [3] http://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg04792.html
>
> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
> ---
>  cpu-exec-common.c       | 45 ++++++++++++++++++++++++++++++++++++++++++++-
>  cpus.c                  | 16 ++++++++++++++++
>  include/exec/exec-all.h |  2 ++
>  include/qom/cpu.h       | 14 ++++++++++++++
>  linux-user/main.c       |  2 +-
>  5 files changed, 77 insertions(+), 2 deletions(-)
>
> diff --git a/cpu-exec-common.c b/cpu-exec-common.c
> index 8184e0662cbd..3056324738f8 100644
> --- a/cpu-exec-common.c
> +++ b/cpu-exec-common.c
> @@ -25,6 +25,7 @@
>
>  bool exit_request;
>  CPUState *tcg_current_cpu;
> +int tcg_pending_cpus;
>
>  /* exit the current TB, but without causing any exception to be raised */
>  void cpu_loop_exit_noexc(CPUState *cpu)
> @@ -78,6 +79,15 @@ void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
>      siglongjmp(cpu->jmp_env, 1);
>  }
>
> +static int safe_work_pending;
> +
> +void wait_safe_cpu_work(void)
> +{
> +    while (atomic_mb_read(&safe_work_pending) > 0) {
> +        wait_cpu_work();
> +    }
> +}
> +
>  static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
>  {
>      qemu_mutex_lock(&cpu->work_mutex);
> @@ -89,9 +99,18 @@ static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
>      cpu->queued_work_last = wi;
>      wi->next = NULL;
>      wi->done = false;
> +    if (wi->safe) {
> +        atomic_inc(&safe_work_pending);
> +    }
>      qemu_mutex_unlock(&cpu->work_mutex);
>
> -    qemu_cpu_kick(cpu);
> +    if (!wi->safe) {
> +        qemu_cpu_kick(cpu);
> +    } else {
> +        CPU_FOREACH(cpu) {
> +            qemu_cpu_kick(cpu);
> +        }
> +    }
>  }
>
>  void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
> @@ -106,6 +125,7 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
>      wi.func = func;
>      wi.data = data;
>      wi.free = false;
> +    wi.safe = false;
>
>      queue_work_on_cpu(cpu, &wi);
>      while (!atomic_mb_read(&wi.done)) {
> @@ -129,6 +149,20 @@ void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
>      wi->func = func;
>      wi->data = data;
>      wi->free = true;
> +    wi->safe = false;
> +
> +    queue_work_on_cpu(cpu, wi);
> +}
> +
> +void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
> +{
> +    struct qemu_work_item *wi;
> +
> +    wi = g_malloc0(sizeof(struct qemu_work_item));
> +    wi->func = func;
> +    wi->data = data;
> +    wi->free = true;
> +    wi->safe = true;
>
>      queue_work_on_cpu(cpu, wi);
>  }
> @@ -148,9 +182,18 @@ void flush_queued_work(CPUState *cpu)
>          if (!cpu->queued_work_first) {
>              cpu->queued_work_last = NULL;
>          }
> +        if (wi->safe) {
> +            while (tcg_pending_cpus) {
> +                wait_cpu_work();
> +            }
> +        }
>          qemu_mutex_unlock(&cpu->work_mutex);
>          wi->func(cpu, wi->data);
>          qemu_mutex_lock(&cpu->work_mutex);
> +        if (wi->safe) {
> +            atomic_dec(&safe_work_pending);
> +            signal_cpu_work();
> +        }
>          if (wi->free) {
>              g_free(wi);
>          } else {
> diff --git a/cpus.c b/cpus.c
> index 98f60f6f98f5..bb6bd8615cfc 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -932,6 +932,18 @@ static void qemu_tcg_destroy_vcpu(CPUState *cpu)
>  {
>  }
>
> +static void tcg_cpu_exec_start(CPUState *cpu)
> +{
> +    tcg_pending_cpus++;
> +}
> +
> +static void tcg_cpu_exec_end(CPUState *cpu)
> +{
> +    if (--tcg_pending_cpus) {
> +        signal_cpu_work();
> +    }
> +}

Don't these need to be atomic?

> +
>  static void qemu_wait_io_event_common(CPUState *cpu)
>  {
>      if (cpu->stop) {
> @@ -956,6 +968,8 @@ static void qemu_tcg_wait_io_event(CPUState *cpu)
>      CPU_FOREACH(cpu) {
>          qemu_wait_io_event_common(cpu);
>      }
> +
> +    wait_safe_cpu_work();
>  }
>
>  static void qemu_kvm_wait_io_event(CPUState *cpu)
> @@ -1491,7 +1505,9 @@ static void tcg_exec_all(void)
>                            (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
>
>          if (cpu_can_run(cpu)) {
> +            tcg_cpu_exec_start(cpu);
>              r = tcg_cpu_exec(cpu);
> +            tcg_cpu_exec_end(cpu);
>              if (r == EXCP_DEBUG) {
>                  cpu_handle_guest_debug(cpu);
>                  break;
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 23b4b50e0a45..3bc44ed81473 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -405,10 +405,12 @@ extern int singlestep;
>
>  /* cpu-exec.c, accessed with atomic_mb_read/atomic_mb_set */
>  extern CPUState *tcg_current_cpu;
> +extern int tcg_pending_cpus;
>  extern bool exit_request;
>
>  void wait_cpu_work(void);
>  void signal_cpu_work(void);
>  void flush_queued_work(CPUState *cpu);
> +void wait_safe_cpu_work(void);
>
>  #endif
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 4e688f645b4a..5128fcc1745a 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -231,6 +231,7 @@ struct qemu_work_item {
>      void *data;
>      int done;
>      bool free;
> +    bool safe;
>  };
>
>  /**
> @@ -625,6 +626,19 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
>  void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
>
>  /**
> + * async_safe_run_on_cpu:
> + * @cpu: The vCPU to run on.
> + * @func: The function to be executed.
> + * @data: Data to pass to the function.
> + *
> + * Schedules the function @func for execution on the vCPU @cpu asynchronously
> + * and in quiescent state. Quiescent state means: (1) all other vCPUs are
> + * halted and (2) #qemu_global_mutex (a.k.a. BQL) in system-mode or
> + * #exclusive_lock in user-mode emulation is held while @func is executing.
> + */
> +void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
> +
> +/**
>   * qemu_get_cpu:
>   * @index: The CPUState@cpu_index value of the CPU to obtain.
>   *
> diff --git a/linux-user/main.c b/linux-user/main.c
> index 5a68651159c2..6da3bb32186b 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -113,7 +113,6 @@ static pthread_cond_t exclusive_cond = PTHREAD_COND_INITIALIZER;
>  static pthread_cond_t exclusive_resume = PTHREAD_COND_INITIALIZER;
>  static pthread_cond_t work_cond = PTHREAD_COND_INITIALIZER;
>  static bool exclusive_pending;
> -static int tcg_pending_cpus;
>
>  /* Make sure everything is in a consistent state for calling fork().  */
>  void fork_start(void)
> @@ -219,6 +218,7 @@ static inline void cpu_exec_end(CPUState *cpu)
>      }
>      exclusive_idle();
>      flush_queued_work(cpu);
> +    wait_safe_cpu_work();
>      pthread_mutex_unlock(&exclusive_lock);
>  }


--
Alex Bennée

^ permalink raw reply	[flat|nested] 30+ messages in thread
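[The gating described in the commit message above, incrementing 'safe_work_pending' at queue time and letting vCPUs resume only once it drops back to zero, can be modelled single-threaded. This sketch keeps only the counter logic and drops all locking, kicking, and waiting:]

```c
#include <assert.h>
#include <stddef.h>

/* Condensed model of the safe-work gating: queuing a safe item bumps a
 * global counter; draining it decrements the counter, and vCPUs may
 * resume only once it reaches zero. */
struct item {
    int safe;
    struct item *next;
};

static int safe_work_pending;

static void queue_item(struct item **head, struct item *wi)
{
    wi->next = *head;
    *head = wi;
    if (wi->safe) {
        safe_work_pending++;    /* atomic_inc() in the real code */
    }
}

static void flush_items(struct item **head)
{
    while (*head) {
        struct item *wi = *head;
        *head = wi->next;
        /* wi->func(cpu, wi->data) would run here, in quiescent state
         * if wi->safe is set */
        if (wi->safe) {
            safe_work_pending--;  /* atomic_dec() + signal_cpu_work() */
        }
    }
}

static int cpu_may_resume(void)
{
    return safe_work_pending == 0;  /* wait_safe_cpu_work() spins on this */
}
```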

* Re: [Qemu-devel] [RFC 8/8] tcg: Make tb_flush() thread safe
  2016-06-19 22:28 ` [Qemu-devel] [RFC 8/8] tcg: Make tb_flush() thread safe Sergey Fedorov
@ 2016-06-28 16:18   ` Alex Bennée
  2016-06-29 15:03     ` Sergey Fedorov
  0 siblings, 1 reply; 30+ messages in thread
From: Alex Bennée @ 2016-06-28 16:18 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: qemu-devel, Peter Crosthwaite, patches, Paolo Bonzini,
	Sergey Fedorov, Richard Henderson


Sergey Fedorov <sergey.fedorov@linaro.org> writes:

> From: Sergey Fedorov <serge.fdrv@gmail.com>
>
> Use async_safe_run_on_cpu() to make tb_flush() thread safe.
>
> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
> ---
>  translate-all.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/translate-all.c b/translate-all.c
> index 3f402dfe04f5..09b1d0b0efc3 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -832,7 +832,7 @@ static void page_flush_tb(void)
>
>  /* flush all the translation blocks */
>  /* XXX: tb_flush is currently not thread safe */
           ^^^

The comment belies a lack of confidence ;-)

> -void tb_flush(CPUState *cpu)
> +static void do_tb_flush(CPUState *cpu, void *data)
>  {
>  #if defined(DEBUG_FLUSH)
>      printf("qemu: flush code_size=%ld nb_tbs=%d avg_tb_size=%ld\n",
> @@ -861,6 +861,11 @@ void tb_flush(CPUState *cpu)
>      tcg_ctx.tb_ctx.tb_flush_count++;
>  }
>
> +void tb_flush(CPUState *cpu)
> +{
> +    async_safe_run_on_cpu(cpu, do_tb_flush, NULL);
> +}
> +
>  #ifdef DEBUG_TB_CHECK
>
>  static void
> @@ -1163,9 +1168,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>   buffer_overflow:
>          /* flush must be done */
>          tb_flush(cpu);
> -        /* cannot fail at this point */
> -        tb = tb_alloc(pc);
> -        assert(tb != NULL);
> +        mmap_unlock();
> +        cpu_loop_exit(cpu);
>      }
>
>      gen_code_buf = tcg_ctx.code_gen_ptr;


--
Alex Bennée

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 4/8] linux-user: Rework exclusive operation mechanism
  2016-06-27  9:02   ` Alex Bennée
@ 2016-06-29 14:57     ` Sergey Fedorov
  0 siblings, 0 replies; 30+ messages in thread
From: Sergey Fedorov @ 2016-06-29 14:57 UTC (permalink / raw)
  To: Alex Bennée, Sergey Fedorov; +Cc: qemu-devel, Riku Voipio, patches

On 27/06/16 12:02, Alex Bennée wrote:
> Sergey Fedorov <sergey.fedorov@linaro.org> writes:
>
>> From: Sergey Fedorov <serge.fdrv@gmail.com>
>>
(snip)
>> diff --git a/linux-user/main.c b/linux-user/main.c
>> index b9a4e0ea45ac..485336f78b8f 100644
>> --- a/linux-user/main.c
>> +++ b/linux-user/main.c
>> @@ -111,7 +111,8 @@ static pthread_mutex_t cpu_list_mutex = PTHREAD_MUTEX_INITIALIZER;
>>  static pthread_mutex_t exclusive_lock = PTHREAD_MUTEX_INITIALIZER;
>>  static pthread_cond_t exclusive_cond = PTHREAD_COND_INITIALIZER;
>>  static pthread_cond_t exclusive_resume = PTHREAD_COND_INITIALIZER;
>> -static int pending_cpus;
>> +static bool exclusive_pending;
>> +static int tcg_pending_cpus;
> I'm not sure you need to re-name to tcg_pending_cpus as TCG is implied
> for linux-user. Also they are not really CPUs (although we are using the
> CPU structure for each running thread). I'm not sure if there is a
> neater way to make the distinction clear.

How about 'tcg_pending_threads'? It is going to be used in system-mode
soon, so I'd like to keep the "tcg_" prefix.

>
>>  /* Make sure everything is in a consistent state for calling fork().  */
>>  void fork_start(void)
>> @@ -133,7 +134,8 @@ void fork_end(int child)
>>                  QTAILQ_REMOVE(&cpus, cpu, node);
>>              }
>>          }
>> -        pending_cpus = 0;
>> +        tcg_pending_cpus = 0;
>> +        exclusive_pending = false;
>>          pthread_mutex_init(&exclusive_lock, NULL);
>>          pthread_mutex_init(&cpu_list_mutex, NULL);
>>          pthread_cond_init(&exclusive_cond, NULL);
>> @@ -150,7 +152,7 @@ void fork_end(int child)
>>     must be held.  */
>>  static inline void exclusive_idle(void)
>>  {
>> -    while (pending_cpus) {
>> +    while (exclusive_pending) {
>>          pthread_cond_wait(&exclusive_resume, &exclusive_lock);
>>      }
>>  }
>> @@ -164,15 +166,14 @@ static inline void start_exclusive(void)
>>      pthread_mutex_lock(&exclusive_lock);
>>      exclusive_idle();
>>
>> -    pending_cpus = 1;
>> +    exclusive_pending = true;
>>      /* Make all other cpus stop executing.  */
>>      CPU_FOREACH(other_cpu) {
>>          if (other_cpu->running) {
>> -            pending_cpus++;
>>              cpu_exit(other_cpu);
>>          }
>>      }
>> -    if (pending_cpus > 1) {
>> +    while (tcg_pending_cpus) {
>>          pthread_cond_wait(&exclusive_cond, &exclusive_lock);
>>      }
>>  }
>> @@ -180,7 +181,7 @@ static inline void start_exclusive(void)
>>  /* Finish an exclusive operation.  */
>>  static inline void __attribute__((unused)) end_exclusive(void)
>>  {
>> -    pending_cpus = 0;
>> +    exclusive_pending = false;
>>      pthread_cond_broadcast(&exclusive_resume);
>>      pthread_mutex_unlock(&exclusive_lock);
>>  }
>> @@ -191,6 +192,7 @@ static inline void cpu_exec_start(CPUState *cpu)
>>      pthread_mutex_lock(&exclusive_lock);
>>      exclusive_idle();
>>      cpu->running = true;
>> +    tcg_pending_cpus++;
> These aren't TLS variables so shouldn't we be ensuring all access is atomic?

It is protected by 'exclusive_lock'.

>
>>      pthread_mutex_unlock(&exclusive_lock);
>>  }
>>
>> @@ -199,11 +201,9 @@ static inline void cpu_exec_end(CPUState *cpu)
>>  {
>>      pthread_mutex_lock(&exclusive_lock);
>>      cpu->running = false;
>> -    if (pending_cpus > 1) {
>> -        pending_cpus--;
>> -        if (pending_cpus == 1) {
>> -            pthread_cond_signal(&exclusive_cond);
>> -        }
>> +    tcg_pending_cpus--;
>> +    if (!tcg_pending_cpus) {
>> +        pthread_cond_broadcast(&exclusive_cond);
>>      }
> Couldn't two threads race to -1 here?

See comment above.

Kind regards,
Sergey

>
>>      exclusive_idle();
>>      pthread_mutex_unlock(&exclusive_lock);
>
> --
> Alex Bennée

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 6/8] linux-user: Support CPU work queue
  2016-06-27  9:31   ` Alex Bennée
@ 2016-06-29 14:59     ` Sergey Fedorov
  0 siblings, 0 replies; 30+ messages in thread
From: Sergey Fedorov @ 2016-06-29 14:59 UTC (permalink / raw)
  To: Alex Bennée, Sergey Fedorov
  Cc: qemu-devel, Riku Voipio, Peter Crosthwaite, patches,
	Paolo Bonzini, Richard Henderson

On 27/06/16 12:31, Alex Bennée wrote:
> Sergey Fedorov <sergey.fedorov@linaro.org> writes:
>
>> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
>> index c1f59fa59d2c..23b4b50e0a45 100644
>> --- a/include/exec/exec-all.h
>> +++ b/include/exec/exec-all.h
>> @@ -407,4 +407,8 @@ extern int singlestep;
>>  extern CPUState *tcg_current_cpu;
>>  extern bool exit_request;
>>
>> +void wait_cpu_work(void);
>> +void signal_cpu_work(void);
>> +void flush_queued_work(CPUState *cpu);
>> +
> Now these are public APIs (and have multiple implementations) some doc
> comments would be useful here.


Sure, I'll do this as soon as I'm sure that this is the right approach.

Thanks,
Sergey

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 7/8] cpu-exec-common: Introduce async_safe_run_on_cpu()
  2016-06-27  9:36   ` Alex Bennée
@ 2016-06-29 15:03     ` Sergey Fedorov
  2016-06-29 16:09       ` Alex Bennée
  0 siblings, 1 reply; 30+ messages in thread
From: Sergey Fedorov @ 2016-06-29 15:03 UTC (permalink / raw)
  To: Alex Bennée, Sergey Fedorov
  Cc: qemu-devel, Riku Voipio, Peter Crosthwaite, patches,
	Paolo Bonzini, Richard Henderson

On 27/06/16 12:36, Alex Bennée wrote:
> Sergey Fedorov <sergey.fedorov@linaro.org> writes:
>
>> From: Sergey Fedorov <serge.fdrv@gmail.com>
>>
(snip)
>> diff --git a/cpus.c b/cpus.c
>> index 98f60f6f98f5..bb6bd8615cfc 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -932,6 +932,18 @@ static void qemu_tcg_destroy_vcpu(CPUState *cpu)
>>  {
>>  }
>>
>> +static void tcg_cpu_exec_start(CPUState *cpu)
>> +{
>> +    tcg_pending_cpus++;
>> +}
>> +
>> +static void tcg_cpu_exec_end(CPUState *cpu)
>> +{
>> +    if (--tcg_pending_cpus) {
>> +        signal_cpu_work();
>> +    }
>> +}
> Don't these need to be atomic?

'tcg_pending_cpus' is protected by BQL.

>
>> +
>>  static void qemu_wait_io_event_common(CPUState *cpu)
>>  {
>>      if (cpu->stop) {
>>
(snip)

Thanks,
Sergey

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 8/8] tcg: Make tb_flush() thread safe
  2016-06-28 16:18   ` Alex Bennée
@ 2016-06-29 15:03     ` Sergey Fedorov
  0 siblings, 0 replies; 30+ messages in thread
From: Sergey Fedorov @ 2016-06-29 15:03 UTC (permalink / raw)
  To: Alex Bennée, Sergey Fedorov
  Cc: qemu-devel, Peter Crosthwaite, patches, Paolo Bonzini,
	Richard Henderson

On 28/06/16 19:18, Alex Bennée wrote:
> Sergey Fedorov <sergey.fedorov@linaro.org> writes:
>
>> From: Sergey Fedorov <serge.fdrv@gmail.com>
>>
>> Use async_safe_run_on_cpu() to make tb_flush() thread safe.
>>
>> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
>> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
>> ---
>>  translate-all.c | 12 ++++++++----
>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/translate-all.c b/translate-all.c
>> index 3f402dfe04f5..09b1d0b0efc3 100644
>> --- a/translate-all.c
>> +++ b/translate-all.c
>> @@ -832,7 +832,7 @@ static void page_flush_tb(void)
>>
>>  /* flush all the translation blocks */
>>  /* XXX: tb_flush is currently not thread safe */
>            ^^^
>
> The comment belies a lack of confidence ;-)

Nice catch!

Thanks,
Sergey

>
>> -void tb_flush(CPUState *cpu)
>> +static void do_tb_flush(CPUState *cpu, void *data)
>>  {
>>  #if defined(DEBUG_FLUSH)
>>      printf("qemu: flush code_size=%ld nb_tbs=%d avg_tb_size=%ld\n",
>> @@ -861,6 +861,11 @@ void tb_flush(CPUState *cpu)
>>      tcg_ctx.tb_ctx.tb_flush_count++;
>>  }
>>
>> +void tb_flush(CPUState *cpu)
>> +{
>> +    async_safe_run_on_cpu(cpu, do_tb_flush, NULL);
>> +}
>> +
>>  #ifdef DEBUG_TB_CHECK
>>
>>  static void
>> @@ -1163,9 +1168,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>>   buffer_overflow:
>>          /* flush must be done */
>>          tb_flush(cpu);
>> -        /* cannot fail at this point */
>> -        tb = tb_alloc(pc);
>> -        assert(tb != NULL);
>> +        mmap_unlock();
>> +        cpu_loop_exit(cpu);
>>      }
>>
>>      gen_code_buf = tcg_ctx.code_gen_ptr;
>
> --
> Alex Bennée

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 7/8] cpu-exec-common: Introduce async_safe_run_on_cpu()
  2016-06-29 15:03     ` Sergey Fedorov
@ 2016-06-29 16:09       ` Alex Bennée
  0 siblings, 0 replies; 30+ messages in thread
From: Alex Bennée @ 2016-06-29 16:09 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: Sergey Fedorov, qemu-devel, Riku Voipio, Peter Crosthwaite,
	patches, Paolo Bonzini, Richard Henderson


Sergey Fedorov <serge.fdrv@gmail.com> writes:

> On 27/06/16 12:36, Alex Bennée wrote:
>> Sergey Fedorov <sergey.fedorov@linaro.org> writes:
>>
>>> From: Sergey Fedorov <serge.fdrv@gmail.com>
>>>
> (snip)
>>> diff --git a/cpus.c b/cpus.c
>>> index 98f60f6f98f5..bb6bd8615cfc 100644
>>> --- a/cpus.c
>>> +++ b/cpus.c
>>> @@ -932,6 +932,18 @@ static void qemu_tcg_destroy_vcpu(CPUState *cpu)
>>>  {
>>>  }
>>>
>>> +static void tcg_cpu_exec_start(CPUState *cpu)
>>> +{
>>> +    tcg_pending_cpus++;
>>> +}
>>> +
>>> +static void tcg_cpu_exec_end(CPUState *cpu)
>>> +{
>>> +    if (--tcg_pending_cpus) {
>>> +        signal_cpu_work();
>>> +    }
>>> +}
>> Don't these need to be atomic?
>
> 'tcg_pending_cpus' is protected by BQL.

A quick comment above the function would help then.

>
>>
>>> +
>>>  static void qemu_wait_io_event_common(CPUState *cpu)
>>>  {
>>>      if (cpu->stop) {
>>>
> (snip)
>
> Thanks,
> Sergey


--
Alex Bennée

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 6/8] linux-user: Support CPU work queue
  2016-06-19 22:28 ` [Qemu-devel] [RFC 6/8] linux-user: Support CPU work queue Sergey Fedorov
  2016-06-27  9:31   ` Alex Bennée
@ 2016-06-29 16:17   ` Alex Bennée
  2016-06-30  9:39     ` Sergey Fedorov
  1 sibling, 1 reply; 30+ messages in thread
From: Alex Bennée @ 2016-06-29 16:17 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: qemu-devel, Riku Voipio, Peter Crosthwaite, patches,
	Paolo Bonzini, Sergey Fedorov, Richard Henderson


Sergey Fedorov <sergey.fedorov@linaro.org> writes:

> From: Sergey Fedorov <serge.fdrv@gmail.com>
>
> Make CPU work core functions common between system and user-mode
> emulation. User-mode does not have BQL, so flush_queued_work() is
> protected by 'exclusive_lock'.
>
> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
> ---
>  cpu-exec-common.c       | 83 ++++++++++++++++++++++++++++++++++++++++++++++
>  cpus.c                  | 87 ++-----------------------------------------------
>  include/exec/exec-all.h |  4 +++
>  linux-user/main.c       | 13 ++++++++
>  4 files changed, 102 insertions(+), 85 deletions(-)
>
> diff --git a/cpu-exec-common.c b/cpu-exec-common.c
> index 0cb4ae60eff9..8184e0662cbd 100644
> --- a/cpu-exec-common.c
> +++ b/cpu-exec-common.c
> @@ -77,3 +77,86 @@ void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
>      }
>      siglongjmp(cpu->jmp_env, 1);
>  }
> +
> +static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
> +{
> +    qemu_mutex_lock(&cpu->work_mutex);
> +    if (cpu->queued_work_first == NULL) {
> +        cpu->queued_work_first = wi;
> +    } else {
> +        cpu->queued_work_last->next = wi;
> +    }
> +    cpu->queued_work_last = wi;
> +    wi->next = NULL;
> +    wi->done = false;
> +    qemu_mutex_unlock(&cpu->work_mutex);
> +
> +    qemu_cpu_kick(cpu);
> +}
> +
> +void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
> +{
> +    struct qemu_work_item wi;
> +
> +    if (qemu_cpu_is_self(cpu)) {
> +        func(cpu, data);
> +        return;
> +    }
> +
> +    wi.func = func;
> +    wi.data = data;
> +    wi.free = false;
> +
> +    queue_work_on_cpu(cpu, &wi);
> +    while (!atomic_mb_read(&wi.done)) {
> +        CPUState *self_cpu = current_cpu;
> +
> +        wait_cpu_work();
> +        current_cpu = self_cpu;
> +    }
> +}
> +
> +void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
> +{
> +    struct qemu_work_item *wi;
> +
> +    if (qemu_cpu_is_self(cpu)) {
> +        func(cpu, data);
> +        return;
> +    }
> +
> +    wi = g_malloc0(sizeof(struct qemu_work_item));
> +    wi->func = func;
> +    wi->data = data;
> +    wi->free = true;
> +
> +    queue_work_on_cpu(cpu, wi);
> +}
> +
> +void flush_queued_work(CPUState *cpu)
> +{
> +    struct qemu_work_item *wi;
> +
> +    if (cpu->queued_work_first == NULL) {
> +        return;
> +    }
> +
> +    qemu_mutex_lock(&cpu->work_mutex);
> +    while (cpu->queued_work_first != NULL) {
> +        wi = cpu->queued_work_first;
> +        cpu->queued_work_first = wi->next;
> +        if (!cpu->queued_work_first) {
> +            cpu->queued_work_last = NULL;
> +        }
> +        qemu_mutex_unlock(&cpu->work_mutex);
> +        wi->func(cpu, wi->data);
> +        qemu_mutex_lock(&cpu->work_mutex);
> +        if (wi->free) {
> +            g_free(wi);
> +        } else {
> +            atomic_mb_set(&wi->done, true);
> +        }
> +    }
> +    qemu_mutex_unlock(&cpu->work_mutex);
> +    signal_cpu_work();
> +}
> diff --git a/cpus.c b/cpus.c
> index f123eb707cc6..98f60f6f98f5 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -910,71 +910,16 @@ void qemu_init_cpu_loop(void)
>      qemu_thread_get_self(&io_thread);
>  }
>
> -static void wait_cpu_work(void)
> +void wait_cpu_work(void)
>  {
>      qemu_cond_wait(&qemu_work_cond, &qemu_global_mutex);
>  }
>
> -static void signal_cpu_work(void)
> +void signal_cpu_work(void)
>  {
>      qemu_cond_broadcast(&qemu_work_cond);
>  }
>
> -static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
> -{
> -    qemu_mutex_lock(&cpu->work_mutex);
> -    if (cpu->queued_work_first == NULL) {
> -        cpu->queued_work_first = wi;
> -    } else {
> -        cpu->queued_work_last->next = wi;
> -    }
> -    cpu->queued_work_last = wi;
> -    wi->next = NULL;
> -    wi->done = false;
> -    qemu_mutex_unlock(&cpu->work_mutex);
> -
> -    qemu_cpu_kick(cpu);
> -}
> -
> -void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
> -{
> -    struct qemu_work_item wi;
> -
> -    if (qemu_cpu_is_self(cpu)) {
> -        func(cpu, data);
> -        return;
> -    }
> -
> -    wi.func = func;
> -    wi.data = data;
> -    wi.free = false;
> -
> -    queue_work_on_cpu(cpu, &wi);
> -    while (!atomic_mb_read(&wi.done)) {
> -        CPUState *self_cpu = current_cpu;
> -
> -        wait_cpu_work();
> -        current_cpu = self_cpu;
> -    }
> -}
> -
> -void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
> -{
> -    struct qemu_work_item *wi;
> -
> -    if (qemu_cpu_is_self(cpu)) {
> -        func(cpu, data);
> -        return;
> -    }
> -
> -    wi = g_malloc0(sizeof(struct qemu_work_item));
> -    wi->func = func;
> -    wi->data = data;
> -    wi->free = true;
> -
> -    queue_work_on_cpu(cpu, wi);
> -}
> -
>  static void qemu_kvm_destroy_vcpu(CPUState *cpu)
>  {
>      if (kvm_destroy_vcpu(cpu) < 0) {
> @@ -987,34 +932,6 @@ static void qemu_tcg_destroy_vcpu(CPUState *cpu)
>  {
>  }
>
> -static void flush_queued_work(CPUState *cpu)
> -{
> -    struct qemu_work_item *wi;
> -
> -    if (cpu->queued_work_first == NULL) {
> -        return;
> -    }
> -
> -    qemu_mutex_lock(&cpu->work_mutex);
> -    while (cpu->queued_work_first != NULL) {
> -        wi = cpu->queued_work_first;
> -        cpu->queued_work_first = wi->next;
> -        if (!cpu->queued_work_first) {
> -            cpu->queued_work_last = NULL;
> -        }
> -        qemu_mutex_unlock(&cpu->work_mutex);
> -        wi->func(cpu, wi->data);
> -        qemu_mutex_lock(&cpu->work_mutex);
> -        if (wi->free) {
> -            g_free(wi);
> -        } else {
> -            atomic_mb_set(&wi->done, true);
> -        }
> -    }
> -    qemu_mutex_unlock(&cpu->work_mutex);
> -    signal_cpu_work();
> -}
> -
>  static void qemu_wait_io_event_common(CPUState *cpu)
>  {
>      if (cpu->stop) {
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index c1f59fa59d2c..23b4b50e0a45 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -407,4 +407,8 @@ extern int singlestep;
>  extern CPUState *tcg_current_cpu;
>  extern bool exit_request;
>
> +void wait_cpu_work(void);
> +void signal_cpu_work(void);
> +void flush_queued_work(CPUState *cpu);
> +
>  #endif
> diff --git a/linux-user/main.c b/linux-user/main.c
> index 0093a8008c8e..5a68651159c2 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -111,6 +111,7 @@ static pthread_mutex_t cpu_list_mutex = PTHREAD_MUTEX_INITIALIZER;
>  static pthread_mutex_t exclusive_lock = PTHREAD_MUTEX_INITIALIZER;
>  static pthread_cond_t exclusive_cond = PTHREAD_COND_INITIALIZER;
>  static pthread_cond_t exclusive_resume = PTHREAD_COND_INITIALIZER;
> +static pthread_cond_t work_cond = PTHREAD_COND_INITIALIZER;
>  static bool exclusive_pending;
>  static int tcg_pending_cpus;
>
> @@ -140,6 +141,7 @@ void fork_end(int child)
>          pthread_mutex_init(&cpu_list_mutex, NULL);
>          pthread_cond_init(&exclusive_cond, NULL);
>          pthread_cond_init(&exclusive_resume, NULL);
> +        pthread_cond_init(&work_cond, NULL);
>          qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
>          gdbserver_fork(thread_cpu);
>      } else {
> @@ -148,6 +150,16 @@ void fork_end(int child)
>      }
>  }
>
> +void wait_cpu_work(void)
> +{
> +    pthread_cond_wait(&work_cond, &exclusive_lock);
> +}
> +
> +void signal_cpu_work(void)
> +{
> +    pthread_cond_broadcast(&work_cond);
> +}
> +
>  /* Wait for pending exclusive operations to complete.  The exclusive lock
>     must be held.  */
>  static inline void exclusive_idle(void)
> @@ -206,6 +218,7 @@ static inline void cpu_exec_end(CPUState *cpu)
>          pthread_cond_broadcast(&exclusive_cond);
>      }
>      exclusive_idle();
> +    flush_queued_work(cpu);
>      pthread_mutex_unlock(&exclusive_lock);
>  }

So I think there is a deadlock we can get with the async work:

(gdb) thread apply all bt

Thread 11 (Thread 0x7ffefeca7700 (LWP 2912)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00005555555cb777 in wait_cpu_work () at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:155
#2  0x00005555555a0cee in wait_safe_cpu_work () at /home/alex/lsrc/qemu/qemu.git/cpu-exec-common.c:87
#3  0x00005555555cb8fe in cpu_exec_end (cpu=0x555555bb67e0) at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:222
#4  0x00005555555cc7a7 in cpu_loop (env=0x555555bbea58) at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:749
#5  0x00005555555db0b2 in clone_func (arg=0x7fffffffc9c0) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:5424
#6  0x00007ffff6bed6fa in start_thread (arg=0x7ffefeca7700) at pthread_create.c:333
#7  0x00007ffff6923b5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

<a bunch of other threads doing the same and then...>

Thread 3 (Thread 0x7ffff7f38700 (LWP 2904)):
#0  0x00005555555faf5d in safe_syscall_base ()
#1  0x00005555555cfeaf in safe_futex (uaddr=0x7ffff528a0a4, op=128, val=1, timeout=0x0, uaddr2=0x0, val3=-162668384)
    at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:706
#2  0x00005555555dd7cc in do_futex (uaddr=4132298916, op=128, val=1, timeout=0, uaddr2=0, val3=-162668384)
    at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:6246
#3  0x00005555555e8cdb in do_syscall (cpu_env=0x555555a81118, num=240, arg1=-162668380, arg2=128, arg3=1, arg4=0, arg5=0, arg6=-162668384,
    arg7=0, arg8=0) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:10642
#4  0x00005555555cd20e in cpu_loop (env=0x555555a81118) at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:883
#5  0x00005555555db0b2 in clone_func (arg=0x7fffffffc9c0) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:5424
#6  0x00007ffff6bed6fa in start_thread (arg=0x7ffff7f38700) at pthread_create.c:333
#7  0x00007ffff6923b5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

So everything is stalled awaiting this thread waking up and draining
its queue. So for linux-user I think we need some mechanism to kick
these syscalls which I assume means throwing a signal at it.

--
Alex Bennée

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 6/8] linux-user: Support CPU work queue
  2016-06-29 16:17   ` Alex Bennée
@ 2016-06-30  9:39     ` Sergey Fedorov
  2016-06-30 10:32       ` Alex Bennée
  0 siblings, 1 reply; 30+ messages in thread
From: Sergey Fedorov @ 2016-06-30  9:39 UTC (permalink / raw)
  To: Alex Bennée, Sergey Fedorov
  Cc: qemu-devel, Riku Voipio, Peter Crosthwaite, patches,
	Paolo Bonzini, Richard Henderson

On 29/06/16 19:17, Alex Bennée wrote:
> So I think there is a deadlock we can get with the async work:
>
> (gdb) thread apply all bt
>
> Thread 11 (Thread 0x7ffefeca7700 (LWP 2912)):
> #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
> #1  0x00005555555cb777 in wait_cpu_work () at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:155
> #2  0x00005555555a0cee in wait_safe_cpu_work () at /home/alex/lsrc/qemu/qemu.git/cpu-exec-common.c:87
> #3  0x00005555555cb8fe in cpu_exec_end (cpu=0x555555bb67e0) at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:222
> #4  0x00005555555cc7a7 in cpu_loop (env=0x555555bbea58) at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:749
> #5  0x00005555555db0b2 in clone_func (arg=0x7fffffffc9c0) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:5424
> #6  0x00007ffff6bed6fa in start_thread (arg=0x7ffefeca7700) at pthread_create.c:333
> #7  0x00007ffff6923b5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> <a bunch of other threads doing the same and then...>
>
> Thread 3 (Thread 0x7ffff7f38700 (LWP 2904)):
> #0  0x00005555555faf5d in safe_syscall_base ()
> #1  0x00005555555cfeaf in safe_futex (uaddr=0x7ffff528a0a4, op=128, val=1, timeout=0x0, uaddr2=0x0, val3=-162668384)
>     at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:706
> #2  0x00005555555dd7cc in do_futex (uaddr=4132298916, op=128, val=1, timeout=0, uaddr2=0, val3=-162668384)
>     at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:6246
> #3  0x00005555555e8cdb in do_syscall (cpu_env=0x555555a81118, num=240, arg1=-162668380, arg2=128, arg3=1, arg4=0, arg5=0, arg6=-162668384,
>     arg7=0, arg8=0) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:10642
> #4  0x00005555555cd20e in cpu_loop (env=0x555555a81118) at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:883
> #5  0x00005555555db0b2 in clone_func (arg=0x7fffffffc9c0) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:5424
> #6  0x00007ffff6bed6fa in start_thread (arg=0x7ffff7f38700) at pthread_create.c:333
> #7  0x00007ffff6923b5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>
> So everything is stalled awaiting this thread waking up and draining
> its queue. So for linux-user I think we need some mechanism to kick
> these syscalls which I assume means throwing a signal at it.

Nice catch! How did you get it? We always go through cpu_exec_end()
before serving a guest syscall and always go through cpu_exec_start()
before entering the guest code execution loop. If we always schedule
safe work on the current thread's queue then I think there's a way to
make it safe and avoid kicking syscalls.

Thanks,
Sergey

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 6/8] linux-user: Support CPU work queue
  2016-06-30  9:39     ` Sergey Fedorov
@ 2016-06-30 10:32       ` Alex Bennée
  2016-06-30 10:35         ` Sergey Fedorov
  0 siblings, 1 reply; 30+ messages in thread
From: Alex Bennée @ 2016-06-30 10:32 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: Sergey Fedorov, qemu-devel, Riku Voipio, Peter Crosthwaite,
	patches, Paolo Bonzini, Richard Henderson


Sergey Fedorov <serge.fdrv@gmail.com> writes:

> On 29/06/16 19:17, Alex Bennée wrote:
>> So I think there is a deadlock we can get with the async work:
>>
>> (gdb) thread apply all bt
>>
>> Thread 11 (Thread 0x7ffefeca7700 (LWP 2912)):
>> #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
>> #1  0x00005555555cb777 in wait_cpu_work () at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:155
>> #2  0x00005555555a0cee in wait_safe_cpu_work () at /home/alex/lsrc/qemu/qemu.git/cpu-exec-common.c:87
>> #3  0x00005555555cb8fe in cpu_exec_end (cpu=0x555555bb67e0) at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:222
>> #4  0x00005555555cc7a7 in cpu_loop (env=0x555555bbea58) at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:749
>> #5  0x00005555555db0b2 in clone_func (arg=0x7fffffffc9c0) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:5424
>> #6  0x00007ffff6bed6fa in start_thread (arg=0x7ffefeca7700) at pthread_create.c:333
>> #7  0x00007ffff6923b5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>
>> <a bunch of other threads doing the same and then...>
>>
>> Thread 3 (Thread 0x7ffff7f38700 (LWP 2904)):
>> #0  0x00005555555faf5d in safe_syscall_base ()
>> #1  0x00005555555cfeaf in safe_futex (uaddr=0x7ffff528a0a4, op=128, val=1, timeout=0x0, uaddr2=0x0, val3=-162668384)
>>     at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:706
>> #2  0x00005555555dd7cc in do_futex (uaddr=4132298916, op=128, val=1, timeout=0, uaddr2=0, val3=-162668384)
>>     at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:6246
>> #3  0x00005555555e8cdb in do_syscall (cpu_env=0x555555a81118, num=240, arg1=-162668380, arg2=128, arg3=1, arg4=0, arg5=0, arg6=-162668384,
>>     arg7=0, arg8=0) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:10642
>> #4  0x00005555555cd20e in cpu_loop (env=0x555555a81118) at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:883
>> #5  0x00005555555db0b2 in clone_func (arg=0x7fffffffc9c0) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:5424
>> #6  0x00007ffff6bed6fa in start_thread (arg=0x7ffff7f38700) at pthread_create.c:333
>> #7  0x00007ffff6923b5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>
>> So everything is stalled awaiting this thread waking up and draining
>> its queue. So for linux-user I think we need some mechanism to kick
>> these syscalls which I assume means throwing a signal at it.
>
> Nice catch! How did you get it?

Running pigz (armhf, debian) to compress stuff.

> We always go through cpu_exec_end()
> before serving a guest syscall and always go through cpu_exec_start()
> before entering the guest code execution loop. If we always schedule
> safe work on the current thread's queue then I think there's a way to
> make it safe and avoid kicking syscalls.

Not let the signals complete until safe work is done?
>
> Thanks,
> Sergey


--
Alex Bennée

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 6/8] linux-user: Support CPU work queue
  2016-06-30 10:32       ` Alex Bennée
@ 2016-06-30 10:35         ` Sergey Fedorov
  2016-07-01  8:56           ` Sergey Fedorov
  0 siblings, 1 reply; 30+ messages in thread
From: Sergey Fedorov @ 2016-06-30 10:35 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Sergey Fedorov, qemu-devel, Riku Voipio, Peter Crosthwaite,
	patches, Paolo Bonzini, Richard Henderson

On 30/06/16 13:32, Alex Bennée wrote:
> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>
>> On 29/06/16 19:17, Alex Bennée wrote:
>>> So I think there is a deadlock we can get with the async work:
>>>
>>> (gdb) thread apply all bt
>>>
>>> Thread 11 (Thread 0x7ffefeca7700 (LWP 2912)):
>>> #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
>>> #1  0x00005555555cb777 in wait_cpu_work () at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:155
>>> #2  0x00005555555a0cee in wait_safe_cpu_work () at /home/alex/lsrc/qemu/qemu.git/cpu-exec-common.c:87
>>> #3  0x00005555555cb8fe in cpu_exec_end (cpu=0x555555bb67e0) at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:222
>>> #4  0x00005555555cc7a7 in cpu_loop (env=0x555555bbea58) at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:749
>>> #5  0x00005555555db0b2 in clone_func (arg=0x7fffffffc9c0) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:5424
>>> #6  0x00007ffff6bed6fa in start_thread (arg=0x7ffefeca7700) at pthread_create.c:333
>>> #7  0x00007ffff6923b5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>
>>> <a bunch of other threads doing the same and then...>
>>>
>>> Thread 3 (Thread 0x7ffff7f38700 (LWP 2904)):
>>> #0  0x00005555555faf5d in safe_syscall_base ()
>>> #1  0x00005555555cfeaf in safe_futex (uaddr=0x7ffff528a0a4, op=128, val=1, timeout=0x0, uaddr2=0x0, val3=-162668384)
>>>     at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:706
>>> #2  0x00005555555dd7cc in do_futex (uaddr=4132298916, op=128, val=1, timeout=0, uaddr2=0, val3=-162668384)
>>>     at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:6246
>>> #3  0x00005555555e8cdb in do_syscall (cpu_env=0x555555a81118, num=240, arg1=-162668380, arg2=128, arg3=1, arg4=0, arg5=0, arg6=-162668384,
>>>     arg7=0, arg8=0) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:10642
>>> #4  0x00005555555cd20e in cpu_loop (env=0x555555a81118) at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:883
>>> #5  0x00005555555db0b2 in clone_func (arg=0x7fffffffc9c0) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:5424
>>> #6  0x00007ffff6bed6fa in start_thread (arg=0x7ffff7f38700) at pthread_create.c:333
>>> #7  0x00007ffff6923b5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>
>>> So everything is stalled awaiting this thread waking up and draining
>>> its queue. So for linux-user I think we need some mechanism to kick
>>> these syscalls which I assume means throwing a signal at it.
>> Nice catch! How did you get it?
> Running pigz (armhf, debian) to compress stuff.
>
>> We always go through cpu_exec_end()
>> before serving a guest syscall and always go through cpu_exec_start()
>> before entering the guest code execution loop. If we always schedule
>> safe work on the current thread's queue then I think there's a way to
>> make it safe and avoid kicking syscalls.
> Not let the signals complete until safe work is done?

I'm thinking of waiting for the completion of safe work in cpu_exec_start()
as well as in cpu_exec_end().

Regards,
Sergey

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 6/8] linux-user: Support CPU work queue
  2016-06-30 10:35         ` Sergey Fedorov
@ 2016-07-01  8:56           ` Sergey Fedorov
  2016-07-01  9:04             ` Sergey Fedorov
  0 siblings, 1 reply; 30+ messages in thread
From: Sergey Fedorov @ 2016-07-01  8:56 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Sergey Fedorov, qemu-devel, Riku Voipio, Peter Crosthwaite,
	patches, Paolo Bonzini, Richard Henderson

On 30/06/16 13:35, Sergey Fedorov wrote:
> On 30/06/16 13:32, Alex Bennée wrote:
>> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>>
>>> On 29/06/16 19:17, Alex Bennée wrote:
>>>> So I think there is a deadlock we can get with the async work:
>>>>
>>>> (gdb) thread apply all bt
>>>>
>>>> Thread 11 (Thread 0x7ffefeca7700 (LWP 2912)):
>>>> #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
>>>> #1  0x00005555555cb777 in wait_cpu_work () at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:155
>>>> #2  0x00005555555a0cee in wait_safe_cpu_work () at /home/alex/lsrc/qemu/qemu.git/cpu-exec-common.c:87
>>>> #3  0x00005555555cb8fe in cpu_exec_end (cpu=0x555555bb67e0) at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:222
>>>> #4  0x00005555555cc7a7 in cpu_loop (env=0x555555bbea58) at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:749
>>>> #5  0x00005555555db0b2 in clone_func (arg=0x7fffffffc9c0) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:5424
>>>> #6  0x00007ffff6bed6fa in start_thread (arg=0x7ffefeca7700) at pthread_create.c:333
>>>> #7  0x00007ffff6923b5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>>
>>>> <a bunch of other threads doing the same and then...>
>>>>
>>>> Thread 3 (Thread 0x7ffff7f38700 (LWP 2904)):
>>>> #0  0x00005555555faf5d in safe_syscall_base ()
>>>> #1  0x00005555555cfeaf in safe_futex (uaddr=0x7ffff528a0a4, op=128, val=1, timeout=0x0, uaddr2=0x0, val3=-162668384)
>>>>     at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:706
>>>> #2  0x00005555555dd7cc in do_futex (uaddr=4132298916, op=128, val=1, timeout=0, uaddr2=0, val3=-162668384)
>>>>     at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:6246
>>>> #3  0x00005555555e8cdb in do_syscall (cpu_env=0x555555a81118, num=240, arg1=-162668380, arg2=128, arg3=1, arg4=0, arg5=0, arg6=-162668384,
>>>>     arg7=0, arg8=0) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:10642
>>>> #4  0x00005555555cd20e in cpu_loop (env=0x555555a81118) at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:883
>>>> #5  0x00005555555db0b2 in clone_func (arg=0x7fffffffc9c0) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:5424
>>>> #6  0x00007ffff6bed6fa in start_thread (arg=0x7ffff7f38700) at pthread_create.c:333
>>>> #7  0x00007ffff6923b5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>>
>>>> So everything is stalled awaiting this thread waking up and draining
>>>> its queue. So for linux-user I think we need some mechanism to kick
>>>> these syscalls which I assume means throwing a signal at it.
>>> Nice catch! How did you get it?
>> Running pigz (armhf, debian) to compress stuff.
>>
>>> We always go through cpu_exec_end()
>>> before serving a guest syscall and always go through cpu_exec_start()
>>> before entering the guest code execution loop. If we always schedule
>>> safe work on the current thread's queue then I think there's a way to
>>> make it safe and avoid kicking syscalls.
>> Not let the signals complete until safe work is done?
> I'm thinking of waiting for completion of safe work in cpu_exec_start()
> as well as in cpu_exec_end().

I found a mistake in my code that causes deadlocks when I run pigz.
Could you also try running it after applying the following patch?

diff --git a/linux-user/main.c b/linux-user/main.c
index 6da3bb32186b..1dca55145c56 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -214,7 +214,7 @@ static inline void cpu_exec_end(CPUState *cpu)
     cpu->running = false;
     tcg_pending_cpus--;
     if (!tcg_pending_cpus) {
-        pthread_cond_broadcast(&exclusive_cond);
+        signal_cpu_work();
     }
     exclusive_idle();
     flush_queued_work(cpu);


Thanks,
Sergey


* Re: [Qemu-devel] [RFC 6/8] linux-user: Support CPU work queue
  2016-07-01  8:56           ` Sergey Fedorov
@ 2016-07-01  9:04             ` Sergey Fedorov
  0 siblings, 0 replies; 30+ messages in thread
From: Sergey Fedorov @ 2016-07-01  9:04 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Sergey Fedorov, qemu-devel, Riku Voipio, Peter Crosthwaite,
	patches, Paolo Bonzini, Richard Henderson

On 01/07/16 11:56, Sergey Fedorov wrote:
> On 30/06/16 13:35, Sergey Fedorov wrote:
>> On 30/06/16 13:32, Alex Bennée wrote:
>>> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>>>
>>>> On 29/06/16 19:17, Alex Bennée wrote:
>>>>> So I think there is a deadlock we can get with the async work:
>>>>>
>>>>> (gdb) thread apply all bt
>>>>>
>>>>> Thread 11 (Thread 0x7ffefeca7700 (LWP 2912)):
>>>>> #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
>>>>> #1  0x00005555555cb777 in wait_cpu_work () at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:155
>>>>> #2  0x00005555555a0cee in wait_safe_cpu_work () at /home/alex/lsrc/qemu/qemu.git/cpu-exec-common.c:87
>>>>> #3  0x00005555555cb8fe in cpu_exec_end (cpu=0x555555bb67e0) at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:222
>>>>> #4  0x00005555555cc7a7 in cpu_loop (env=0x555555bbea58) at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:749
>>>>> #5  0x00005555555db0b2 in clone_func (arg=0x7fffffffc9c0) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:5424
>>>>> #6  0x00007ffff6bed6fa in start_thread (arg=0x7ffefeca7700) at pthread_create.c:333
>>>>> #7  0x00007ffff6923b5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>>>
>>>>> <a bunch of other threads doing the same and then...>
>>>>>
>>>>> Thread 3 (Thread 0x7ffff7f38700 (LWP 2904)):
>>>>> #0  0x00005555555faf5d in safe_syscall_base ()
>>>>> #1  0x00005555555cfeaf in safe_futex (uaddr=0x7ffff528a0a4, op=128, val=1, timeout=0x0, uaddr2=0x0, val3=-162668384)
>>>>>     at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:706
>>>>> #2  0x00005555555dd7cc in do_futex (uaddr=4132298916, op=128, val=1, timeout=0, uaddr2=0, val3=-162668384)
>>>>>     at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:6246
>>>>> #3  0x00005555555e8cdb in do_syscall (cpu_env=0x555555a81118, num=240, arg1=-162668380, arg2=128, arg3=1, arg4=0, arg5=0, arg6=-162668384,
>>>>>     arg7=0, arg8=0) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:10642
>>>>> #4  0x00005555555cd20e in cpu_loop (env=0x555555a81118) at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:883
>>>>> #5  0x00005555555db0b2 in clone_func (arg=0x7fffffffc9c0) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:5424
>>>>> #6  0x00007ffff6bed6fa in start_thread (arg=0x7ffff7f38700) at pthread_create.c:333
>>>>> #7  0x00007ffff6923b5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>>>
>>>>> So everything is stalled awaiting this thread waking up and draining
>>>>> its queue. So for linux-user I think we need some mechanism to kick
>>>>> these syscalls which I assume means throwing a signal at it.
>>>> Nice catch! How did you get it?
>>> Running pigz (armhf, debian) to compress stuff.
>>>
>>>> We always go through cpu_exec_end()
>>>> before serving a guest syscall and always go through cpu_exec_start()
>>>> before entering the guest code execution loop. If we always schedule
>>>> safe work on the current thread's queue then I think there's a way to
>>>> make it safe and avoid kicking syscalls.
>>> Not let the signals complete until safe work is done?
>> I'm thinking of waiting for completion of safe work in cpu_exec_start()
>> as well as in cpu_exec_end().
> I found a mistake in my code which causes deadlocks in my run of pigz.
> Could you also try running it after applying the following patch?
>
> diff --git a/linux-user/main.c b/linux-user/main.c
> index 6da3bb32186b..1dca55145c56 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -214,7 +214,7 @@ static inline void cpu_exec_end(CPUState *cpu)
>      cpu->running = false;
>      tcg_pending_cpus--;
>      if (!tcg_pending_cpus) {
> -        pthread_cond_broadcast(&exclusive_cond);
> +        signal_cpu_work();
>      }
>      exclusive_idle();
>      flush_queued_work(cpu);
>
>

Or better, this one:

diff --git a/linux-user/main.c b/linux-user/main.c
index 6da3bb32186b..aba550b69611 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -111,7 +111,6 @@ static pthread_mutex_t cpu_list_mutex =
PTHREAD_MUTEX_INITIALIZER;
 static pthread_mutex_t exclusive_lock = PTHREAD_MUTEX_INITIALIZER;
 static pthread_cond_t exclusive_cond = PTHREAD_COND_INITIALIZER;
 static pthread_cond_t exclusive_resume = PTHREAD_COND_INITIALIZER;
-static pthread_cond_t work_cond = PTHREAD_COND_INITIALIZER;
 static bool exclusive_pending;
 
 /* Make sure everything is in a consistent state for calling fork().  */
@@ -140,7 +139,6 @@ void fork_end(int child)
         pthread_mutex_init(&cpu_list_mutex, NULL);
         pthread_cond_init(&exclusive_cond, NULL);
         pthread_cond_init(&exclusive_resume, NULL);
-        pthread_cond_init(&work_cond, NULL);
         qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
         gdbserver_fork(thread_cpu);
     } else {
@@ -151,12 +149,12 @@ void fork_end(int child)
 
 void wait_cpu_work(void)
 {
-    pthread_cond_wait(&work_cond, &exclusive_lock);
+    pthread_cond_wait(&exclusive_cond, &exclusive_lock);
 }
 
 void signal_cpu_work(void)
 {
-    pthread_cond_broadcast(&work_cond);
+    pthread_cond_broadcast(&exclusive_cond);
 }
 
 /* Wait for pending exclusive operations to complete.  The exclusive lock
@@ -214,7 +212,7 @@ static inline void cpu_exec_end(CPUState *cpu)
     cpu->running = false;
     tcg_pending_cpus--;
     if (!tcg_pending_cpus) {
-        pthread_cond_broadcast(&exclusive_cond);
+        signal_cpu_work();
     }
     exclusive_idle();
     flush_queued_work(cpu);

Thanks,
Sergey


* Re: [Qemu-devel] [RFC 7/8] cpu-exec-common: Introduce async_safe_run_on_cpu()
  2016-06-19 22:28 ` [Qemu-devel] [RFC 7/8] cpu-exec-common: Introduce async_safe_run_on_cpu() Sergey Fedorov
  2016-06-27  9:36   ` Alex Bennée
@ 2016-07-01 16:29   ` Alvise Rigo
  2016-07-01 16:55     ` Sergey Fedorov
  1 sibling, 1 reply; 30+ messages in thread
From: Alvise Rigo @ 2016-07-01 16:29 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: QEMU Developers, Riku Voipio, Peter Crosthwaite,
	patches@linaro.org, Paolo Bonzini, Sergey Fedorov,
	Richard Henderson

Hi Sergey,

On Mon, Jun 20, 2016 at 12:28 AM, Sergey Fedorov
<sergey.fedorov@linaro.org> wrote:
>
> From: Sergey Fedorov <serge.fdrv@gmail.com>
>
> This patch is based on ideas found in the work of KONRAD Frederic [1],
> Alex Bennée [2], and Alvise Rigo [3].
>
> This mechanism allows an operation to be performed safely in a
> quiescent state. Quiescent state means: (1) no vCPU is running and
> (2) the BQL in system-mode or 'exclusive_lock' in user-mode emulation
> is held while performing the operation. This functionality is required
> e.g. for performing a translation buffer flush safely in
> multi-threaded user-mode emulation.
>
> The existing CPU work queue is used to schedule such safe operations. A
> new 'safe' flag is added to struct qemu_work_item to designate the
> special requirements of the safe work. An operation in a quiescent
> state can be scheduled with the async_safe_run_on_cpu() function, which
> is actually the same as async_run_on_cpu() except that it marks the
> queued work item with the 'safe' flag set to true. With this flag set,
> queue_work_on_cpu() atomically increments the 'safe_work_pending'
> global counter and kicks all the CPUs instead of just the target CPU,
> as in the case of normal CPU work. This forces the other CPUs to exit
> their execution loops and wait in the wait_safe_cpu_work() function for
> the safe work to finish. When a CPU drains its work queue and
> encounters a work item marked as safe, it first waits for the other
> CPUs to exit their execution loops, then calls the work item function,
> and finally decrements the 'safe_work_pending' counter, signalling the
> other CPUs to let them continue execution as soon as all pending safe
> work items have been processed. The 'tcg_pending_cpus' counter,
> protected by 'exclusive_lock' in user-mode or by 'qemu_global_mutex' in
> system-mode emulation, is used to determine whether any CPU is running
> and to wait for it to exit the execution loop. Fairness across all the
> CPU work queues is ensured by draining all the pending safe work items
> before any CPU can run.
>
> [1] http://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg01128.html
> [2] http://lists.nongnu.org/archive/html/qemu-devel/2016-04/msg02531.html
> [3] http://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg04792.html
>
> Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
> Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
> ---
>  cpu-exec-common.c       | 45 ++++++++++++++++++++++++++++++++++++++++++++-
>  cpus.c                  | 16 ++++++++++++++++
>  include/exec/exec-all.h |  2 ++
>  include/qom/cpu.h       | 14 ++++++++++++++
>  linux-user/main.c       |  2 +-
>  5 files changed, 77 insertions(+), 2 deletions(-)
>
> diff --git a/cpu-exec-common.c b/cpu-exec-common.c
> index 8184e0662cbd..3056324738f8 100644
> --- a/cpu-exec-common.c
> +++ b/cpu-exec-common.c
> @@ -25,6 +25,7 @@
>
>  bool exit_request;
>  CPUState *tcg_current_cpu;
> +int tcg_pending_cpus;
>
>  /* exit the current TB, but without causing any exception to be raised */
>  void cpu_loop_exit_noexc(CPUState *cpu)
> @@ -78,6 +79,15 @@ void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
>      siglongjmp(cpu->jmp_env, 1);
>  }
>
> +static int safe_work_pending;
> +
> +void wait_safe_cpu_work(void)
> +{
> +    while (atomic_mb_read(&safe_work_pending) > 0) {
> +        wait_cpu_work();
> +    }
> +}
> +

Is this piece of code deadlock-safe once we are in MTTCG mode?
What happens when two threads call async_safe_run_on_cpu() simultaneously?

Thank you,
alvise

>
>  static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
>  {
>      qemu_mutex_lock(&cpu->work_mutex);
> @@ -89,9 +99,18 @@ static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
>      cpu->queued_work_last = wi;
>      wi->next = NULL;
>      wi->done = false;
> +    if (wi->safe) {
> +        atomic_inc(&safe_work_pending);
> +    }
>      qemu_mutex_unlock(&cpu->work_mutex);
>
> -    qemu_cpu_kick(cpu);
> +    if (!wi->safe) {
> +        qemu_cpu_kick(cpu);
> +    } else {
> +        CPU_FOREACH(cpu) {
> +            qemu_cpu_kick(cpu);
> +        }
> +    }
>  }
>
>  void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
> @@ -106,6 +125,7 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
>      wi.func = func;
>      wi.data = data;
>      wi.free = false;
> +    wi.safe = false;
>
>      queue_work_on_cpu(cpu, &wi);
>      while (!atomic_mb_read(&wi.done)) {
> @@ -129,6 +149,20 @@ void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
>      wi->func = func;
>      wi->data = data;
>      wi->free = true;
> +    wi->safe = false;
> +
> +    queue_work_on_cpu(cpu, wi);
> +}
> +
> +void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
> +{
> +    struct qemu_work_item *wi;
> +
> +    wi = g_malloc0(sizeof(struct qemu_work_item));
> +    wi->func = func;
> +    wi->data = data;
> +    wi->free = true;
> +    wi->safe = true;
>
>      queue_work_on_cpu(cpu, wi);
>  }
> @@ -148,9 +182,18 @@ void flush_queued_work(CPUState *cpu)
>          if (!cpu->queued_work_first) {
>              cpu->queued_work_last = NULL;
>          }
> +        if (wi->safe) {
> +            while (tcg_pending_cpus) {
> +                wait_cpu_work();
> +            }
> +        }
>          qemu_mutex_unlock(&cpu->work_mutex);
>          wi->func(cpu, wi->data);
>          qemu_mutex_lock(&cpu->work_mutex);
> +        if (wi->safe) {
> +            atomic_dec(&safe_work_pending);
> +            signal_cpu_work();
> +        }
>          if (wi->free) {
>              g_free(wi);
>          } else {
> diff --git a/cpus.c b/cpus.c
> index 98f60f6f98f5..bb6bd8615cfc 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -932,6 +932,18 @@ static void qemu_tcg_destroy_vcpu(CPUState *cpu)
>  {
>  }
>
> +static void tcg_cpu_exec_start(CPUState *cpu)
> +{
> +    tcg_pending_cpus++;
> +}
> +
> +static void tcg_cpu_exec_end(CPUState *cpu)
> +{
> +    if (--tcg_pending_cpus) {
> +        signal_cpu_work();
> +    }
> +}
> +
>  static void qemu_wait_io_event_common(CPUState *cpu)
>  {
>      if (cpu->stop) {
> @@ -956,6 +968,8 @@ static void qemu_tcg_wait_io_event(CPUState *cpu)
>      CPU_FOREACH(cpu) {
>          qemu_wait_io_event_common(cpu);
>      }
> +
> +    wait_safe_cpu_work();
>  }
>
>  static void qemu_kvm_wait_io_event(CPUState *cpu)
> @@ -1491,7 +1505,9 @@ static void tcg_exec_all(void)
>                            (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
>
>          if (cpu_can_run(cpu)) {
> +            tcg_cpu_exec_start(cpu);
>              r = tcg_cpu_exec(cpu);
> +            tcg_cpu_exec_end(cpu);
>              if (r == EXCP_DEBUG) {
>                  cpu_handle_guest_debug(cpu);
>                  break;
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 23b4b50e0a45..3bc44ed81473 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -405,10 +405,12 @@ extern int singlestep;
>
>  /* cpu-exec.c, accessed with atomic_mb_read/atomic_mb_set */
>  extern CPUState *tcg_current_cpu;
> +extern int tcg_pending_cpus;
>  extern bool exit_request;
>
>  void wait_cpu_work(void);
>  void signal_cpu_work(void);
>  void flush_queued_work(CPUState *cpu);
> +void wait_safe_cpu_work(void);
>
>  #endif
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 4e688f645b4a..5128fcc1745a 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -231,6 +231,7 @@ struct qemu_work_item {
>      void *data;
>      int done;
>      bool free;
> +    bool safe;
>  };
>
>  /**
> @@ -625,6 +626,19 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
>  void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
>
>  /**
> + * async_safe_run_on_cpu:
> + * @cpu: The vCPU to run on.
> + * @func: The function to be executed.
> + * @data: Data to pass to the function.
> + *
> + * Schedules the function @func for execution on the vCPU @cpu asynchronously
> + * and in quiescent state. Quiescent state means: (1) all other vCPUs are
> + * halted and (2) #qemu_global_mutex (a.k.a. BQL) in system-mode or
> + * #exclusive_lock in user-mode emulation is held while @func is executing.
> + */
> +void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
> +
> +/**
>   * qemu_get_cpu:
>   * @index: The CPUState@cpu_index value of the CPU to obtain.
>   *
> diff --git a/linux-user/main.c b/linux-user/main.c
> index 5a68651159c2..6da3bb32186b 100644
> --- a/linux-user/main.c
> +++ b/linux-user/main.c
> @@ -113,7 +113,6 @@ static pthread_cond_t exclusive_cond = PTHREAD_COND_INITIALIZER;
>  static pthread_cond_t exclusive_resume = PTHREAD_COND_INITIALIZER;
>  static pthread_cond_t work_cond = PTHREAD_COND_INITIALIZER;
>  static bool exclusive_pending;
> -static int tcg_pending_cpus;
>
>  /* Make sure everything is in a consistent state for calling fork().  */
>  void fork_start(void)
> @@ -219,6 +218,7 @@ static inline void cpu_exec_end(CPUState *cpu)
>      }
>      exclusive_idle();
>      flush_queued_work(cpu);
> +    wait_safe_cpu_work();
>      pthread_mutex_unlock(&exclusive_lock);
>  }
>
> --
> 1.9.1
>
>


* Re: [Qemu-devel] [RFC 7/8] cpu-exec-common: Introduce async_safe_run_on_cpu()
  2016-07-01 16:29   ` Alvise Rigo
@ 2016-07-01 16:55     ` Sergey Fedorov
  0 siblings, 0 replies; 30+ messages in thread
From: Sergey Fedorov @ 2016-07-01 16:55 UTC (permalink / raw)
  To: Alvise Rigo, Sergey Fedorov
  Cc: QEMU Developers, Riku Voipio, Peter Crosthwaite,
	patches@linaro.org, Paolo Bonzini, Richard Henderson

On 01/07/16 19:29, Alvise Rigo wrote:
> Hi Sergey,
>
> On Mon, Jun 20, 2016 at 12:28 AM, Sergey Fedorov
> <sergey.fedorov@linaro.org> wrote:
>> diff --git a/cpu-exec-common.c b/cpu-exec-common.c
>> index 8184e0662cbd..3056324738f8 100644
>> --- a/cpu-exec-common.c
>> +++ b/cpu-exec-common.c
>> @@ -25,6 +25,7 @@
>>
>>  bool exit_request;
>>  CPUState *tcg_current_cpu;
>> +int tcg_pending_cpus;
>>
>>  /* exit the current TB, but without causing any exception to be raised */
>>  void cpu_loop_exit_noexc(CPUState *cpu)
>> @@ -78,6 +79,15 @@ void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
>>      siglongjmp(cpu->jmp_env, 1);
>>  }
>>
>> +static int safe_work_pending;
>> +
>> +void wait_safe_cpu_work(void)
>> +{
>> +    while (atomic_mb_read(&safe_work_pending) > 0) {
>> +        wait_cpu_work();
>> +    }
>> +}
>> +
> Is this piece of code deadlock-safe once we are in mttcg mode?

It is supposed to be deadlock-safe.

> What happens when two threads call simultaneously async_safe_run_on_cpu?
>

In this case each thread will roughly:
 - exit its execution loop;
 - take the BQL;
 - decrement 'tcg_pending_cpus', signalling 'qemu_work_cond' if it
drops to zero;
 - start processing its work queue;
 - on encountering safe work, wait on 'qemu_work_cond' for
'tcg_pending_cpus' to become zero;
 - reacquire the BQL;
 - process the safe work;
 - decrement 'safe_work_pending', signalling 'qemu_work_cond' if it
drops to zero;
 - when finished processing work, wait on 'qemu_work_cond' for
'safe_work_pending' to become zero;
 - reacquire the BQL;
 - continue execution (releasing the BQL).

Hope this helps.

Kind regards,
Sergey.


End of thread [newest message: 2016-07-01 16:55 UTC]

Thread overview: 30+ messages
2016-06-19 22:28 [Qemu-devel] [RFC 0/8] cpu-exec: Safe work in quiescent state Sergey Fedorov
2016-06-19 22:28 ` [Qemu-devel] [RFC 1/8] cpus: pass CPUState to run_on_cpu helpers Sergey Fedorov
2016-06-20  1:23   ` David Gibson
2016-06-20 13:02   ` Alex Bennée
2016-06-20 13:07     ` Sergey Fedorov
2016-06-19 22:28 ` [Qemu-devel] [RFC 2/8] cpus: Move common code out of {async_, }run_on_cpu() Sergey Fedorov
2016-06-27  8:54   ` Alex Bennée
2016-06-19 22:28 ` [Qemu-devel] [RFC 3/8] cpus: Add 'qemu_work_cond' usage wrappers Sergey Fedorov
2016-06-19 22:28 ` [Qemu-devel] [RFC 4/8] linux-user: Rework exclusive operation mechanism Sergey Fedorov
2016-06-27  9:02   ` Alex Bennée
2016-06-29 14:57     ` Sergey Fedorov
2016-06-19 22:28 ` [Qemu-devel] [RFC 5/8] linux-user: Add qemu_cpu_is_self() and qemu_cpu_kick() Sergey Fedorov
2016-06-19 22:28 ` [Qemu-devel] [RFC 6/8] linux-user: Support CPU work queue Sergey Fedorov
2016-06-27  9:31   ` Alex Bennée
2016-06-29 14:59     ` Sergey Fedorov
2016-06-29 16:17   ` Alex Bennée
2016-06-30  9:39     ` Sergey Fedorov
2016-06-30 10:32       ` Alex Bennée
2016-06-30 10:35         ` Sergey Fedorov
2016-07-01  8:56           ` Sergey Fedorov
2016-07-01  9:04             ` Sergey Fedorov
2016-06-19 22:28 ` [Qemu-devel] [RFC 7/8] cpu-exec-common: Introduce async_safe_run_on_cpu() Sergey Fedorov
2016-06-27  9:36   ` Alex Bennée
2016-06-29 15:03     ` Sergey Fedorov
2016-06-29 16:09       ` Alex Bennée
2016-07-01 16:29   ` Alvise Rigo
2016-07-01 16:55     ` Sergey Fedorov
2016-06-19 22:28 ` [Qemu-devel] [RFC 8/8] tcg: Make tb_flush() thread safe Sergey Fedorov
2016-06-28 16:18   ` Alex Bennée
2016-06-29 15:03     ` Sergey Fedorov
