* [Qemu-devel] [PATCH v5 00/13] cpu-exec: Safe work in quiescent state
@ 2016-08-02 17:27 Alex Bennée
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 01/13] atomic: introduce atomic_dec_fetch Alex Bennée
                   ` (12 more replies)
  0 siblings, 13 replies; 25+ messages in thread
From: Alex Bennée @ 2016-08-02 17:27 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Alex Bennée

Hi,

I've picked up this work from Sergey and will be taking it forward
from now on.

Apart from adding my s-o-b tags to all the patches there are only two
changes. Both are to the wait_safe_cpu_work() function, to prevent
deadlock conditions.

First, I have added the macro can_wait_for_safe(), which compiles away
to 0 on SoftMMU. This will be tweaked in later MTTCG work.

Second, I have ensured we signal the qemu_exclusive_cond condition
variable if tcg_pending_threads is 0 by the time we want to sleep
waiting for safe work to run.
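
For illustration, this is roughly the shape wait_safe_cpu_work() takes with
both changes applied (a sketch, not the exact code in the series;
async_safe_work_pending() and qemu_safe_work_cond are stand-in names here):

    static void wait_safe_cpu_work(void)
    {
        /* can_wait_for_safe() compiles away to 0 on SoftMMU for now */
        while (can_wait_for_safe() && async_safe_work_pending()) {
            /* If no TCG threads are left running nobody else will drain
             * the queue, so wake any exclusive waiters before sleeping
             * to avoid deadlocking against them. */
            if (tcg_pending_threads == 0) {
                qemu_cond_signal(&qemu_exclusive_cond);
            }
            qemu_cond_wait(&qemu_safe_work_cond, qemu_get_cpu_work_mutex());
        }
    }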

Finally, I've added another patch at the end of the series which
converts everything to a GArray. The main driver was my MTTCG test
case, which put particular stress on memory as multi-thousand-entry
queues of flushes backed up. I've since mitigated that with other
changes to the cputlb code, but the GArray still has the advantage of
avoiding bouncing the lock as we go through the queue. If people aren't
happy with the change we can always drop it.
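
Roughly, the GArray version detaches the whole queue under a single lock
acquisition and then runs the items without re-taking cpu->work_mutex per
entry. A sketch of the idea (not the exact patch; the queued_work field
name is assumed):

    void process_queued_cpu_work(CPUState *cpu)
    {
        GArray *work;
        guint i;

        /* detach the whole queue in one go */
        qemu_mutex_lock(&cpu->work_mutex);
        work = cpu->queued_work;
        cpu->queued_work = NULL;
        qemu_mutex_unlock(&cpu->work_mutex);

        if (!work) {
            return;
        }
        for (i = 0; i < work->len; i++) {
            struct qemu_work_item *wi;

            wi = &g_array_index(work, struct qemu_work_item, i);
            wi->func(cpu, wi->data);
        }
        g_array_free(work, true);
        qemu_cond_broadcast(&qemu_work_cond);
    }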

I'm keen to get this work merged as soon as the tree re-opens so any
additional comments will be helpful.

I've been using this patch set along with the hot-path tweaks in
Paolo's tree as a base for the ongoing MTTCG patches. When I post the
next set of patches, they will be based on this tree:

  https://github.com/stsquad/qemu/tree/mttcg/async-safe-work-v5

Which is:
  - v2.7.0-rc0
  - plus Reduce lock contention on TCG hot-path (v5, Paolo's tree)
  - plus cpu-exec: Safe work in quiescent state (v5, this series)

Alex Bennée (3):
  atomic: introduce atomic_dec_fetch.
  cpus: pass CPUState to run_on_cpu helpers
  cpu-exec: replace cpu->queued_work with GArray

Sergey Fedorov (10):
  cpus: Move common code out of {async_,}run_on_cpu()
  cpus: Wrap mutex used to protect CPU work
  cpus: Rename flush_queued_work()
  linux-user: Use QemuMutex and QemuCond
  linux-user: Rework exclusive operation mechanism
  linux-user: Add qemu_cpu_is_self() and qemu_cpu_kick()
  linux-user: Support CPU work queue
  bsd-user: Support CPU work queue
  cpu-exec-common: Introduce async_safe_run_on_cpu()
  tcg: Make tb_flush() thread safe

 bsd-user/main.c            |  16 +++++
 cpu-exec-common.c          | 157 +++++++++++++++++++++++++++++++++++++++++++++
 cpu-exec.c                 |  12 +---
 cpus.c                     | 108 ++++++-------------------------
 hw/i386/kvm/apic.c         |   3 +-
 hw/i386/kvmvapic.c         |   6 +-
 hw/ppc/ppce500_spin.c      |  31 +++------
 hw/ppc/spapr.c             |   6 +-
 hw/ppc/spapr_hcall.c       |  17 ++---
 include/exec/exec-all.h    |  31 +++++++++
 include/qemu/atomic.h      |   4 ++
 include/qom/cpu.h          |  30 ++++++---
 kvm-all.c                  |  21 ++----
 linux-user/main.c          |  94 +++++++++++++++++----------
 target-i386/helper.c       |  19 +++---
 target-i386/kvm.c          |   6 +-
 target-s390x/cpu.c         |   4 +-
 target-s390x/cpu.h         |   7 +-
 target-s390x/kvm.c         |  98 ++++++++++++++--------------
 target-s390x/misc_helper.c |   4 +-
 translate-all.c            |  17 +++--
 21 files changed, 420 insertions(+), 271 deletions(-)

-- 
2.7.4


* [Qemu-devel] [PATCH v5 01/13] atomic: introduce atomic_dec_fetch.
  2016-08-02 17:27 [Qemu-devel] [PATCH v5 00/13] cpu-exec: Safe work in quiescent state Alex Bennée
@ 2016-08-02 17:27 ` Alex Bennée
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 02/13] cpus: pass CPUState to run_on_cpu helpers Alex Bennée
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Alex Bennée @ 2016-08-02 17:27 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Alex Bennée, Sergey Fedorov

Useful for counting down.
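
For example (purely illustrative, not code from this series), a pending-work
counter where the last worker to finish wakes a waiter:

    static int pending;            /* initialised to the number of workers */
    static QemuMutex done_lock;
    static QemuCond done_cond;

    static void worker_finished(void)
    {
        /* atomic_dec_fetch() returns the decremented value, so only the
         * last worker sees zero and signals completion */
        if (atomic_dec_fetch(&pending) == 0) {
            qemu_mutex_lock(&done_lock);
            qemu_cond_signal(&done_cond);
            qemu_mutex_unlock(&done_lock);
        }
    }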

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
---
 include/qemu/atomic.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
index 7e13fca..560b1af 100644
--- a/include/qemu/atomic.h
+++ b/include/qemu/atomic.h
@@ -160,6 +160,8 @@
 #define atomic_fetch_and(ptr, n) __atomic_fetch_and(ptr, n, __ATOMIC_SEQ_CST)
 #define atomic_fetch_or(ptr, n)  __atomic_fetch_or(ptr, n, __ATOMIC_SEQ_CST)
 
+#define atomic_dec_fetch(ptr)  __atomic_sub_fetch(ptr, 1, __ATOMIC_SEQ_CST)
+
 /* And even shorter names that return void.  */
 #define atomic_inc(ptr)    ((void) __atomic_fetch_add(ptr, 1, __ATOMIC_SEQ_CST))
 #define atomic_dec(ptr)    ((void) __atomic_fetch_sub(ptr, 1, __ATOMIC_SEQ_CST))
@@ -355,6 +357,8 @@
 #define atomic_fetch_or        __sync_fetch_and_or
 #define atomic_cmpxchg         __sync_val_compare_and_swap
 
+#define atomic_dec_fetch(ptr)  __sync_sub_and_fetch(ptr, 1)
+
 /* And even shorter names that return void.  */
 #define atomic_inc(ptr)        ((void) __sync_fetch_and_add(ptr, 1))
 #define atomic_dec(ptr)        ((void) __sync_fetch_and_add(ptr, -1))
-- 
2.7.4


* [Qemu-devel] [PATCH v5 02/13] cpus: pass CPUState to run_on_cpu helpers
  2016-08-02 17:27 [Qemu-devel] [PATCH v5 00/13] cpu-exec: Safe work in quiescent state Alex Bennée
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 01/13] atomic: introduce atomic_dec_fetch Alex Bennée
@ 2016-08-02 17:27 ` Alex Bennée
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 03/13] cpus: Move common code out of {async_,}run_on_cpu() Alex Bennée
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Alex Bennée @ 2016-08-02 17:27 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Alex Bennée, Sergey Fedorov,
	Peter Crosthwaite, Eduardo Habkost, Michael S. Tsirkin,
	David Gibson, Alexander Graf, Marcelo Tosatti,
	Christian Borntraeger, Cornelia Huck, open list:PowerPC,
	open list:Overall

CPUState is a fairly common pointer to pass to these helpers. This means
that if you need other arguments for the async_run_on_cpu case you end up
having to g_malloc a structure just to stuff the additional data into the
routine. For the current users this isn't a massive deal, but for MTTCG it
gets cumbersome when the only other parameter is often an address.

This adds the typedef run_on_cpu_func for helper functions which has an
explicit CPUState * passed as the first parameter. All the users of
run_on_cpu and async_run_on_cpu have had their helpers updated to use
CPUState where available.
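
For example, a hypothetical helper that only needs an address can now pass
it through the data pointer rather than a g_malloc'd wrapper struct
(do_flush_page below is illustrative, not part of this patch):

    /* CPUState now arrives as the explicit first argument */
    static void do_flush_page(CPUState *cpu, void *data)
    {
        target_ulong addr = (target_ulong)(uintptr_t)data;

        tlb_flush_page(cpu, addr);
    }

    /* caller: no wrapper structure, no allocation */
    async_run_on_cpu(cpu, do_flush_page, (void *)(uintptr_t)addr);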

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
[Sergey Fedorov:
 - eliminate more CPUState in user data;
 - remove unnecessary user data passing;
 - fix target-s390x/kvm.c and target-s390x/misc_helper.c]
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Acked-by: David Gibson <david@gibson.dropbear.id.au> (ppc parts)
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> (s390 parts)
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 cpus.c                     | 15 ++++---
 hw/i386/kvm/apic.c         |  3 +-
 hw/i386/kvmvapic.c         |  6 +--
 hw/ppc/ppce500_spin.c      | 31 +++++----------
 hw/ppc/spapr.c             |  6 +--
 hw/ppc/spapr_hcall.c       | 17 ++++----
 include/qom/cpu.h          |  8 ++--
 kvm-all.c                  | 21 ++++------
 target-i386/helper.c       | 19 ++++-----
 target-i386/kvm.c          |  6 +--
 target-s390x/cpu.c         |  4 +-
 target-s390x/cpu.h         |  7 +---
 target-s390x/kvm.c         | 98 +++++++++++++++++++++++-----------------------
 target-s390x/misc_helper.c |  4 +-
 14 files changed, 108 insertions(+), 137 deletions(-)

diff --git a/cpus.c b/cpus.c
index 84c3520..049c2d0 100644
--- a/cpus.c
+++ b/cpus.c
@@ -551,9 +551,8 @@ static const VMStateDescription vmstate_timers = {
     }
 };
 
-static void cpu_throttle_thread(void *opaque)
+static void cpu_throttle_thread(CPUState *cpu, void *opaque)
 {
-    CPUState *cpu = opaque;
     double pct;
     double throttle_ratio;
     long sleeptime_ns;
@@ -583,7 +582,7 @@ static void cpu_throttle_timer_tick(void *opaque)
     }
     CPU_FOREACH(cpu) {
         if (!atomic_xchg(&cpu->throttle_thread_scheduled, 1)) {
-            async_run_on_cpu(cpu, cpu_throttle_thread, cpu);
+            async_run_on_cpu(cpu, cpu_throttle_thread, NULL);
         }
     }
 
@@ -911,12 +910,12 @@ void qemu_init_cpu_loop(void)
     qemu_thread_get_self(&io_thread);
 }
 
-void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
+void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
 {
     struct qemu_work_item wi;
 
     if (qemu_cpu_is_self(cpu)) {
-        func(data);
+        func(cpu, data);
         return;
     }
 
@@ -944,12 +943,12 @@ void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
     }
 }
 
-void async_run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
+void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
 {
     struct qemu_work_item *wi;
 
     if (qemu_cpu_is_self(cpu)) {
-        func(data);
+        func(cpu, data);
         return;
     }
 
@@ -1000,7 +999,7 @@ static void flush_queued_work(CPUState *cpu)
             cpu->queued_work_last = NULL;
         }
         qemu_mutex_unlock(&cpu->work_mutex);
-        wi->func(wi->data);
+        wi->func(cpu, wi->data);
         qemu_mutex_lock(&cpu->work_mutex);
         if (wi->free) {
             g_free(wi);
diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
index 2bd0de8..295b675 100644
--- a/hw/i386/kvm/apic.c
+++ b/hw/i386/kvm/apic.c
@@ -125,10 +125,9 @@ static void kvm_apic_vapic_base_update(APICCommonState *s)
     }
 }
 
-static void do_inject_external_nmi(void *data)
+static void do_inject_external_nmi(CPUState *cpu, void *data)
 {
     APICCommonState *s = data;
-    CPUState *cpu = CPU(s->cpu);
     uint32_t lvt;
     int ret;
 
diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
index 3bf1ddd..1bc02fb 100644
--- a/hw/i386/kvmvapic.c
+++ b/hw/i386/kvmvapic.c
@@ -483,7 +483,7 @@ typedef struct VAPICEnableTPRReporting {
     bool enable;
 } VAPICEnableTPRReporting;
 
-static void vapic_do_enable_tpr_reporting(void *data)
+static void vapic_do_enable_tpr_reporting(CPUState *cpu, void *data)
 {
     VAPICEnableTPRReporting *info = data;
 
@@ -734,10 +734,10 @@ static void vapic_realize(DeviceState *dev, Error **errp)
     nb_option_roms++;
 }
 
-static void do_vapic_enable(void *data)
+static void do_vapic_enable(CPUState *cs, void *data)
 {
     VAPICROMState *s = data;
-    X86CPU *cpu = X86_CPU(first_cpu);
+    X86CPU *cpu = X86_CPU(cs);
 
     static const uint8_t enabled = 1;
     cpu_physical_memory_write(s->vapic_paddr + offsetof(VAPICState, enabled),
diff --git a/hw/ppc/ppce500_spin.c b/hw/ppc/ppce500_spin.c
index 22c584e..8e16f65 100644
--- a/hw/ppc/ppce500_spin.c
+++ b/hw/ppc/ppce500_spin.c
@@ -54,11 +54,6 @@ typedef struct SpinState {
     SpinInfo spin[MAX_CPUS];
 } SpinState;
 
-typedef struct spin_kick {
-    PowerPCCPU *cpu;
-    SpinInfo *spin;
-} SpinKick;
-
 static void spin_reset(void *opaque)
 {
     SpinState *s = opaque;
@@ -89,16 +84,15 @@ static void mmubooke_create_initial_mapping(CPUPPCState *env,
     env->tlb_dirty = true;
 }
 
-static void spin_kick(void *data)
+static void spin_kick(CPUState *cs, void *data)
 {
-    SpinKick *kick = data;
-    CPUState *cpu = CPU(kick->cpu);
-    CPUPPCState *env = &kick->cpu->env;
-    SpinInfo *curspin = kick->spin;
+    PowerPCCPU *cpu = POWERPC_CPU(cs);
+    CPUPPCState *env = &cpu->env;
+    SpinInfo *curspin = data;
     hwaddr map_size = 64 * 1024 * 1024;
     hwaddr map_start;
 
-    cpu_synchronize_state(cpu);
+    cpu_synchronize_state(cs);
     stl_p(&curspin->pir, env->spr[SPR_BOOKE_PIR]);
     env->nip = ldq_p(&curspin->addr) & (map_size - 1);
     env->gpr[3] = ldq_p(&curspin->r3);
@@ -112,10 +106,10 @@ static void spin_kick(void *data)
     map_start = ldq_p(&curspin->addr) & ~(map_size - 1);
     mmubooke_create_initial_mapping(env, 0, map_start, map_size);
 
-    cpu->halted = 0;
-    cpu->exception_index = -1;
-    cpu->stopped = false;
-    qemu_cpu_kick(cpu);
+    cs->halted = 0;
+    cs->exception_index = -1;
+    cs->stopped = false;
+    qemu_cpu_kick(cs);
 }
 
 static void spin_write(void *opaque, hwaddr addr, uint64_t value,
@@ -153,12 +147,7 @@ static void spin_write(void *opaque, hwaddr addr, uint64_t value,
 
     if (!(ldq_p(&curspin->addr) & 1)) {
         /* run CPU */
-        SpinKick kick = {
-            .cpu = POWERPC_CPU(cpu),
-            .spin = curspin,
-        };
-
-        run_on_cpu(cpu, spin_kick, &kick);
+        run_on_cpu(cpu, spin_kick, curspin);
     }
 }
 
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 7f33a1b..882a3c6 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2170,10 +2170,8 @@ static void spapr_machine_finalizefn(Object *obj)
     g_free(spapr->kvm_type);
 }
 
-static void ppc_cpu_do_nmi_on_cpu(void *arg)
+static void ppc_cpu_do_nmi_on_cpu(CPUState *cs, void *arg)
 {
-    CPUState *cs = arg;
-
     cpu_synchronize_state(cs);
     ppc_cpu_do_system_reset(cs);
 }
@@ -2183,7 +2181,7 @@ static void spapr_nmi(NMIState *n, int cpu_index, Error **errp)
     CPUState *cs;
 
     CPU_FOREACH(cs) {
-        async_run_on_cpu(cs, ppc_cpu_do_nmi_on_cpu, cs);
+        async_run_on_cpu(cs, ppc_cpu_do_nmi_on_cpu, NULL);
     }
 }
 
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 73af112..e5eca67 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -13,19 +13,18 @@
 #include "kvm_ppc.h"
 
 struct SPRSyncState {
-    CPUState *cs;
     int spr;
     target_ulong value;
     target_ulong mask;
 };
 
-static void do_spr_sync(void *arg)
+static void do_spr_sync(CPUState *cs, void *arg)
 {
     struct SPRSyncState *s = arg;
-    PowerPCCPU *cpu = POWERPC_CPU(s->cs);
+    PowerPCCPU *cpu = POWERPC_CPU(cs);
     CPUPPCState *env = &cpu->env;
 
-    cpu_synchronize_state(s->cs);
+    cpu_synchronize_state(cs);
     env->spr[s->spr] &= ~s->mask;
     env->spr[s->spr] |= s->value;
 }
@@ -34,7 +33,6 @@ static void set_spr(CPUState *cs, int spr, target_ulong value,
                     target_ulong mask)
 {
     struct SPRSyncState s = {
-        .cs = cs,
         .spr = spr,
         .value = value,
         .mask = mask
@@ -907,17 +905,17 @@ static target_ulong cas_get_option_vector(int vector, target_ulong table)
 }
 
 typedef struct {
-    PowerPCCPU *cpu;
     uint32_t cpu_version;
     Error *err;
 } SetCompatState;
 
-static void do_set_compat(void *arg)
+static void do_set_compat(CPUState *cs, void *arg)
 {
+    PowerPCCPU *cpu = POWERPC_CPU(cs);
     SetCompatState *s = arg;
 
-    cpu_synchronize_state(CPU(s->cpu));
-    ppc_set_compat(s->cpu, s->cpu_version, &s->err);
+    cpu_synchronize_state(cs);
+    ppc_set_compat(cpu, s->cpu_version, &s->err);
 }
 
 #define get_compat_level(cpuver) ( \
@@ -1013,7 +1011,6 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
     if (old_cpu_version != cpu_version) {
         CPU_FOREACH(cs) {
             SetCompatState s = {
-                .cpu = POWERPC_CPU(cs),
                 .cpu_version = cpu_version,
                 .err = NULL,
             };
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index cbcd64c..bd76a27 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -232,9 +232,11 @@ struct kvm_run;
 #define TB_JMP_CACHE_SIZE (1 << TB_JMP_CACHE_BITS)
 
 /* work queue */
+typedef void (*run_on_cpu_func)(CPUState *cpu, void *data);
+
 struct qemu_work_item {
     struct qemu_work_item *next;
-    void (*func)(void *data);
+    run_on_cpu_func func;
     void *data;
     int done;
     bool free;
@@ -623,7 +625,7 @@ bool cpu_is_stopped(CPUState *cpu);
  *
  * Schedules the function @func for execution on the vCPU @cpu.
  */
-void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data);
+void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
 
 /**
  * async_run_on_cpu:
@@ -633,7 +635,7 @@ void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data);
  *
  * Schedules the function @func for execution on the vCPU @cpu asynchronously.
  */
-void async_run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data);
+void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
 
 /**
  * qemu_get_cpu:
diff --git a/kvm-all.c b/kvm-all.c
index ef81ca5..f44fdd0 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1839,10 +1839,8 @@ void kvm_flush_coalesced_mmio_buffer(void)
     s->coalesced_flush_in_progress = false;
 }
 
-static void do_kvm_cpu_synchronize_state(void *arg)
+static void do_kvm_cpu_synchronize_state(CPUState *cpu, void *arg)
 {
-    CPUState *cpu = arg;
-
     if (!cpu->kvm_vcpu_dirty) {
         kvm_arch_get_registers(cpu);
         cpu->kvm_vcpu_dirty = true;
@@ -1852,34 +1850,30 @@ static void do_kvm_cpu_synchronize_state(void *arg)
 void kvm_cpu_synchronize_state(CPUState *cpu)
 {
     if (!cpu->kvm_vcpu_dirty) {
-        run_on_cpu(cpu, do_kvm_cpu_synchronize_state, cpu);
+        run_on_cpu(cpu, do_kvm_cpu_synchronize_state, NULL);
     }
 }
 
-static void do_kvm_cpu_synchronize_post_reset(void *arg)
+static void do_kvm_cpu_synchronize_post_reset(CPUState *cpu, void *arg)
 {
-    CPUState *cpu = arg;
-
     kvm_arch_put_registers(cpu, KVM_PUT_RESET_STATE);
     cpu->kvm_vcpu_dirty = false;
 }
 
 void kvm_cpu_synchronize_post_reset(CPUState *cpu)
 {
-    run_on_cpu(cpu, do_kvm_cpu_synchronize_post_reset, cpu);
+    run_on_cpu(cpu, do_kvm_cpu_synchronize_post_reset, NULL);
 }
 
-static void do_kvm_cpu_synchronize_post_init(void *arg)
+static void do_kvm_cpu_synchronize_post_init(CPUState *cpu, void *arg)
 {
-    CPUState *cpu = arg;
-
     kvm_arch_put_registers(cpu, KVM_PUT_FULL_STATE);
     cpu->kvm_vcpu_dirty = false;
 }
 
 void kvm_cpu_synchronize_post_init(CPUState *cpu)
 {
-    run_on_cpu(cpu, do_kvm_cpu_synchronize_post_init, cpu);
+    run_on_cpu(cpu, do_kvm_cpu_synchronize_post_init, NULL);
 }
 
 int kvm_cpu_exec(CPUState *cpu)
@@ -2221,7 +2215,7 @@ struct kvm_set_guest_debug_data {
     int err;
 };
 
-static void kvm_invoke_set_guest_debug(void *data)
+static void kvm_invoke_set_guest_debug(CPUState *unused_cpu, void *data)
 {
     struct kvm_set_guest_debug_data *dbg_data = data;
 
@@ -2239,7 +2233,6 @@ int kvm_update_guest_debug(CPUState *cpu, unsigned long reinject_trap)
         data.dbg.control |= KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP;
     }
     kvm_arch_update_guest_debug(cpu, &data.dbg);
-    data.cpu = cpu;
 
     run_on_cpu(cpu, kvm_invoke_set_guest_debug, &data);
     return data.err;
diff --git a/target-i386/helper.c b/target-i386/helper.c
index 1c250b8..9bc961b 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -1113,7 +1113,6 @@ out:
 
 typedef struct MCEInjectionParams {
     Monitor *mon;
-    X86CPU *cpu;
     int bank;
     uint64_t status;
     uint64_t mcg_status;
@@ -1122,14 +1121,14 @@ typedef struct MCEInjectionParams {
     int flags;
 } MCEInjectionParams;
 
-static void do_inject_x86_mce(void *data)
+static void do_inject_x86_mce(CPUState *cs, void *data)
 {
     MCEInjectionParams *params = data;
-    CPUX86State *cenv = &params->cpu->env;
-    CPUState *cpu = CPU(params->cpu);
+    X86CPU *cpu = X86_CPU(cs);
+    CPUX86State *cenv = &cpu->env;
     uint64_t *banks = cenv->mce_banks + 4 * params->bank;
 
-    cpu_synchronize_state(cpu);
+    cpu_synchronize_state(cs);
 
     /*
      * If there is an MCE exception being processed, ignore this SRAO MCE
@@ -1149,7 +1148,7 @@ static void do_inject_x86_mce(void *data)
         if ((cenv->mcg_cap & MCG_CTL_P) && cenv->mcg_ctl != ~(uint64_t)0) {
             monitor_printf(params->mon,
                            "CPU %d: Uncorrected error reporting disabled\n",
-                           cpu->cpu_index);
+                           cs->cpu_index);
             return;
         }
 
@@ -1161,7 +1160,7 @@ static void do_inject_x86_mce(void *data)
             monitor_printf(params->mon,
                            "CPU %d: Uncorrected error reporting disabled for"
                            " bank %d\n",
-                           cpu->cpu_index, params->bank);
+                           cs->cpu_index, params->bank);
             return;
         }
 
@@ -1170,7 +1169,7 @@ static void do_inject_x86_mce(void *data)
             monitor_printf(params->mon,
                            "CPU %d: Previous MCE still in progress, raising"
                            " triple fault\n",
-                           cpu->cpu_index);
+                           cs->cpu_index);
             qemu_log_mask(CPU_LOG_RESET, "Triple fault\n");
             qemu_system_reset_request();
             return;
@@ -1182,7 +1181,7 @@ static void do_inject_x86_mce(void *data)
         banks[3] = params->misc;
         cenv->mcg_status = params->mcg_status;
         banks[1] = params->status;
-        cpu_interrupt(cpu, CPU_INTERRUPT_MCE);
+        cpu_interrupt(cs, CPU_INTERRUPT_MCE);
     } else if (!(banks[1] & MCI_STATUS_VAL)
                || !(banks[1] & MCI_STATUS_UC)) {
         if (banks[1] & MCI_STATUS_VAL) {
@@ -1204,7 +1203,6 @@ void cpu_x86_inject_mce(Monitor *mon, X86CPU *cpu, int bank,
     CPUX86State *cenv = &cpu->env;
     MCEInjectionParams params = {
         .mon = mon,
-        .cpu = cpu,
         .bank = bank,
         .status = status,
         .mcg_status = mcg_status,
@@ -1245,7 +1243,6 @@ void cpu_x86_inject_mce(Monitor *mon, X86CPU *cpu, int bank,
             if (other_cs == cs) {
                 continue;
             }
-            params.cpu = X86_CPU(other_cs);
             run_on_cpu(other_cs, do_inject_x86_mce, &params);
         }
     }
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 9697e16..9d5e523 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -156,10 +156,8 @@ static int kvm_get_tsc(CPUState *cs)
     return 0;
 }
 
-static inline void do_kvm_synchronize_tsc(void *arg)
+static inline void do_kvm_synchronize_tsc(CPUState *cpu, void *arg)
 {
-    CPUState *cpu = arg;
-
     kvm_get_tsc(cpu);
 }
 
@@ -169,7 +167,7 @@ void kvm_synchronize_all_tsc(void)
 
     if (kvm_enabled()) {
         CPU_FOREACH(cpu) {
-            run_on_cpu(cpu, do_kvm_synchronize_tsc, cpu);
+            run_on_cpu(cpu, do_kvm_synchronize_tsc, NULL);
         }
     }
 }
diff --git a/target-s390x/cpu.c b/target-s390x/cpu.c
index e43e2d6..4f09c64 100644
--- a/target-s390x/cpu.c
+++ b/target-s390x/cpu.c
@@ -188,7 +188,7 @@ static void s390_cpu_machine_reset_cb(void *opaque)
 {
     S390CPU *cpu = opaque;
 
-    run_on_cpu(CPU(cpu), s390_do_cpu_full_reset, CPU(cpu));
+    run_on_cpu(CPU(cpu), s390_do_cpu_full_reset, NULL);
 }
 #endif
 
@@ -238,7 +238,7 @@ static void s390_cpu_realizefn(DeviceState *dev, Error **errp)
     s390_cpu_gdb_init(cs);
     qemu_init_vcpu(cs);
 #if !defined(CONFIG_USER_ONLY)
-    run_on_cpu(cs, s390_do_cpu_full_reset, cs);
+    run_on_cpu(cs, s390_do_cpu_full_reset, NULL);
 #else
     cpu_reset(cs);
 #endif
diff --git a/target-s390x/cpu.h b/target-s390x/cpu.h
index c216bda..0bdb1be 100644
--- a/target-s390x/cpu.h
+++ b/target-s390x/cpu.h
@@ -499,17 +499,14 @@ static inline hwaddr decode_basedisp_s(CPUS390XState *env, uint32_t ipb,
 #define decode_basedisp_rs decode_basedisp_s
 
 /* helper functions for run_on_cpu() */
-static inline void s390_do_cpu_reset(void *arg)
+static inline void s390_do_cpu_reset(CPUState *cs, void *arg)
 {
-    CPUState *cs = arg;
     S390CPUClass *scc = S390_CPU_GET_CLASS(cs);
 
     scc->cpu_reset(cs);
 }
-static inline void s390_do_cpu_full_reset(void *arg)
+static inline void s390_do_cpu_full_reset(CPUState *cs, void *arg)
 {
-    CPUState *cs = arg;
-
     cpu_reset(cs);
 }
 
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index 80ac621..53deb44 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -1354,7 +1354,6 @@ static int handle_diag(S390CPU *cpu, struct kvm_run *run, uint32_t ipb)
 }
 
 typedef struct SigpInfo {
-    S390CPU *cpu;
     uint64_t param;
     int cc;
     uint64_t *status_reg;
@@ -1367,38 +1366,40 @@ static void set_sigp_status(SigpInfo *si, uint64_t status)
     si->cc = SIGP_CC_STATUS_STORED;
 }
 
-static void sigp_start(void *arg)
+static void sigp_start(CPUState *cs, void *arg)
 {
+    S390CPU *cpu = S390_CPU(cs);
     SigpInfo *si = arg;
 
-    if (s390_cpu_get_state(si->cpu) != CPU_STATE_STOPPED) {
+    if (s390_cpu_get_state(cpu) != CPU_STATE_STOPPED) {
         si->cc = SIGP_CC_ORDER_CODE_ACCEPTED;
         return;
     }
 
-    s390_cpu_set_state(CPU_STATE_OPERATING, si->cpu);
+    s390_cpu_set_state(CPU_STATE_OPERATING, cpu);
     si->cc = SIGP_CC_ORDER_CODE_ACCEPTED;
 }
 
-static void sigp_stop(void *arg)
+static void sigp_stop(CPUState *cs, void *arg)
 {
+    S390CPU *cpu = S390_CPU(cs);
     SigpInfo *si = arg;
     struct kvm_s390_irq irq = {
         .type = KVM_S390_SIGP_STOP,
     };
 
-    if (s390_cpu_get_state(si->cpu) != CPU_STATE_OPERATING) {
+    if (s390_cpu_get_state(cpu) != CPU_STATE_OPERATING) {
         si->cc = SIGP_CC_ORDER_CODE_ACCEPTED;
         return;
     }
 
     /* disabled wait - sleeping in user space */
-    if (CPU(si->cpu)->halted) {
-        s390_cpu_set_state(CPU_STATE_STOPPED, si->cpu);
+    if (cs->halted) {
+        s390_cpu_set_state(CPU_STATE_STOPPED, cpu);
     } else {
         /* execute the stop function */
-        si->cpu->env.sigp_order = SIGP_STOP;
-        kvm_s390_vcpu_interrupt(si->cpu, &irq);
+        cpu->env.sigp_order = SIGP_STOP;
+        kvm_s390_vcpu_interrupt(cpu, &irq);
     }
     si->cc = SIGP_CC_ORDER_CODE_ACCEPTED;
 }
@@ -1465,56 +1466,58 @@ static int kvm_s390_store_status(S390CPU *cpu, hwaddr addr, bool store_arch)
     return 0;
 }
 
-static void sigp_stop_and_store_status(void *arg)
+static void sigp_stop_and_store_status(CPUState *cs, void *arg)
 {
+    S390CPU *cpu = S390_CPU(cs);
     SigpInfo *si = arg;
     struct kvm_s390_irq irq = {
         .type = KVM_S390_SIGP_STOP,
     };
 
     /* disabled wait - sleeping in user space */
-    if (s390_cpu_get_state(si->cpu) == CPU_STATE_OPERATING &&
-        CPU(si->cpu)->halted) {
-        s390_cpu_set_state(CPU_STATE_STOPPED, si->cpu);
+    if (s390_cpu_get_state(cpu) == CPU_STATE_OPERATING && cs->halted) {
+        s390_cpu_set_state(CPU_STATE_STOPPED, cpu);
     }
 
-    switch (s390_cpu_get_state(si->cpu)) {
+    switch (s390_cpu_get_state(cpu)) {
     case CPU_STATE_OPERATING:
-        si->cpu->env.sigp_order = SIGP_STOP_STORE_STATUS;
-        kvm_s390_vcpu_interrupt(si->cpu, &irq);
+        cpu->env.sigp_order = SIGP_STOP_STORE_STATUS;
+        kvm_s390_vcpu_interrupt(cpu, &irq);
         /* store will be performed when handling the stop intercept */
         break;
     case CPU_STATE_STOPPED:
         /* already stopped, just store the status */
-        cpu_synchronize_state(CPU(si->cpu));
-        kvm_s390_store_status(si->cpu, KVM_S390_STORE_STATUS_DEF_ADDR, true);
+        cpu_synchronize_state(cs);
+        kvm_s390_store_status(cpu, KVM_S390_STORE_STATUS_DEF_ADDR, true);
         break;
     }
     si->cc = SIGP_CC_ORDER_CODE_ACCEPTED;
 }
 
-static void sigp_store_status_at_address(void *arg)
+static void sigp_store_status_at_address(CPUState *cs, void *arg)
 {
+    S390CPU *cpu = S390_CPU(cs);
     SigpInfo *si = arg;
     uint32_t address = si->param & 0x7ffffe00u;
 
     /* cpu has to be stopped */
-    if (s390_cpu_get_state(si->cpu) != CPU_STATE_STOPPED) {
+    if (s390_cpu_get_state(cpu) != CPU_STATE_STOPPED) {
         set_sigp_status(si, SIGP_STAT_INCORRECT_STATE);
         return;
     }
 
-    cpu_synchronize_state(CPU(si->cpu));
+    cpu_synchronize_state(cs);
 
-    if (kvm_s390_store_status(si->cpu, address, false)) {
+    if (kvm_s390_store_status(cpu, address, false)) {
         set_sigp_status(si, SIGP_STAT_INVALID_PARAMETER);
         return;
     }
     si->cc = SIGP_CC_ORDER_CODE_ACCEPTED;
 }
 
-static void sigp_store_adtl_status(void *arg)
+static void sigp_store_adtl_status(CPUState *cs, void *arg)
 {
+    S390CPU *cpu = S390_CPU(cs);
     SigpInfo *si = arg;
 
     if (!kvm_check_extension(kvm_state, KVM_CAP_S390_VECTOR_REGISTERS)) {
@@ -1523,7 +1526,7 @@ static void sigp_store_adtl_status(void *arg)
     }
 
     /* cpu has to be stopped */
-    if (s390_cpu_get_state(si->cpu) != CPU_STATE_STOPPED) {
+    if (s390_cpu_get_state(cpu) != CPU_STATE_STOPPED) {
         set_sigp_status(si, SIGP_STAT_INCORRECT_STATE);
         return;
     }
@@ -1534,31 +1537,32 @@ static void sigp_store_adtl_status(void *arg)
         return;
     }
 
-    cpu_synchronize_state(CPU(si->cpu));
+    cpu_synchronize_state(cs);
 
-    if (kvm_s390_store_adtl_status(si->cpu, si->param)) {
+    if (kvm_s390_store_adtl_status(cpu, si->param)) {
         set_sigp_status(si, SIGP_STAT_INVALID_PARAMETER);
         return;
     }
     si->cc = SIGP_CC_ORDER_CODE_ACCEPTED;
 }
 
-static void sigp_restart(void *arg)
+static void sigp_restart(CPUState *cs, void *arg)
 {
+    S390CPU *cpu = S390_CPU(cs);
     SigpInfo *si = arg;
     struct kvm_s390_irq irq = {
         .type = KVM_S390_RESTART,
     };
 
-    switch (s390_cpu_get_state(si->cpu)) {
+    switch (s390_cpu_get_state(cpu)) {
     case CPU_STATE_STOPPED:
         /* the restart irq has to be delivered prior to any other pending irq */
-        cpu_synchronize_state(CPU(si->cpu));
-        do_restart_interrupt(&si->cpu->env);
-        s390_cpu_set_state(CPU_STATE_OPERATING, si->cpu);
+        cpu_synchronize_state(cs);
+        do_restart_interrupt(&cpu->env);
+        s390_cpu_set_state(CPU_STATE_OPERATING, cpu);
         break;
     case CPU_STATE_OPERATING:
-        kvm_s390_vcpu_interrupt(si->cpu, &irq);
+        kvm_s390_vcpu_interrupt(cpu, &irq);
         break;
     }
     si->cc = SIGP_CC_ORDER_CODE_ACCEPTED;
@@ -1566,20 +1570,18 @@ static void sigp_restart(void *arg)
 
 int kvm_s390_cpu_restart(S390CPU *cpu)
 {
-    SigpInfo si = {
-        .cpu = cpu,
-    };
+    SigpInfo si = {};
 
     run_on_cpu(CPU(cpu), sigp_restart, &si);
     DPRINTF("DONE: KVM cpu restart: %p\n", &cpu->env);
     return 0;
 }
 
-static void sigp_initial_cpu_reset(void *arg)
+static void sigp_initial_cpu_reset(CPUState *cs, void *arg)
 {
+    S390CPU *cpu = S390_CPU(cs);
+    S390CPUClass *scc = S390_CPU_GET_CLASS(cpu);
     SigpInfo *si = arg;
-    CPUState *cs = CPU(si->cpu);
-    S390CPUClass *scc = S390_CPU_GET_CLASS(si->cpu);
 
     cpu_synchronize_state(cs);
     scc->initial_cpu_reset(cs);
@@ -1587,11 +1589,11 @@ static void sigp_initial_cpu_reset(void *arg)
     si->cc = SIGP_CC_ORDER_CODE_ACCEPTED;
 }
 
-static void sigp_cpu_reset(void *arg)
+static void sigp_cpu_reset(CPUState *cs, void *arg)
 {
+    S390CPU *cpu = S390_CPU(cs);
+    S390CPUClass *scc = S390_CPU_GET_CLASS(cpu);
     SigpInfo *si = arg;
-    CPUState *cs = CPU(si->cpu);
-    S390CPUClass *scc = S390_CPU_GET_CLASS(si->cpu);
 
     cpu_synchronize_state(cs);
     scc->cpu_reset(cs);
@@ -1599,12 +1601,13 @@ static void sigp_cpu_reset(void *arg)
     si->cc = SIGP_CC_ORDER_CODE_ACCEPTED;
 }
 
-static void sigp_set_prefix(void *arg)
+static void sigp_set_prefix(CPUState *cs, void *arg)
 {
+    S390CPU *cpu = S390_CPU(cs);
     SigpInfo *si = arg;
     uint32_t addr = si->param & 0x7fffe000u;
 
-    cpu_synchronize_state(CPU(si->cpu));
+    cpu_synchronize_state(cs);
 
     if (!address_space_access_valid(&address_space_memory, addr,
                                     sizeof(struct LowCore), false)) {
@@ -1613,13 +1616,13 @@ static void sigp_set_prefix(void *arg)
     }
 
     /* cpu has to be stopped */
-    if (s390_cpu_get_state(si->cpu) != CPU_STATE_STOPPED) {
+    if (s390_cpu_get_state(cpu) != CPU_STATE_STOPPED) {
         set_sigp_status(si, SIGP_STAT_INCORRECT_STATE);
         return;
     }
 
-    si->cpu->env.psa = addr;
-    cpu_synchronize_post_init(CPU(si->cpu));
+    cpu->env.psa = addr;
+    cpu_synchronize_post_init(cs);
     si->cc = SIGP_CC_ORDER_CODE_ACCEPTED;
 }
 
@@ -1627,7 +1630,6 @@ static int handle_sigp_single_dst(S390CPU *dst_cpu, uint8_t order,
                                   uint64_t param, uint64_t *status_reg)
 {
     SigpInfo si = {
-        .cpu = dst_cpu,
         .param = param,
         .status_reg = status_reg,
     };
diff --git a/target-s390x/misc_helper.c b/target-s390x/misc_helper.c
index 86da194..4df2ec6 100644
--- a/target-s390x/misc_helper.c
+++ b/target-s390x/misc_helper.c
@@ -126,7 +126,7 @@ static int modified_clear_reset(S390CPU *cpu)
     pause_all_vcpus();
     cpu_synchronize_all_states();
     CPU_FOREACH(t) {
-        run_on_cpu(t, s390_do_cpu_full_reset, t);
+        run_on_cpu(t, s390_do_cpu_full_reset, NULL);
     }
     s390_cmma_reset();
     subsystem_reset();
@@ -145,7 +145,7 @@ static int load_normal_reset(S390CPU *cpu)
     pause_all_vcpus();
     cpu_synchronize_all_states();
     CPU_FOREACH(t) {
-        run_on_cpu(t, s390_do_cpu_reset, t);
+        run_on_cpu(t, s390_do_cpu_reset, NULL);
     }
     s390_cmma_reset();
     subsystem_reset();
-- 
2.7.4


* [Qemu-devel] [PATCH v5 03/13] cpus: Move common code out of {async_,}run_on_cpu()
  2016-08-02 17:27 [Qemu-devel] [PATCH v5 00/13] cpu-exec: Safe work in quiescent state Alex Bennée
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 01/13] atomic: introduce atomic_dec_fetch Alex Bennée
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 02/13] cpus: pass CPUState to run_on_cpu helpers Alex Bennée
@ 2016-08-02 17:27 ` Alex Bennée
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 04/13] cpus: Wrap mutex used to protect CPU work Alex Bennée
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Alex Bennée @ 2016-08-02 17:27 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Sergey Fedorov, Alex Bennée,
	Peter Crosthwaite

From: Sergey Fedorov <serge.fdrv@gmail.com>

Move the code common between run_on_cpu() and async_run_on_cpu() into a
new function queue_work_on_cpu().

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 cpus.c | 42 ++++++++++++++++++------------------------
 1 file changed, 18 insertions(+), 24 deletions(-)

diff --git a/cpus.c b/cpus.c
index 049c2d0..04687c8 100644
--- a/cpus.c
+++ b/cpus.c
@@ -910,6 +910,22 @@ void qemu_init_cpu_loop(void)
     qemu_thread_get_self(&io_thread);
 }
 
+static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
+{
+    qemu_mutex_lock(&cpu->work_mutex);
+    if (cpu->queued_work_first == NULL) {
+        cpu->queued_work_first = wi;
+    } else {
+        cpu->queued_work_last->next = wi;
+    }
+    cpu->queued_work_last = wi;
+    wi->next = NULL;
+    wi->done = false;
+    qemu_mutex_unlock(&cpu->work_mutex);
+
+    qemu_cpu_kick(cpu);
+}
+
 void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
 {
     struct qemu_work_item wi;
@@ -923,18 +939,7 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
     wi.data = data;
     wi.free = false;
 
-    qemu_mutex_lock(&cpu->work_mutex);
-    if (cpu->queued_work_first == NULL) {
-        cpu->queued_work_first = &wi;
-    } else {
-        cpu->queued_work_last->next = &wi;
-    }
-    cpu->queued_work_last = &wi;
-    wi.next = NULL;
-    wi.done = false;
-    qemu_mutex_unlock(&cpu->work_mutex);
-
-    qemu_cpu_kick(cpu);
+    queue_work_on_cpu(cpu, &wi);
     while (!atomic_mb_read(&wi.done)) {
         CPUState *self_cpu = current_cpu;
 
@@ -957,18 +962,7 @@ void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
     wi->data = data;
     wi->free = true;
 
-    qemu_mutex_lock(&cpu->work_mutex);
-    if (cpu->queued_work_first == NULL) {
-        cpu->queued_work_first = wi;
-    } else {
-        cpu->queued_work_last->next = wi;
-    }
-    cpu->queued_work_last = wi;
-    wi->next = NULL;
-    wi->done = false;
-    qemu_mutex_unlock(&cpu->work_mutex);
-
-    qemu_cpu_kick(cpu);
+    queue_work_on_cpu(cpu, wi);
 }
 
 static void qemu_kvm_destroy_vcpu(CPUState *cpu)
-- 
2.7.4


* [Qemu-devel] [PATCH v5 04/13] cpus: Wrap mutex used to protect CPU work
  2016-08-02 17:27 [Qemu-devel] [PATCH v5 00/13] cpu-exec: Safe work in quiescent state Alex Bennée
                   ` (2 preceding siblings ...)
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 03/13] cpus: Move common code out of {async_,}run_on_cpu() Alex Bennée
@ 2016-08-02 17:27 ` Alex Bennée
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 05/13] cpus: Rename flush_queued_work() Alex Bennée
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Alex Bennée @ 2016-08-02 17:27 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Sergey Fedorov, Alex Bennée,
	Peter Crosthwaite

From: Sergey Fedorov <serge.fdrv@gmail.com>

This will be useful to enable CPU work on user mode emulation.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 cpus.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index 04687c8..f80ed2a 100644
--- a/cpus.c
+++ b/cpus.c
@@ -910,6 +910,11 @@ void qemu_init_cpu_loop(void)
     qemu_thread_get_self(&io_thread);
 }
 
+static QemuMutex *qemu_get_cpu_work_mutex(void)
+{
+    return &qemu_global_mutex;
+}
+
 static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
 {
     qemu_mutex_lock(&cpu->work_mutex);
@@ -943,7 +948,7 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
     while (!atomic_mb_read(&wi.done)) {
         CPUState *self_cpu = current_cpu;
 
-        qemu_cond_wait(&qemu_work_cond, &qemu_global_mutex);
+        qemu_cond_wait(&qemu_work_cond, qemu_get_cpu_work_mutex());
         current_cpu = self_cpu;
     }
 }
-- 
2.7.4


* [Qemu-devel] [PATCH v5 05/13] cpus: Rename flush_queued_work()
  2016-08-02 17:27 [Qemu-devel] [PATCH v5 00/13] cpu-exec: Safe work in quiescent state Alex Bennée
                   ` (3 preceding siblings ...)
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 04/13] cpus: Wrap mutex used to protect CPU work Alex Bennée
@ 2016-08-02 17:27 ` Alex Bennée
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 06/13] linux-user: Use QemuMutex and QemuCond Alex Bennée
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Alex Bennée @ 2016-08-02 17:27 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Sergey Fedorov, Alex Bennée,
	Peter Crosthwaite

From: Sergey Fedorov <serge.fdrv@gmail.com>

To avoid possible confusion, rename flush_queued_work() to
process_queued_cpu_work().

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 cpus.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/cpus.c b/cpus.c
index f80ed2a..51fd8c1 100644
--- a/cpus.c
+++ b/cpus.c
@@ -982,7 +982,7 @@ static void qemu_tcg_destroy_vcpu(CPUState *cpu)
 {
 }
 
-static void flush_queued_work(CPUState *cpu)
+static void process_queued_cpu_work(CPUState *cpu)
 {
     struct qemu_work_item *wi;
 
@@ -1017,7 +1017,7 @@ static void qemu_wait_io_event_common(CPUState *cpu)
         cpu->stopped = true;
         qemu_cond_broadcast(&qemu_pause_cond);
     }
-    flush_queued_work(cpu);
+    process_queued_cpu_work(cpu);
     cpu->thread_kicked = false;
 }
 
-- 
2.7.4


* [Qemu-devel] [PATCH v5 06/13] linux-user: Use QemuMutex and QemuCond
  2016-08-02 17:27 [Qemu-devel] [PATCH v5 00/13] cpu-exec: Safe work in quiescent state Alex Bennée
                   ` (4 preceding siblings ...)
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 05/13] cpus: Rename flush_queued_work() Alex Bennée
@ 2016-08-02 17:27 ` Alex Bennée
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 07/13] linux-user: Rework exclusive operation mechanism Alex Bennée
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Alex Bennée @ 2016-08-02 17:27 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Sergey Fedorov, Alex Bennée, Riku Voipio

From: Sergey Fedorov <serge.fdrv@gmail.com>

Convert pthread_mutex_t and pthread_cond_t to QemuMutex and QemuCond.
This will allow some locks and condition variables to be made common
between user and system mode emulation.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 linux-user/main.c | 53 +++++++++++++++++++++++++++++++----------------------
 1 file changed, 31 insertions(+), 22 deletions(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index 462e820..c80caeb 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -111,17 +111,25 @@ int cpu_get_pic_interrupt(CPUX86State *env)
    We don't require a full sync, only that no cpus are executing guest code.
    The alternative is to map target atomic ops onto host equivalents,
    which requires quite a lot of per host/target work.  */
-static pthread_mutex_t cpu_list_mutex = PTHREAD_MUTEX_INITIALIZER;
-static pthread_mutex_t exclusive_lock = PTHREAD_MUTEX_INITIALIZER;
-static pthread_cond_t exclusive_cond = PTHREAD_COND_INITIALIZER;
-static pthread_cond_t exclusive_resume = PTHREAD_COND_INITIALIZER;
+static QemuMutex cpu_list_mutex;
+static QemuMutex exclusive_lock;
+static QemuCond exclusive_cond;
+static QemuCond exclusive_resume;
 static int pending_cpus;
 
+void qemu_init_cpu_loop(void)
+{
+    qemu_mutex_init(&cpu_list_mutex);
+    qemu_mutex_init(&exclusive_lock);
+    qemu_cond_init(&exclusive_cond);
+    qemu_cond_init(&exclusive_resume);
+}
+
 /* Make sure everything is in a consistent state for calling fork().  */
 void fork_start(void)
 {
     qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
-    pthread_mutex_lock(&exclusive_lock);
+    qemu_mutex_lock(&exclusive_lock);
     mmap_fork_start();
 }
 
@@ -138,14 +146,14 @@ void fork_end(int child)
             }
         }
         pending_cpus = 0;
-        pthread_mutex_init(&exclusive_lock, NULL);
-        pthread_mutex_init(&cpu_list_mutex, NULL);
-        pthread_cond_init(&exclusive_cond, NULL);
-        pthread_cond_init(&exclusive_resume, NULL);
+        qemu_mutex_init(&exclusive_lock);
+        qemu_mutex_init(&cpu_list_mutex);
+        qemu_cond_init(&exclusive_cond);
+        qemu_cond_init(&exclusive_resume);
         qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
         gdbserver_fork(thread_cpu);
     } else {
-        pthread_mutex_unlock(&exclusive_lock);
+        qemu_mutex_unlock(&exclusive_lock);
         qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
     }
 }
@@ -155,7 +163,7 @@ void fork_end(int child)
 static inline void exclusive_idle(void)
 {
     while (pending_cpus) {
-        pthread_cond_wait(&exclusive_resume, &exclusive_lock);
+        qemu_cond_wait(&exclusive_resume, &exclusive_lock);
     }
 }
 
@@ -165,7 +173,7 @@ static inline void start_exclusive(void)
 {
     CPUState *other_cpu;
 
-    pthread_mutex_lock(&exclusive_lock);
+    qemu_mutex_lock(&exclusive_lock);
     exclusive_idle();
 
     pending_cpus = 1;
@@ -177,7 +185,7 @@ static inline void start_exclusive(void)
         }
     }
     if (pending_cpus > 1) {
-        pthread_cond_wait(&exclusive_cond, &exclusive_lock);
+        qemu_cond_wait(&exclusive_cond, &exclusive_lock);
     }
 }
 
@@ -185,42 +193,42 @@ static inline void start_exclusive(void)
 static inline void __attribute__((unused)) end_exclusive(void)
 {
     pending_cpus = 0;
-    pthread_cond_broadcast(&exclusive_resume);
-    pthread_mutex_unlock(&exclusive_lock);
+    qemu_cond_broadcast(&exclusive_resume);
+    qemu_mutex_unlock(&exclusive_lock);
 }
 
 /* Wait for exclusive ops to finish, and begin cpu execution.  */
 static inline void cpu_exec_start(CPUState *cpu)
 {
-    pthread_mutex_lock(&exclusive_lock);
+    qemu_mutex_lock(&exclusive_lock);
     exclusive_idle();
     cpu->running = true;
-    pthread_mutex_unlock(&exclusive_lock);
+    qemu_mutex_unlock(&exclusive_lock);
 }
 
 /* Mark cpu as not executing, and release pending exclusive ops.  */
 static inline void cpu_exec_end(CPUState *cpu)
 {
-    pthread_mutex_lock(&exclusive_lock);
+    qemu_mutex_lock(&exclusive_lock);
     cpu->running = false;
     if (pending_cpus > 1) {
         pending_cpus--;
         if (pending_cpus == 1) {
-            pthread_cond_signal(&exclusive_cond);
+            qemu_cond_signal(&exclusive_cond);
         }
     }
     exclusive_idle();
-    pthread_mutex_unlock(&exclusive_lock);
+    qemu_mutex_unlock(&exclusive_lock);
 }
 
 void cpu_list_lock(void)
 {
-    pthread_mutex_lock(&cpu_list_mutex);
+    qemu_mutex_lock(&cpu_list_mutex);
 }
 
 void cpu_list_unlock(void)
 {
-    pthread_mutex_unlock(&cpu_list_mutex);
+    qemu_mutex_unlock(&cpu_list_mutex);
 }
 
 
@@ -4222,6 +4230,7 @@ int main(int argc, char **argv, char **envp)
     int ret;
     int execfd;
 
+    qemu_init_cpu_loop();
     module_call_init(MODULE_INIT_QOM);
 
     if ((envlist = envlist_create()) == NULL) {
-- 
2.7.4


* [Qemu-devel] [PATCH v5 07/13] linux-user: Rework exclusive operation mechanism
  2016-08-02 17:27 [Qemu-devel] [PATCH v5 00/13] cpu-exec: Safe work in quiescent state Alex Bennée
                   ` (5 preceding siblings ...)
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 06/13] linux-user: Use QemuMutex and QemuCond Alex Bennée
@ 2016-08-02 17:27 ` Alex Bennée
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 08/13] linux-user: Add qemu_cpu_is_self() and qemu_cpu_kick() Alex Bennée
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Alex Bennée @ 2016-08-02 17:27 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Sergey Fedorov, Alex Bennée, Riku Voipio

From: Sergey Fedorov <serge.fdrv@gmail.com>

A single variable 'pending_cpus' was used for both counting currently
running CPUs and for signalling the pending exclusive operation request.

To prepare for supporting operations which require a quiescent state,
like a translation buffer flush, it is useful to keep a counter of
currently running CPUs always up to date.
currently running CPUs always up to date.

Use a separate variable 'tcg_pending_threads' to count the currently
running CPUs and a separate variable 'exclusive_pending' to indicate
that there's an exclusive operation pending.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 linux-user/main.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index c80caeb..60ca69f 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -115,7 +115,8 @@ static QemuMutex cpu_list_mutex;
 static QemuMutex exclusive_lock;
 static QemuCond exclusive_cond;
 static QemuCond exclusive_resume;
-static int pending_cpus;
+static bool exclusive_pending;
+static int tcg_pending_threads;
 
 void qemu_init_cpu_loop(void)
 {
@@ -145,7 +146,8 @@ void fork_end(int child)
                 QTAILQ_REMOVE(&cpus, cpu, node);
             }
         }
-        pending_cpus = 0;
+        tcg_pending_threads = 0;
+        exclusive_pending = false;
         qemu_mutex_init(&exclusive_lock);
         qemu_mutex_init(&cpu_list_mutex);
         qemu_cond_init(&exclusive_cond);
@@ -162,7 +164,7 @@ void fork_end(int child)
    must be held.  */
 static inline void exclusive_idle(void)
 {
-    while (pending_cpus) {
+    while (exclusive_pending) {
         qemu_cond_wait(&exclusive_resume, &exclusive_lock);
     }
 }
@@ -176,15 +178,14 @@ static inline void start_exclusive(void)
     qemu_mutex_lock(&exclusive_lock);
     exclusive_idle();
 
-    pending_cpus = 1;
+    exclusive_pending = true;
     /* Make all other cpus stop executing.  */
     CPU_FOREACH(other_cpu) {
         if (other_cpu->running) {
-            pending_cpus++;
             cpu_exit(other_cpu);
         }
     }
-    if (pending_cpus > 1) {
+    while (tcg_pending_threads) {
         qemu_cond_wait(&exclusive_cond, &exclusive_lock);
     }
 }
@@ -192,7 +193,7 @@ static inline void start_exclusive(void)
 /* Finish an exclusive operation.  */
 static inline void __attribute__((unused)) end_exclusive(void)
 {
-    pending_cpus = 0;
+    exclusive_pending = false;
     qemu_cond_broadcast(&exclusive_resume);
     qemu_mutex_unlock(&exclusive_lock);
 }
@@ -203,6 +204,7 @@ static inline void cpu_exec_start(CPUState *cpu)
     qemu_mutex_lock(&exclusive_lock);
     exclusive_idle();
     cpu->running = true;
+    tcg_pending_threads++;
     qemu_mutex_unlock(&exclusive_lock);
 }
 
@@ -211,11 +213,9 @@ static inline void cpu_exec_end(CPUState *cpu)
 {
     qemu_mutex_lock(&exclusive_lock);
     cpu->running = false;
-    if (pending_cpus > 1) {
-        pending_cpus--;
-        if (pending_cpus == 1) {
-            qemu_cond_signal(&exclusive_cond);
-        }
+    tcg_pending_threads--;
+    if (!tcg_pending_threads) {
+        qemu_cond_signal(&exclusive_cond);
     }
     exclusive_idle();
     qemu_mutex_unlock(&exclusive_lock);
-- 
2.7.4


* [Qemu-devel] [PATCH v5 08/13] linux-user: Add qemu_cpu_is_self() and qemu_cpu_kick()
  2016-08-02 17:27 [Qemu-devel] [PATCH v5 00/13] cpu-exec: Safe work in quiescent state Alex Bennée
                   ` (6 preceding siblings ...)
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 07/13] linux-user: Rework exclusive operation mechanism Alex Bennée
@ 2016-08-02 17:27 ` Alex Bennée
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 09/13] linux-user: Support CPU work queue Alex Bennée
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Alex Bennée @ 2016-08-02 17:27 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Sergey Fedorov, Alex Bennée, Riku Voipio

From: Sergey Fedorov <serge.fdrv@gmail.com>

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 linux-user/main.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/linux-user/main.c b/linux-user/main.c
index 60ca69f..f5ddf96 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -3788,6 +3788,16 @@ void cpu_loop(CPUTLGState *env)
 
 THREAD CPUState *thread_cpu;
 
+bool qemu_cpu_is_self(CPUState *cpu)
+{
+    return thread_cpu == cpu;
+}
+
+void qemu_cpu_kick(CPUState *cpu)
+{
+    cpu_exit(cpu);
+}
+
 void task_settid(TaskState *ts)
 {
     if (ts->ts_tid == 0) {
-- 
2.7.4


* [Qemu-devel] [PATCH v5 09/13] linux-user: Support CPU work queue
  2016-08-02 17:27 [Qemu-devel] [PATCH v5 00/13] cpu-exec: Safe work in quiescent state Alex Bennée
                   ` (7 preceding siblings ...)
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 08/13] linux-user: Add qemu_cpu_is_self() and qemu_cpu_kick() Alex Bennée
@ 2016-08-02 17:27 ` Alex Bennée
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 10/13] bsd-user: " Alex Bennée
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Alex Bennée @ 2016-08-02 17:27 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Sergey Fedorov, Alex Bennée,
	Peter Crosthwaite, Riku Voipio

From: Sergey Fedorov <serge.fdrv@gmail.com>

Make the core CPU work functions common between system and user-mode
emulation. User-mode does not have the BQL, so process_queued_cpu_work()
is protected by 'exclusive_lock' instead.
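
A sketch of the user-mode arrangement (the exact hunks are further down in
linux-user/main.c and may differ in detail):

    /* linux-user has no BQL, so the lock that already serialises the
     * exclusive machinery stands in as the CPU work mutex */
    QemuMutex *qemu_get_cpu_work_mutex(void)
    {
        return &exclusive_lock;
    }

    /* ...and the queue is drained at a point where the thread is known
     * to be outside guest code, with exclusive_lock held (illustrative
     * call site, not the patch's exact placement) */
    qemu_mutex_lock(&exclusive_lock);
    process_queued_cpu_work(cpu);
    qemu_mutex_unlock(&exclusive_lock);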

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 cpu-exec-common.c       | 85 ++++++++++++++++++++++++++++++++++++++++++++++++
 cpus.c                  | 86 +------------------------------------------------
 include/exec/exec-all.h | 17 ++++++++++
 linux-user/main.c       |  8 +++++
 4 files changed, 111 insertions(+), 85 deletions(-)

diff --git a/cpu-exec-common.c b/cpu-exec-common.c
index 0cb4ae6..a233f01 100644
--- a/cpu-exec-common.c
+++ b/cpu-exec-common.c
@@ -77,3 +77,88 @@ void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
     }
     siglongjmp(cpu->jmp_env, 1);
 }
+
+QemuCond qemu_work_cond;
+
+static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
+{
+    qemu_mutex_lock(&cpu->work_mutex);
+    if (cpu->queued_work_first == NULL) {
+        cpu->queued_work_first = wi;
+    } else {
+        cpu->queued_work_last->next = wi;
+    }
+    cpu->queued_work_last = wi;
+    wi->next = NULL;
+    wi->done = false;
+    qemu_mutex_unlock(&cpu->work_mutex);
+
+    qemu_cpu_kick(cpu);
+}
+
+void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
+{
+    struct qemu_work_item wi;
+
+    if (qemu_cpu_is_self(cpu)) {
+        func(cpu, data);
+        return;
+    }
+
+    wi.func = func;
+    wi.data = data;
+    wi.free = false;
+
+    queue_work_on_cpu(cpu, &wi);
+    while (!atomic_mb_read(&wi.done)) {
+        CPUState *self_cpu = current_cpu;
+
+        qemu_cond_wait(&qemu_work_cond, qemu_get_cpu_work_mutex());
+        current_cpu = self_cpu;
+    }
+}
+
+void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
+{
+    struct qemu_work_item *wi;
+
+    if (qemu_cpu_is_self(cpu)) {
+        func(cpu, data);
+        return;
+    }
+
+    wi = g_malloc0(sizeof(struct qemu_work_item));
+    wi->func = func;
+    wi->data = data;
+    wi->free = true;
+
+    queue_work_on_cpu(cpu, wi);
+}
+
+void process_queued_cpu_work(CPUState *cpu)
+{
+    struct qemu_work_item *wi;
+
+    if (cpu->queued_work_first == NULL) {
+        return;
+    }
+
+    qemu_mutex_lock(&cpu->work_mutex);
+    while (cpu->queued_work_first != NULL) {
+        wi = cpu->queued_work_first;
+        cpu->queued_work_first = wi->next;
+        if (!cpu->queued_work_first) {
+            cpu->queued_work_last = NULL;
+        }
+        qemu_mutex_unlock(&cpu->work_mutex);
+        wi->func(cpu, wi->data);
+        qemu_mutex_lock(&cpu->work_mutex);
+        if (wi->free) {
+            g_free(wi);
+        } else {
+            atomic_mb_set(&wi->done, true);
+        }
+    }
+    qemu_mutex_unlock(&cpu->work_mutex);
+    qemu_cond_broadcast(&qemu_work_cond);
+}
diff --git a/cpus.c b/cpus.c
index 51fd8c1..282d7e3 100644
--- a/cpus.c
+++ b/cpus.c
@@ -896,7 +896,6 @@ static QemuThread io_thread;
 static QemuCond qemu_cpu_cond;
 /* system init */
 static QemuCond qemu_pause_cond;
-static QemuCond qemu_work_cond;
 
 void qemu_init_cpu_loop(void)
 {
@@ -910,66 +909,11 @@ void qemu_init_cpu_loop(void)
     qemu_thread_get_self(&io_thread);
 }
 
-static QemuMutex *qemu_get_cpu_work_mutex(void)
+QemuMutex *qemu_get_cpu_work_mutex(void)
 {
     return &qemu_global_mutex;
 }
 
-static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
-{
-    qemu_mutex_lock(&cpu->work_mutex);
-    if (cpu->queued_work_first == NULL) {
-        cpu->queued_work_first = wi;
-    } else {
-        cpu->queued_work_last->next = wi;
-    }
-    cpu->queued_work_last = wi;
-    wi->next = NULL;
-    wi->done = false;
-    qemu_mutex_unlock(&cpu->work_mutex);
-
-    qemu_cpu_kick(cpu);
-}
-
-void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
-{
-    struct qemu_work_item wi;
-
-    if (qemu_cpu_is_self(cpu)) {
-        func(cpu, data);
-        return;
-    }
-
-    wi.func = func;
-    wi.data = data;
-    wi.free = false;
-
-    queue_work_on_cpu(cpu, &wi);
-    while (!atomic_mb_read(&wi.done)) {
-        CPUState *self_cpu = current_cpu;
-
-        qemu_cond_wait(&qemu_work_cond, qemu_get_cpu_work_mutex());
-        current_cpu = self_cpu;
-    }
-}
-
-void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
-{
-    struct qemu_work_item *wi;
-
-    if (qemu_cpu_is_self(cpu)) {
-        func(cpu, data);
-        return;
-    }
-
-    wi = g_malloc0(sizeof(struct qemu_work_item));
-    wi->func = func;
-    wi->data = data;
-    wi->free = true;
-
-    queue_work_on_cpu(cpu, wi);
-}
-
 static void qemu_kvm_destroy_vcpu(CPUState *cpu)
 {
     if (kvm_destroy_vcpu(cpu) < 0) {
@@ -982,34 +926,6 @@ static void qemu_tcg_destroy_vcpu(CPUState *cpu)
 {
 }
 
-static void process_queued_cpu_work(CPUState *cpu)
-{
-    struct qemu_work_item *wi;
-
-    if (cpu->queued_work_first == NULL) {
-        return;
-    }
-
-    qemu_mutex_lock(&cpu->work_mutex);
-    while (cpu->queued_work_first != NULL) {
-        wi = cpu->queued_work_first;
-        cpu->queued_work_first = wi->next;
-        if (!cpu->queued_work_first) {
-            cpu->queued_work_last = NULL;
-        }
-        qemu_mutex_unlock(&cpu->work_mutex);
-        wi->func(cpu, wi->data);
-        qemu_mutex_lock(&cpu->work_mutex);
-        if (wi->free) {
-            g_free(wi);
-        } else {
-            atomic_mb_set(&wi->done, true);
-        }
-    }
-    qemu_mutex_unlock(&cpu->work_mutex);
-    qemu_cond_broadcast(&qemu_work_cond);
-}
-
 static void qemu_wait_io_event_common(CPUState *cpu)
 {
     if (cpu->stop) {
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index bc0bcc5..e4dfd3c 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -409,4 +409,21 @@ extern int singlestep;
 extern CPUState *tcg_current_cpu;
 extern bool exit_request;
 
+/**
+ * qemu_work_cond - condition to wait for CPU work items completion
+ */
+extern QemuCond qemu_work_cond;
+
+/**
+ * qemu_get_cpu_work_mutex() - get the mutex which protects CPU work execution
+ *
+ * Return: A pointer to the mutex.
+ */
+QemuMutex *qemu_get_cpu_work_mutex(void);
+/**
+ * process_queued_cpu_work() - process all items on CPU work queue
+ * @cpu: The CPU whose work queue to process.
+ */
+void process_queued_cpu_work(CPUState *cpu);
+
 #endif
diff --git a/linux-user/main.c b/linux-user/main.c
index f5ddf96..13ac77d 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -124,6 +124,7 @@ void qemu_init_cpu_loop(void)
     qemu_mutex_init(&exclusive_lock);
     qemu_cond_init(&exclusive_cond);
     qemu_cond_init(&exclusive_resume);
+    qemu_cond_init(&qemu_work_cond);
 }
 
 /* Make sure everything is in a consistent state for calling fork().  */
@@ -152,6 +153,7 @@ void fork_end(int child)
         qemu_mutex_init(&cpu_list_mutex);
         qemu_cond_init(&exclusive_cond);
         qemu_cond_init(&exclusive_resume);
+        qemu_cond_init(&qemu_work_cond);
         qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
         gdbserver_fork(thread_cpu);
     } else {
@@ -160,6 +162,11 @@ void fork_end(int child)
     }
 }
 
+QemuMutex *qemu_get_cpu_work_mutex(void)
+{
+    return &exclusive_lock;
+}
+
 /* Wait for pending exclusive operations to complete.  The exclusive lock
    must be held.  */
 static inline void exclusive_idle(void)
@@ -218,6 +225,7 @@ static inline void cpu_exec_end(CPUState *cpu)
         qemu_cond_signal(&exclusive_cond);
     }
     exclusive_idle();
+    process_queued_cpu_work(cpu);
     qemu_mutex_unlock(&exclusive_lock);
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH v5 10/13] bsd-user: Support CPU work queue
  2016-08-02 17:27 [Qemu-devel] [PATCH v5 00/13] cpu-exec: Safe work in quiescent state Alex Bennée
                   ` (8 preceding siblings ...)
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 09/13] linux-user: Support CPU work queue Alex Bennée
@ 2016-08-02 17:27 ` Alex Bennée
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 11/13] cpu-exec-common: Introduce async_safe_run_on_cpu() Alex Bennée
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Alex Bennée @ 2016-08-02 17:27 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Sergey Fedorov, Alex Bennée

From: Sergey Fedorov <serge.fdrv@gmail.com>

This is minimal support because bsd-user claims to be _not_
threadsafe.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 bsd-user/main.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/bsd-user/main.c b/bsd-user/main.c
index 315ba1d..24d33c9 100644
--- a/bsd-user/main.c
+++ b/bsd-user/main.c
@@ -66,6 +66,19 @@ int cpu_get_pic_interrupt(CPUX86State *env)
 }
 #endif
 
+void qemu_init_cpu_loop(void)
+{
+    /* We need to do this because process_queued_cpu_work() calls
+     * qemu_cond_broadcast() on it
+     */
+    qemu_cond_init(&qemu_work_cond);
+}
+
+QemuMutex *qemu_get_cpu_work_mutex(void)
+{
+    return NULL; /* it will never be used */
+}
+
 /* These are no-ops because we are not threadsafe.  */
 static inline void cpu_exec_start(CPUArchState *env)
 {
@@ -73,6 +86,7 @@ static inline void cpu_exec_start(CPUArchState *env)
 
 static inline void cpu_exec_end(CPUArchState *env)
 {
+    process_queued_cpu_work(ENV_GET_CPU(env));
 }
 
 static inline void start_exclusive(void)
@@ -746,6 +760,7 @@ int main(int argc, char **argv)
     if (argc <= 1)
         usage();
 
+    qemu_init_cpu_loop();
     module_call_init(MODULE_INIT_QOM);
 
     if ((envlist = envlist_create()) == NULL) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH v5 11/13] cpu-exec-common: Introduce async_safe_run_on_cpu()
  2016-08-02 17:27 [Qemu-devel] [PATCH v5 00/13] cpu-exec: Safe work in quiescent state Alex Bennée
                   ` (9 preceding siblings ...)
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 10/13] bsd-user: " Alex Bennée
@ 2016-08-02 17:27 ` Alex Bennée
  2016-08-02 19:22   ` Emilio G. Cota
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 12/13] tcg: Make tb_flush() thread safe Alex Bennée
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 13/13] cpu-exec: replace cpu->queued_work with GArray Alex Bennée
  12 siblings, 1 reply; 25+ messages in thread
From: Alex Bennée @ 2016-08-02 17:27 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Sergey Fedorov, Alex Bennée,
	Peter Crosthwaite, Riku Voipio

From: Sergey Fedorov <serge.fdrv@gmail.com>

This patch is based on the ideas found in work of KONRAD Frederic [1],
Alex Bennée [2], and Alvise Rigo [3].

This mechanism allows an operation to be performed safely in a quiescent
state. Quiescent state means: (1) no vCPU is running and (2) BQL in
system-mode or 'exclusive_lock' in user-mode emulation is held while
performing the operation. This functionality is required e.g. for
performing translation buffer flush safely in multi-threaded user-mode
emulation.

The existing CPU work queue is used to schedule such safe operations. A
new 'safe' flag is added into struct qemu_work_item to designate the
special requirements of the safe work. An operation in a quiescent sate
can be scheduled using the async_safe_run_on_cpu() function, which is
actually the same as async_run_on_cpu() except that it marks the queued
work item with the 'safe' flag set to true. With this flag set,
queue_work_on_cpu() atomically increments the 'safe_work_pending' global
counter and kicks all the CPUs instead of just the target CPU as in the
case of normal CPU work. This forces other CPUs to exit their execution
loops and wait in the wait_safe_cpu_work() function for the safe work to
finish. When a CPU drains its work queue, if it encounters a work item
marked as safe, it first waits for other CPUs to exit their execution
loops, then calls the work item function, and finally decrements the
'safe_work_pending' counter, signalling other CPUs to let them continue
execution as soon as all pending safe work items have been processed.
The 'tcg_pending_threads' counter, protected by 'exclusive_lock' in
user-mode or by 'qemu_global_mutex' in system-mode emulation, is used to
determine if any CPU is running and to wait for it to exit the execution
loop. The fairness of all the CPU work queues is ensured by draining all
the pending safe work items before any CPU can run.
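
As a usage sketch (illustration only; the callback name is made up),
scheduling an operation that must not race with any running vCPU looks
just like the async case:

    static void do_quiescent_update(CPUState *cpu, void *data)
    {
        /* entered only once no vCPU is executing guest code */
    }

    async_safe_run_on_cpu(cpu, do_quiescent_update, NULL);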

[1] http://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg01128.html
[2] http://lists.nongnu.org/archive/html/qemu-devel/2016-04/msg02531.html
[3] http://lists.nongnu.org/archive/html/qemu-devel/2016-05/msg04792.html

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v5 (ajb)
  - signal another thread if sleeping and tcg_pending_threads == 0
  - add can_wait_for_safe() to skip in single thread SoftMMU mode
---
 bsd-user/main.c         |  3 ++-
 cpu-exec-common.c       | 67 ++++++++++++++++++++++++++++++++++++++++++++++++-
 cpus.c                  | 20 +++++++++++++++
 include/exec/exec-all.h | 14 +++++++++++
 include/qom/cpu.h       | 14 +++++++++++
 linux-user/main.c       | 13 +++++-----
 6 files changed, 123 insertions(+), 8 deletions(-)

diff --git a/bsd-user/main.c b/bsd-user/main.c
index 24d33c9..6f6a03c 100644
--- a/bsd-user/main.c
+++ b/bsd-user/main.c
@@ -69,9 +69,10 @@ int cpu_get_pic_interrupt(CPUX86State *env)
 void qemu_init_cpu_loop(void)
 {
     /* We need to do this because process_queued_cpu_work() calls
-     * qemu_cond_broadcast() on it
+     * qemu_cond_broadcast() on them
      */
     qemu_cond_init(&qemu_work_cond);
+    qemu_cond_init(&qemu_safe_work_cond);
 }
 
 QemuMutex *qemu_get_cpu_work_mutex(void)
diff --git a/cpu-exec-common.c b/cpu-exec-common.c
index a233f01..6d5da15 100644
--- a/cpu-exec-common.c
+++ b/cpu-exec-common.c
@@ -25,6 +25,7 @@
 
 bool exit_request;
 CPUState *tcg_current_cpu;
+int tcg_pending_threads;
 
 /* exit the current TB, but without causing any exception to be raised */
 void cpu_loop_exit_noexc(CPUState *cpu)
@@ -79,6 +80,35 @@ void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
 }
 
 QemuCond qemu_work_cond;
+QemuCond qemu_safe_work_cond;
+QemuCond qemu_exclusive_cond;
+
+static int safe_work_pending;
+
+#ifdef CONFIG_USER_ONLY
+#define can_wait_for_safe() (1)
+#else
+/*
+ * We never sleep in SoftMMU emulation because we would deadlock as
+ * all vCPUs are in the same thread. This will change for MTTCG
+ * however.
+ */
+#define can_wait_for_safe() (0)
+#endif
+
+void wait_safe_cpu_work(void)
+{
+    while (can_wait_for_safe() && atomic_mb_read(&safe_work_pending) > 0) {
+        /*
+         * If there is pending safe work and no pending threads we
+         * need to signal another thread to start its work.
+         */
+        if (tcg_pending_threads == 0) {
+            qemu_cond_signal(&qemu_exclusive_cond);
+        }
+        qemu_cond_wait(&qemu_safe_work_cond, qemu_get_cpu_work_mutex());
+    }
+}
 
 static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
 {
@@ -91,9 +121,18 @@ static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
     cpu->queued_work_last = wi;
     wi->next = NULL;
     wi->done = false;
+    if (wi->safe) {
+        atomic_inc(&safe_work_pending);
+    }
     qemu_mutex_unlock(&cpu->work_mutex);
 
-    qemu_cpu_kick(cpu);
+    if (!wi->safe) {
+        qemu_cpu_kick(cpu);
+    } else {
+        CPU_FOREACH(cpu) {
+            qemu_cpu_kick(cpu);
+        }
+    }
 }
 
 void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
@@ -108,6 +147,7 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
     wi.func = func;
     wi.data = data;
     wi.free = false;
+    wi.safe = false;
 
     queue_work_on_cpu(cpu, &wi);
     while (!atomic_mb_read(&wi.done)) {
@@ -131,6 +171,20 @@ void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
     wi->func = func;
     wi->data = data;
     wi->free = true;
+    wi->safe = false;
+
+    queue_work_on_cpu(cpu, wi);
+}
+
+void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
+{
+    struct qemu_work_item *wi;
+
+    wi = g_malloc0(sizeof(struct qemu_work_item));
+    wi->func = func;
+    wi->data = data;
+    wi->free = true;
+    wi->safe = true;
 
     queue_work_on_cpu(cpu, wi);
 }
@@ -150,9 +204,20 @@ void process_queued_cpu_work(CPUState *cpu)
         if (!cpu->queued_work_first) {
             cpu->queued_work_last = NULL;
         }
+        if (wi->safe) {
+            while (tcg_pending_threads) {
+                qemu_cond_wait(&qemu_exclusive_cond,
+                               qemu_get_cpu_work_mutex());
+            }
+        }
         qemu_mutex_unlock(&cpu->work_mutex);
         wi->func(cpu, wi->data);
         qemu_mutex_lock(&cpu->work_mutex);
+        if (wi->safe) {
+            if (!atomic_dec_fetch(&safe_work_pending)) {
+                qemu_cond_broadcast(&qemu_safe_work_cond);
+            }
+        }
         if (wi->free) {
             g_free(wi);
         } else {
diff --git a/cpus.c b/cpus.c
index 282d7e3..b712204 100644
--- a/cpus.c
+++ b/cpus.c
@@ -903,6 +903,8 @@ void qemu_init_cpu_loop(void)
     qemu_cond_init(&qemu_cpu_cond);
     qemu_cond_init(&qemu_pause_cond);
     qemu_cond_init(&qemu_work_cond);
+    qemu_cond_init(&qemu_safe_work_cond);
+    qemu_cond_init(&qemu_exclusive_cond);
     qemu_cond_init(&qemu_io_proceeded_cond);
     qemu_mutex_init(&qemu_global_mutex);
 
@@ -926,6 +928,20 @@ static void qemu_tcg_destroy_vcpu(CPUState *cpu)
 {
 }
 
+/* called with qemu_global_mutex held */
+static inline void tcg_cpu_exec_start(CPUState *cpu)
+{
+    tcg_pending_threads++;
+}
+
+/* called with qemu_global_mutex held */
+static inline void tcg_cpu_exec_end(CPUState *cpu)
+{
+    if (--tcg_pending_threads == 0) {
+        qemu_cond_broadcast(&qemu_exclusive_cond);
+    }
+}
+
 static void qemu_wait_io_event_common(CPUState *cpu)
 {
     if (cpu->stop) {
@@ -950,6 +966,8 @@ static void qemu_tcg_wait_io_event(CPUState *cpu)
     CPU_FOREACH(cpu) {
         qemu_wait_io_event_common(cpu);
     }
+
+    wait_safe_cpu_work();
 }
 
 static void qemu_kvm_wait_io_event(CPUState *cpu)
@@ -1485,7 +1503,9 @@ static void tcg_exec_all(void)
                           (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
 
         if (cpu_can_run(cpu)) {
+            tcg_cpu_exec_start(cpu);
             r = tcg_cpu_exec(cpu);
+            tcg_cpu_exec_end(cpu);
             if (r == EXCP_DEBUG) {
                 cpu_handle_guest_debug(cpu);
                 break;
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index e4dfd3c..ed5b9c8 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -407,12 +407,22 @@ extern int singlestep;
 
 /* cpu-exec.c, accessed with atomic_mb_read/atomic_mb_set */
 extern CPUState *tcg_current_cpu;
+extern int tcg_pending_threads;
 extern bool exit_request;
 
 /**
  * qemu_work_cond - condition to wait for CPU work items completion
  */
 extern QemuCond qemu_work_cond;
+/**
+ * qemu_safe_work_cond - condition to wait for safe CPU work items completion
+ */
+extern QemuCond qemu_safe_work_cond;
+/**
+ * qemu_exclusive_cond - condition to wait for all TCG threads to be out of
+ *                       guest code execution loop
+ */
+extern QemuCond qemu_exclusive_cond;
 
 /**
  * qemu_get_cpu_work_mutex() - get the mutex which protects CPU work execution
@@ -425,5 +435,9 @@ QemuMutex *qemu_get_cpu_work_mutex(void);
  * @cpu: The CPU whose work queue to process.
  */
 void process_queued_cpu_work(CPUState *cpu);
+/**
+ * wait_safe_cpu_work() - wait until all safe CPU work items have been processed
+ */
+void wait_safe_cpu_work(void);
 
 #endif
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index bd76a27..bc24514 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -240,6 +240,7 @@ struct qemu_work_item {
     void *data;
     int done;
     bool free;
+    bool safe;
 };
 
 /**
@@ -638,6 +639,19 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
 void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
 
 /**
+ * async_safe_run_on_cpu:
+ * @cpu: The vCPU to run on.
+ * @func: The function to be executed.
+ * @data: Data to pass to the function.
+ *
+ * Schedules the function @func for execution on the vCPU @cpu asynchronously
+ * and in quiescent state. Quiescent state means: (1) all other vCPUs are
+ * halted and (2) #qemu_global_mutex (a.k.a. BQL) in system-mode or
+ * #exclusive_lock in user-mode emulation is held while @func is executing.
+ */
+void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
+
+/**
  * qemu_get_cpu:
  * @index: The CPUState@cpu_index value of the CPU to obtain.
  *
diff --git a/linux-user/main.c b/linux-user/main.c
index 13ac77d..e7d7dd4 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -113,18 +113,17 @@ int cpu_get_pic_interrupt(CPUX86State *env)
    which requires quite a lot of per host/target work.  */
 static QemuMutex cpu_list_mutex;
 static QemuMutex exclusive_lock;
-static QemuCond exclusive_cond;
 static QemuCond exclusive_resume;
 static bool exclusive_pending;
-static int tcg_pending_threads;
 
 void qemu_init_cpu_loop(void)
 {
     qemu_mutex_init(&cpu_list_mutex);
     qemu_mutex_init(&exclusive_lock);
-    qemu_cond_init(&exclusive_cond);
     qemu_cond_init(&exclusive_resume);
     qemu_cond_init(&qemu_work_cond);
+    qemu_cond_init(&qemu_safe_work_cond);
+    qemu_cond_init(&qemu_exclusive_cond);
 }
 
 /* Make sure everything is in a consistent state for calling fork().  */
@@ -151,9 +150,10 @@ void fork_end(int child)
         exclusive_pending = false;
         qemu_mutex_init(&exclusive_lock);
         qemu_mutex_init(&cpu_list_mutex);
-        qemu_cond_init(&exclusive_cond);
         qemu_cond_init(&exclusive_resume);
         qemu_cond_init(&qemu_work_cond);
+        qemu_cond_init(&qemu_safe_work_cond);
+        qemu_cond_init(&qemu_exclusive_cond);
         qemu_mutex_init(&tcg_ctx.tb_ctx.tb_lock);
         gdbserver_fork(thread_cpu);
     } else {
@@ -193,7 +193,7 @@ static inline void start_exclusive(void)
         }
     }
     while (tcg_pending_threads) {
-        qemu_cond_wait(&exclusive_cond, &exclusive_lock);
+        qemu_cond_wait(&qemu_exclusive_cond, &exclusive_lock);
     }
 }
 
@@ -222,10 +222,11 @@ static inline void cpu_exec_end(CPUState *cpu)
     cpu->running = false;
     tcg_pending_threads--;
     if (!tcg_pending_threads) {
-        qemu_cond_signal(&exclusive_cond);
+        qemu_cond_broadcast(&qemu_exclusive_cond);
     }
     exclusive_idle();
     process_queued_cpu_work(cpu);
+    wait_safe_cpu_work();
     qemu_mutex_unlock(&exclusive_lock);
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH v5 12/13] tcg: Make tb_flush() thread safe
  2016-08-02 17:27 [Qemu-devel] [PATCH v5 00/13] cpu-exec: Safe work in quiescent state Alex Bennée
                   ` (10 preceding siblings ...)
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 11/13] cpu-exec-common: Introduce async_safe_run_on_cpu() Alex Bennée
@ 2016-08-02 17:27 ` Alex Bennée
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 13/13] cpu-exec: replace cpu->queued_work with GArray Alex Bennée
  12 siblings, 0 replies; 25+ messages in thread
From: Alex Bennée @ 2016-08-02 17:27 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Sergey Fedorov, Alex Bennée,
	Peter Crosthwaite

From: Sergey Fedorov <serge.fdrv@gmail.com>

Use async_safe_run_on_cpu() to make tb_flush() thread safe.

It can happen that multiple threads schedule safe work to flush the
translation buffer. To keep statistics and debugging output sane, always
check if the translation buffer has already been flushed.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
[AJB: minor re-base fixes]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 cpu-exec.c        | 12 ++----------
 include/qom/cpu.h |  2 --
 translate-all.c   | 17 +++++++++++------
 3 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 49d9f34..f8cfdbd 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -203,20 +203,16 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
                              TranslationBlock *orig_tb, bool ignore_icount)
 {
     TranslationBlock *tb;
-    bool old_tb_flushed;
 
     /* Should never happen.
        We only end up here when an existing TB is too long.  */
     if (max_cycles > CF_COUNT_MASK)
         max_cycles = CF_COUNT_MASK;
 
-    old_tb_flushed = cpu->tb_flushed;
-    cpu->tb_flushed = false;
     tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags,
                      max_cycles | CF_NOCACHE
                          | (ignore_icount ? CF_IGNORE_ICOUNT : 0));
-    tb->orig_tb = cpu->tb_flushed ? NULL : orig_tb;
-    cpu->tb_flushed |= old_tb_flushed;
+    tb->orig_tb = orig_tb;
     /* execute the generated code */
     trace_exec_tb_nocache(tb, tb->pc);
     cpu_tb_exec(cpu, tb);
@@ -337,10 +333,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
             tb_lock();
             have_tb_lock = true;
         }
-        /* Check if translation buffer has been flushed */
-        if (cpu->tb_flushed) {
-            cpu->tb_flushed = false;
-        } else if (!tb->invalid) {
+        if (!tb->invalid) {
             tb_add_jump(last_tb, tb_exit, tb);
         }
     }
@@ -605,7 +598,6 @@ int cpu_exec(CPUState *cpu)
                 break;
             }
 
-            atomic_mb_set(&cpu->tb_flushed, false); /* reset before first TB lookup */
             for(;;) {
                 cpu_handle_interrupt(cpu, &last_tb);
                 tb = tb_find(cpu, last_tb, tb_exit);
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index bc24514..dee5ad0 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -260,7 +260,6 @@ struct qemu_work_item {
  * @crash_occurred: Indicates the OS reported a crash (panic) for this CPU
  * @tcg_exit_req: Set to force TCG to stop executing linked TBs for this
  *           CPU and return to its top level loop.
- * @tb_flushed: Indicates the translation buffer has been flushed.
  * @singlestep_enabled: Flags for single-stepping.
  * @icount_extra: Instructions until next timer event.
  * @icount_decr: Number of cycles left, with interrupt flag in high bit.
@@ -313,7 +312,6 @@ struct CPUState {
     bool unplug;
     bool crash_occurred;
     bool exit_request;
-    bool tb_flushed;
     uint32_t interrupt_request;
     int singlestep_enabled;
     int64_t icount_extra;
diff --git a/translate-all.c b/translate-all.c
index 1ce05ff..60527ad 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -832,9 +832,11 @@ static void page_flush_tb(void)
 }
 
 /* flush all the translation blocks */
-/* XXX: tb_flush is currently not thread safe */
-void tb_flush(CPUState *cpu)
+static void do_tb_flush(CPUState *cpu, void *data)
 {
+    if (tcg_ctx.tb_ctx.nb_tbs == 0) {
+        return;
+    }
 #if defined(DEBUG_FLUSH)
     printf("qemu: flush code_size=%ld nb_tbs=%d avg_tb_size=%ld\n",
            (unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer),
@@ -853,7 +855,6 @@ void tb_flush(CPUState *cpu)
         for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
             atomic_set(&cpu->tb_jmp_cache[i], NULL);
         }
-        atomic_mb_set(&cpu->tb_flushed, true);
     }
 
     tcg_ctx.tb_ctx.nb_tbs = 0;
@@ -866,6 +867,11 @@ void tb_flush(CPUState *cpu)
     tcg_ctx.tb_ctx.tb_flush_count++;
 }
 
+void tb_flush(CPUState *cpu)
+{
+    async_safe_run_on_cpu(cpu, do_tb_flush, NULL);
+}
+
 #ifdef DEBUG_TB_CHECK
 
 static void
@@ -1170,9 +1176,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
  buffer_overflow:
         /* flush must be done */
         tb_flush(cpu);
-        /* cannot fail at this point */
-        tb = tb_alloc(pc);
-        assert(tb != NULL);
+        mmap_unlock();
+        cpu_loop_exit(cpu);
     }
 
     gen_code_buf = tcg_ctx.code_gen_ptr;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH v5 13/13] cpu-exec: replace cpu->queued_work with GArray
  2016-08-02 17:27 [Qemu-devel] [PATCH v5 00/13] cpu-exec: Safe work in quiescent state Alex Bennée
                   ` (11 preceding siblings ...)
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 12/13] tcg: Make tb_flush() thread safe Alex Bennée
@ 2016-08-02 17:27 ` Alex Bennée
  2016-08-02 17:36   ` Alex Bennée
                     ` (2 more replies)
  12 siblings, 3 replies; 25+ messages in thread
From: Alex Bennée @ 2016-08-02 17:27 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Alex Bennée, Peter Crosthwaite

Under times of high memory stress the additional small mallocs by a
linked list are a source of potential memory fragmentation. As we have
worked hard to avoid mallocs elsewhere when queuing work we might as
well do the same for the list. We convert the lists to an auto-resizing
GArray which will re-size in steps of powers of 2.

In theory the GArray could be mostly lockless but at the moment we keep
the locking scheme as before. However, another advantage of having an
array is that we can allocate a new one and process the old one without
bouncing the lock.

This will also be more cache friendly as we don't need to bounce between
cache lines as we work through the saved data.
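
For reference, the GArray calls used below copy the work items by value
and grow the backing store as needed, so no per-item allocations remain
(sketch only):

    struct qemu_work_item wi = { .func = func, .data = data };
    GArray *q = g_array_sized_new(true, true,
                                  sizeof(struct qemu_work_item), 16);
    g_array_append_val(q, wi);   /* copies the struct into the array */
    struct qemu_work_item *p = &g_array_index(q, struct qemu_work_item, 0);
    p->func(cpu, p->data);       /* items are accessed in place */
    g_array_free(q, true);       /* true also frees the element data */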

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 cpu-exec-common.c | 109 +++++++++++++++++++++++++++++-------------------------
 cpus.c            |   2 +-
 include/qom/cpu.h |   6 +--
 3 files changed, 61 insertions(+), 56 deletions(-)

diff --git a/cpu-exec-common.c b/cpu-exec-common.c
index 6d5da15..745d973 100644
--- a/cpu-exec-common.c
+++ b/cpu-exec-common.c
@@ -113,17 +113,18 @@ void wait_safe_cpu_work(void)
 static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
 {
     qemu_mutex_lock(&cpu->work_mutex);
-    if (cpu->queued_work_first == NULL) {
-        cpu->queued_work_first = wi;
-    } else {
-        cpu->queued_work_last->next = wi;
+
+    if (!cpu->queued_work) {
+        cpu->queued_work = g_array_sized_new(true, true,
+                             sizeof(struct qemu_work_item), 16);
     }
-    cpu->queued_work_last = wi;
-    wi->next = NULL;
-    wi->done = false;
+    trace_queue_work_on_cpu(cpu->cpu_index, wi->safe, cpu->queued_work->len);
+
+    g_array_append_val(cpu->queued_work, *wi);
     if (wi->safe) {
         atomic_inc(&safe_work_pending);
     }
+
     qemu_mutex_unlock(&cpu->work_mutex);
 
     if (!wi->safe) {
@@ -138,6 +139,7 @@ static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
 void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
 {
     struct qemu_work_item wi;
+    bool done = false;
 
     if (qemu_cpu_is_self(cpu)) {
         func(cpu, data);
@@ -146,11 +148,11 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
 
     wi.func = func;
     wi.data = data;
-    wi.free = false;
     wi.safe = false;
+    wi.done = &done;
 
     queue_work_on_cpu(cpu, &wi);
-    while (!atomic_mb_read(&wi.done)) {
+    while (!atomic_mb_read(&done)) {
         CPUState *self_cpu = current_cpu;
 
         qemu_cond_wait(&qemu_work_cond, qemu_get_cpu_work_mutex());
@@ -160,70 +162,75 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
 
 void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
 {
-    struct qemu_work_item *wi;
+    struct qemu_work_item wi;
 
     if (qemu_cpu_is_self(cpu)) {
         func(cpu, data);
         return;
     }
 
-    wi = g_malloc0(sizeof(struct qemu_work_item));
-    wi->func = func;
-    wi->data = data;
-    wi->free = true;
-    wi->safe = false;
+    wi.func = func;
+    wi.data = data;
+    wi.safe = false;
+    wi.done = NULL;
 
-    queue_work_on_cpu(cpu, wi);
+    queue_work_on_cpu(cpu, &wi);
 }
 
 void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
 {
-    struct qemu_work_item *wi;
+    struct qemu_work_item wi;
 
-    wi = g_malloc0(sizeof(struct qemu_work_item));
-    wi->func = func;
-    wi->data = data;
-    wi->free = true;
-    wi->safe = true;
+    wi.func = func;
+    wi.data = data;
+    wi.safe = true;
+    wi.done = NULL;
 
-    queue_work_on_cpu(cpu, wi);
+    queue_work_on_cpu(cpu, &wi);
 }
 
 void process_queued_cpu_work(CPUState *cpu)
 {
     struct qemu_work_item *wi;
-
-    if (cpu->queued_work_first == NULL) {
-        return;
-    }
+    GArray *work_list = NULL;
+    int i;
 
     qemu_mutex_lock(&cpu->work_mutex);
-    while (cpu->queued_work_first != NULL) {
-        wi = cpu->queued_work_first;
-        cpu->queued_work_first = wi->next;
-        if (!cpu->queued_work_first) {
-            cpu->queued_work_last = NULL;
-        }
-        if (wi->safe) {
-            while (tcg_pending_threads) {
-                qemu_cond_wait(&qemu_exclusive_cond,
-                               qemu_get_cpu_work_mutex());
+
+    work_list = cpu->queued_work;
+    cpu->queued_work = NULL;
+
+    qemu_mutex_unlock(&cpu->work_mutex);
+
+    if (work_list) {
+
+        g_assert(work_list->len > 0);
+
+        for (i = 0; i < work_list->len; i++) {
+            wi = &g_array_index(work_list, struct qemu_work_item, i);
+
+            if (wi->safe) {
+                while (tcg_pending_threads) {
+                    qemu_cond_wait(&qemu_exclusive_cond,
+                                   qemu_get_cpu_work_mutex());
+                }
             }
-        }
-        qemu_mutex_unlock(&cpu->work_mutex);
-        wi->func(cpu, wi->data);
-        qemu_mutex_lock(&cpu->work_mutex);
-        if (wi->safe) {
-            if (!atomic_dec_fetch(&safe_work_pending)) {
-                qemu_cond_broadcast(&qemu_safe_work_cond);
+
+            wi->func(cpu, wi->data);
+
+            if (wi->safe) {
+                if (!atomic_dec_fetch(&safe_work_pending)) {
+                    qemu_cond_broadcast(&qemu_safe_work_cond);
+                }
+            }
+
+            if (wi->done) {
+                atomic_mb_set(wi->done, true);
             }
         }
-        if (wi->free) {
-            g_free(wi);
-        } else {
-            atomic_mb_set(&wi->done, true);
-        }
+
+        trace_process_queued_cpu_work(cpu->cpu_index, work_list->len);
+        qemu_cond_broadcast(&qemu_work_cond);
+        g_array_free(work_list, true);
     }
-    qemu_mutex_unlock(&cpu->work_mutex);
-    qemu_cond_broadcast(&qemu_work_cond);
 }
diff --git a/cpus.c b/cpus.c
index b712204..1ea60e4 100644
--- a/cpus.c
+++ b/cpus.c
@@ -88,7 +88,7 @@ bool cpu_is_stopped(CPUState *cpu)
 
 static bool cpu_thread_is_idle(CPUState *cpu)
 {
-    if (cpu->stop || cpu->queued_work_first) {
+    if (cpu->stop || cpu->queued_work) {
         return false;
     }
     if (cpu_is_stopped(cpu)) {
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index dee5ad0..060a473 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -235,11 +235,9 @@ struct kvm_run;
 typedef void (*run_on_cpu_func)(CPUState *cpu, void *data);
 
 struct qemu_work_item {
-    struct qemu_work_item *next;
     run_on_cpu_func func;
     void *data;
-    int done;
-    bool free;
+    bool *done;
     bool safe;
 };
 
@@ -318,7 +316,7 @@ struct CPUState {
     sigjmp_buf jmp_env;
 
     QemuMutex work_mutex;
-    struct qemu_work_item *queued_work_first, *queued_work_last;
+    GArray *queued_work;
 
     CPUAddressSpace *cpu_ases;
     int num_ases;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [Qemu-devel] [PATCH v5 13/13] cpu-exec: replace cpu->queued_work with GArray
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 13/13] cpu-exec: replace cpu->queued_work with GArray Alex Bennée
@ 2016-08-02 17:36   ` Alex Bennée
  2016-08-02 17:42   ` Alex Bennée
  2016-08-02 18:53   ` Emilio G. Cota
  2 siblings, 0 replies; 25+ messages in thread
From: Alex Bennée @ 2016-08-02 17:36 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Alex Bennée, Peter Crosthwaite

Under times of high memory stress the additional small mallocs by a
linked list are a source of potential memory fragmentation. As we have
worked hard to avoid mallocs elsewhere when queuing work we might as
well do the same for the list. We convert the lists to an auto-resizing
GArray which will re-size in steps of powers of 2.

In theory the GArray could be mostly lockless but at the moment we keep
the locking scheme as before. However, another advantage of having an
array is that we can allocate a new one and process the old one without
bouncing the lock.

This will also be more cache friendly as we don't need to bounce between
cache lines as we work through the saved data.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 cpu-exec-common.c | 107 ++++++++++++++++++++++++++++--------------------------
 cpus.c            |   2 +-
 include/qom/cpu.h |   6 +--
 3 files changed, 59 insertions(+), 56 deletions(-)

diff --git a/cpu-exec-common.c b/cpu-exec-common.c
index 6d5da15..0bec55a 100644
--- a/cpu-exec-common.c
+++ b/cpu-exec-common.c
@@ -113,17 +113,17 @@ void wait_safe_cpu_work(void)
 static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
 {
     qemu_mutex_lock(&cpu->work_mutex);
-    if (cpu->queued_work_first == NULL) {
-        cpu->queued_work_first = wi;
-    } else {
-        cpu->queued_work_last->next = wi;
+
+    if (!cpu->queued_work) {
+        cpu->queued_work = g_array_sized_new(true, true,
+                             sizeof(struct qemu_work_item), 16);
     }
-    cpu->queued_work_last = wi;
-    wi->next = NULL;
-    wi->done = false;
+
+    g_array_append_val(cpu->queued_work, *wi);
     if (wi->safe) {
         atomic_inc(&safe_work_pending);
     }
+
     qemu_mutex_unlock(&cpu->work_mutex);
 
     if (!wi->safe) {
@@ -138,6 +138,7 @@ static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
 void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
 {
     struct qemu_work_item wi;
+    bool done = false;
 
     if (qemu_cpu_is_self(cpu)) {
         func(cpu, data);
@@ -146,11 +147,11 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
 
     wi.func = func;
     wi.data = data;
-    wi.free = false;
     wi.safe = false;
+    wi.done = &done;
 
     queue_work_on_cpu(cpu, &wi);
-    while (!atomic_mb_read(&wi.done)) {
+    while (!atomic_mb_read(&done)) {
         CPUState *self_cpu = current_cpu;
 
         qemu_cond_wait(&qemu_work_cond, qemu_get_cpu_work_mutex());
@@ -160,70 +161,74 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
 
 void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
 {
-    struct qemu_work_item *wi;
+    struct qemu_work_item wi;
 
     if (qemu_cpu_is_self(cpu)) {
         func(cpu, data);
         return;
     }
 
-    wi = g_malloc0(sizeof(struct qemu_work_item));
-    wi->func = func;
-    wi->data = data;
-    wi->free = true;
-    wi->safe = false;
+    wi.func = func;
+    wi.data = data;
+    wi.safe = false;
+    wi.done = NULL;
 
-    queue_work_on_cpu(cpu, wi);
+    queue_work_on_cpu(cpu, &wi);
 }
 
 void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
 {
-    struct qemu_work_item *wi;
+    struct qemu_work_item wi;
 
-    wi = g_malloc0(sizeof(struct qemu_work_item));
-    wi->func = func;
-    wi->data = data;
-    wi->free = true;
-    wi->safe = true;
+    wi.func = func;
+    wi.data = data;
+    wi.safe = true;
+    wi.done = NULL;
 
-    queue_work_on_cpu(cpu, wi);
+    queue_work_on_cpu(cpu, &wi);
 }
 
 void process_queued_cpu_work(CPUState *cpu)
 {
     struct qemu_work_item *wi;
-
-    if (cpu->queued_work_first == NULL) {
-        return;
-    }
+    GArray *work_list = NULL;
+    int i;
 
     qemu_mutex_lock(&cpu->work_mutex);
-    while (cpu->queued_work_first != NULL) {
-        wi = cpu->queued_work_first;
-        cpu->queued_work_first = wi->next;
-        if (!cpu->queued_work_first) {
-            cpu->queued_work_last = NULL;
-        }
-        if (wi->safe) {
-            while (tcg_pending_threads) {
-                qemu_cond_wait(&qemu_exclusive_cond,
-                               qemu_get_cpu_work_mutex());
+
+    work_list = cpu->queued_work;
+    cpu->queued_work = NULL;
+
+    qemu_mutex_unlock(&cpu->work_mutex);
+
+    if (work_list) {
+
+        g_assert(work_list->len > 0);
+
+        for (i = 0; i < work_list->len; i++) {
+            wi = &g_array_index(work_list, struct qemu_work_item, i);
+
+            if (wi->safe) {
+                while (tcg_pending_threads) {
+                    qemu_cond_wait(&qemu_exclusive_cond,
+                                   qemu_get_cpu_work_mutex());
+                }
             }
-        }
-        qemu_mutex_unlock(&cpu->work_mutex);
-        wi->func(cpu, wi->data);
-        qemu_mutex_lock(&cpu->work_mutex);
-        if (wi->safe) {
-            if (!atomic_dec_fetch(&safe_work_pending)) {
-                qemu_cond_broadcast(&qemu_safe_work_cond);
+
+            wi->func(cpu, wi->data);
+
+            if (wi->safe) {
+                if (!atomic_dec_fetch(&safe_work_pending)) {
+                    qemu_cond_broadcast(&qemu_safe_work_cond);
+                }
+            }
+
+            if (wi->done) {
+                atomic_mb_set(wi->done, true);
             }
         }
-        if (wi->free) {
-            g_free(wi);
-        } else {
-            atomic_mb_set(&wi->done, true);
-        }
+
+        qemu_cond_broadcast(&qemu_work_cond);
+        g_array_free(work_list, true);
     }
-    qemu_mutex_unlock(&cpu->work_mutex);
-    qemu_cond_broadcast(&qemu_work_cond);
 }
diff --git a/cpus.c b/cpus.c
index b712204..1ea60e4 100644
--- a/cpus.c
+++ b/cpus.c
@@ -88,7 +88,7 @@ bool cpu_is_stopped(CPUState *cpu)
 
 static bool cpu_thread_is_idle(CPUState *cpu)
 {
-    if (cpu->stop || cpu->queued_work_first) {
+    if (cpu->stop || cpu->queued_work) {
         return false;
     }
     if (cpu_is_stopped(cpu)) {
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index dee5ad0..060a473 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -235,11 +235,9 @@ struct kvm_run;
 typedef void (*run_on_cpu_func)(CPUState *cpu, void *data);
 
 struct qemu_work_item {
-    struct qemu_work_item *next;
     run_on_cpu_func func;
     void *data;
-    int done;
-    bool free;
+    bool *done;
     bool safe;
 };
 
@@ -318,7 +316,7 @@ struct CPUState {
     sigjmp_buf jmp_env;
 
     QemuMutex work_mutex;
-    struct qemu_work_item *queued_work_first, *queued_work_last;
+    GArray *queued_work;
 
     CPUAddressSpace *cpu_ases;
     int num_ases;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 13/13] cpu-exec: replace cpu->queued_work with GArray
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 13/13] cpu-exec: replace cpu->queued_work with GArray Alex Bennée
  2016-08-02 17:36   ` Alex Bennée
@ 2016-08-02 17:42   ` Alex Bennée
  2016-08-02 18:53   ` Emilio G. Cota
  2 siblings, 0 replies; 25+ messages in thread
From: Alex Bennée @ 2016-08-02 17:42 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite


Alex Bennée <alex.bennee@linaro.org> writes:

> Under times of high memory stress the additional small mallocs by a
> linked list are source of potential memory fragmentation. As we have
> worked hard to avoid mallocs elsewhere when queuing work we might as
> well do the same for the list. We convert the lists to a auto-resizeing
> GArray which will re-size in steps of powers of 2.
<snip>
> diff --git a/cpu-exec-common.c b/cpu-exec-common.c
> index 6d5da15..745d973 100644
> --- a/cpu-exec-common.c
> +++ b/cpu-exec-common.c
> @@ -113,17 +113,18 @@ void wait_safe_cpu_work(void)
>  static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
>  {
>      qemu_mutex_lock(&cpu->work_mutex);
> -    if (cpu->queued_work_first == NULL) {
> -        cpu->queued_work_first = wi;
> -    } else {
> -        cpu->queued_work_last->next = wi;
> +
> +    if (!cpu->queued_work) {
> +        cpu->queued_work = g_array_sized_new(true, true,
> +                             sizeof(struct qemu_work_item), 16);
>      }
> -    cpu->queued_work_last = wi;
> -    wi->next = NULL;
> -    wi->done = false;
> +    trace_queue_work_on_cpu(cpu->cpu_index, wi->safe,
> cpu->queued_work->len);

Oops, this was left over from testing, I've posted an updated version of
the patch.

> +
> +    g_array_append_val(cpu->queued_work, *wi);
>      if (wi->safe) {
>          atomic_inc(&safe_work_pending);
>      }
> +
>      qemu_mutex_unlock(&cpu->work_mutex);
>
>      if (!wi->safe) {
> @@ -138,6 +139,7 @@ static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
>  void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
>  {
>      struct qemu_work_item wi;
> +    bool done = false;
>
>      if (qemu_cpu_is_self(cpu)) {
>          func(cpu, data);
> @@ -146,11 +148,11 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
>
>      wi.func = func;
>      wi.data = data;
> -    wi.free = false;
>      wi.safe = false;
> +    wi.done = &done;
>
>      queue_work_on_cpu(cpu, &wi);
> -    while (!atomic_mb_read(&wi.done)) {
> +    while (!atomic_mb_read(&done)) {
>          CPUState *self_cpu = current_cpu;
>
>          qemu_cond_wait(&qemu_work_cond, qemu_get_cpu_work_mutex());
> @@ -160,70 +162,75 @@ void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
>
>  void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
>  {
> -    struct qemu_work_item *wi;
> +    struct qemu_work_item wi;
>
>      if (qemu_cpu_is_self(cpu)) {
>          func(cpu, data);
>          return;
>      }
>
> -    wi = g_malloc0(sizeof(struct qemu_work_item));
> -    wi->func = func;
> -    wi->data = data;
> -    wi->free = true;
> -    wi->safe = false;
> +    wi.func = func;
> +    wi.data = data;
> +    wi.safe = false;
> +    wi.done = NULL;
>
> -    queue_work_on_cpu(cpu, wi);
> +    queue_work_on_cpu(cpu, &wi);
>  }
>
>  void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data)
>  {
> -    struct qemu_work_item *wi;
> +    struct qemu_work_item wi;
>
> -    wi = g_malloc0(sizeof(struct qemu_work_item));
> -    wi->func = func;
> -    wi->data = data;
> -    wi->free = true;
> -    wi->safe = true;
> +    wi.func = func;
> +    wi.data = data;
> +    wi.safe = true;
> +    wi.done = NULL;
>
> -    queue_work_on_cpu(cpu, wi);
> +    queue_work_on_cpu(cpu, &wi);
>  }
>
>  void process_queued_cpu_work(CPUState *cpu)
>  {
>      struct qemu_work_item *wi;
> -
> -    if (cpu->queued_work_first == NULL) {
> -        return;
> -    }
> +    GArray *work_list = NULL;
> +    int i;
>
>      qemu_mutex_lock(&cpu->work_mutex);
> -    while (cpu->queued_work_first != NULL) {
> -        wi = cpu->queued_work_first;
> -        cpu->queued_work_first = wi->next;
> -        if (!cpu->queued_work_first) {
> -            cpu->queued_work_last = NULL;
> -        }
> -        if (wi->safe) {
> -            while (tcg_pending_threads) {
> -                qemu_cond_wait(&qemu_exclusive_cond,
> -                               qemu_get_cpu_work_mutex());
> +
> +    work_list = cpu->queued_work;
> +    cpu->queued_work = NULL;
> +
> +    qemu_mutex_unlock(&cpu->work_mutex);
> +
> +    if (work_list) {
> +
> +        g_assert(work_list->len > 0);
> +
> +        for (i = 0; i < work_list->len; i++) {
> +            wi = &g_array_index(work_list, struct qemu_work_item, i);
> +
> +            if (wi->safe) {
> +                while (tcg_pending_threads) {
> +                    qemu_cond_wait(&qemu_exclusive_cond,
> +                                   qemu_get_cpu_work_mutex());
> +                }
>              }
> -        }
> -        qemu_mutex_unlock(&cpu->work_mutex);
> -        wi->func(cpu, wi->data);
> -        qemu_mutex_lock(&cpu->work_mutex);
> -        if (wi->safe) {
> -            if (!atomic_dec_fetch(&safe_work_pending)) {
> -                qemu_cond_broadcast(&qemu_safe_work_cond);
> +
> +            wi->func(cpu, wi->data);
> +
> +            if (wi->safe) {
> +                if (!atomic_dec_fetch(&safe_work_pending)) {
> +                    qemu_cond_broadcast(&qemu_safe_work_cond);
> +                }
> +            }
> +
> +            if (wi->done) {
> +                atomic_mb_set(wi->done, true);
>              }
>          }
> -        if (wi->free) {
> -            g_free(wi);
> -        } else {
> -            atomic_mb_set(&wi->done, true);
> -        }
> +
> +        trace_process_queued_cpu_work(cpu->cpu_index,
> work_list->len);

And so was this.


--
Alex Bennée

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 13/13] cpu-exec: replace cpu->queued_work with GArray
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 13/13] cpu-exec: replace cpu->queued_work with GArray Alex Bennée
  2016-08-02 17:36   ` Alex Bennée
  2016-08-02 17:42   ` Alex Bennée
@ 2016-08-02 18:53   ` Emilio G. Cota
  2016-08-03  8:34     ` Alex Bennée
  2 siblings, 1 reply; 25+ messages in thread
From: Emilio G. Cota @ 2016-08-02 18:53 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, bobby.prani,
	mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite

On Tue, Aug 02, 2016 at 18:27:44 +0100, Alex Bennée wrote:
> Under times of high memory stress the additional small mallocs by a
> linked list are source of potential memory fragmentation. As we have
> worked hard to avoid mallocs elsewhere when queuing work we might as
> well do the same for the list. We convert the lists to a auto-resizeing
> GArray which will re-size in steps of powers of 2.

Would be nice to see numbers on how this compares to simply using
tcmalloc/jemalloc (or the glibc allocator, really).

Thanks,

		Emilio

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 11/13] cpu-exec-common: Introduce async_safe_run_on_cpu()
  2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 11/13] cpu-exec-common: Introduce async_safe_run_on_cpu() Alex Bennée
@ 2016-08-02 19:22   ` Emilio G. Cota
  2016-08-03 21:02     ` Alex Bennée
  2016-08-28  0:21     ` Paolo Bonzini
  0 siblings, 2 replies; 25+ messages in thread
From: Emilio G. Cota @ 2016-08-02 19:22 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, bobby.prani,
	mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Sergey Fedorov, Peter Crosthwaite, Riku Voipio

On Tue, Aug 02, 2016 at 18:27:42 +0100, Alex Bennée wrote:
> From: Sergey Fedorov <serge.fdrv@gmail.com>
> 
> This patch is based on the ideas found in work of KONRAD Frederic [1],
> Alex Bennée [2], and Alvise Rigo [3].
> 
> This mechanism allows to perform an operation safely in a quiescent
> state. Quiescent state means: (1) no vCPU is running and (2) BQL in
> system-mode or 'exclusive_lock' in user-mode emulation is held while
> performing the operation. This functionality is required e.g. for
> performing translation buffer flush safely in multi-threaded user-mode
> emulation.
> 
> The existing CPU work queue is used to schedule such safe operations. A
> new 'safe' flag is added into struct qemu_work_item to designate the
> special requirements of the safe work. An operation in a quiescent sate

s/sate/state/

(snip)
> index a233f01..6d5da15 100644
> --- a/cpu-exec-common.c
> +++ b/cpu-exec-common.c
> @@ -25,6 +25,7 @@
>  
>  bool exit_request;
>  CPUState *tcg_current_cpu;
> +int tcg_pending_threads;
>  
>  /* exit the current TB, but without causing any exception to be raised */
>  void cpu_loop_exit_noexc(CPUState *cpu)
> @@ -79,6 +80,35 @@ void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
>  }
>  
>  QemuCond qemu_work_cond;
> +QemuCond qemu_safe_work_cond;
> +QemuCond qemu_exclusive_cond;
> +
> +static int safe_work_pending;
> +
> +#ifdef CONFIG_USER_ONLY
> +#define can_wait_for_safe() (1)
> +#else
> +/*
> + * We never sleep in SoftMMU emulation because we would deadlock as
> + * all vCPUs are in the same thread. This will change for MTTCG
> + * however.
> + */
> +#define can_wait_for_safe() (0)
> +#endif
> +
> +void wait_safe_cpu_work(void)
> +{
> +    while (can_wait_for_safe() && atomic_mb_read(&safe_work_pending) > 0) {

The atomic here is puzzling, see below.

> +        /*
> +         * If there is pending safe work and no pending threads we
> +         * need to signal another thread to start its work.
> +         */
> +        if (tcg_pending_threads == 0) {
> +            qemu_cond_signal(&qemu_exclusive_cond);
> +        }
> +        qemu_cond_wait(&qemu_safe_work_cond, qemu_get_cpu_work_mutex());
> +    }
> +}
>  
>  static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
>  {
> @@ -91,9 +121,18 @@ static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
>      cpu->queued_work_last = wi;
>      wi->next = NULL;
>      wi->done = false;
> +    if (wi->safe) {
> +        atomic_inc(&safe_work_pending);
> +    }

This doesn't seem right. Operating on the condvar's shared 'state' variable
should always be done with the condvar's mutex held. Otherwise, there's
no guarantee that sleepers will always see a consistent state when they're
woken up, which can easily lead to deadlock.
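
That is, the usual shape is to keep every update of the shared counter
and the wake-up under the condvar's mutex (sketch only, using the names
from this series):

    /* producer */
    qemu_mutex_lock(qemu_get_cpu_work_mutex());
    safe_work_pending++;
    qemu_mutex_unlock(qemu_get_cpu_work_mutex());

    /* consumer */
    qemu_mutex_lock(qemu_get_cpu_work_mutex());
    if (--safe_work_pending == 0) {
        qemu_cond_broadcast(&qemu_safe_work_cond);
    }
    qemu_mutex_unlock(qemu_get_cpu_work_mutex());

    /* sleeper: re-checks the state every time it is woken */
    qemu_mutex_lock(qemu_get_cpu_work_mutex());
    while (safe_work_pending > 0) {
        qemu_cond_wait(&qemu_safe_work_cond, qemu_get_cpu_work_mutex());
    }
    qemu_mutex_unlock(qemu_get_cpu_work_mutex());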

I suspect this is what caused the deadlock you saw in the last iteration
of the series.

An additional requirement is the fact that new CPUs can come anytime in
user-mode (imagine we're flushing the TB while a new pthread was just
spawned). This is easily triggered by greatly reducing the size of the
translation buffer, and spawning dozens of threads. This patch, as it
stands, won't catch the new threads coming in, because at the time
"safe work" was assigned, the new threads might not be seen by
CPU_FOREACH (btw, the CPU list should be converted to RCU, but a
ppc machine might be affected, see [1])

A possible fix is to sched safe work after exiting the CPU loop, i.e.
with qemu_get_cpu_work_mutex held. I tried this on v4 of this patchset
and doesn't scale very well on 64 cores (too much contention
on tb_lock), although at least it doesn't deadlock.

An alternative is to have a separate lock for safe work, and check for
safe work once there are no other locks held; a good place to do this is
at the beginning of cpu_loop_exec. This scales better, and I'd argue
it's simpler. In fact, I posted a patch that does this about a year
ago (!):
  https://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg02576.html
Paolo didn't like condvars, but now I see them coming up again. I guess
he still won't like the synchronize_rcu() call in there, and I don't like
it either, but I don't think that's an essential part of that patch.
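
Roughly, the shape of it (just a sketch; the lock and helper names are
made up):

    static QemuMutex safe_work_mutex;

    /* checked at the top of the execution loop, before any other lock */
    if (atomic_mb_read(&safe_work_pending)) {
        qemu_mutex_lock(&safe_work_mutex);
        process_queued_safe_work(cpu);   /* hypothetical helper */
        qemu_mutex_unlock(&safe_work_mutex);
    }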

Thanks,

		Emilio

[1] https://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg02581.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 13/13] cpu-exec: replace cpu->queued_work with GArray
  2016-08-02 18:53   ` Emilio G. Cota
@ 2016-08-03  8:34     ` Alex Bennée
  0 siblings, 0 replies; 25+ messages in thread
From: Alex Bennée @ 2016-08-03  8:34 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, bobby.prani,
	mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite


Emilio G. Cota <cota@braap.org> writes:

> On Tue, Aug 02, 2016 at 18:27:44 +0100, Alex Bennée wrote:
>> Under times of high memory stress the additional small mallocs by a
>> linked list are source of potential memory fragmentation. As we have
>> worked hard to avoid mallocs elsewhere when queuing work we might as
>> well do the same for the list. We convert the lists to a auto-resizeing
>> GArray which will re-size in steps of powers of 2.
>
> Would be nice to see numbers on how this compares to simply using
> tcmalloc/jemalloc (or the glibc allocator, really).

glib just uses the glibc malloc/realloc underneath so it is all the same
allocator just a different usage pattern.

I was trying to find a decent way to measure the allocation usage and
fragmentation other than watching the differential in htop's memory
usage display with the two methods. Any ideas/suggestions?

>
> Thanks,
>
> 		Emilio


--
Alex Bennée

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 11/13] cpu-exec-common: Introduce async_safe_run_on_cpu()
  2016-08-02 19:22   ` Emilio G. Cota
@ 2016-08-03 21:02     ` Alex Bennée
  2016-08-03 23:17       ` Emilio G. Cota
  2016-08-28  0:21     ` Paolo Bonzini
  1 sibling, 1 reply; 25+ messages in thread
From: Alex Bennée @ 2016-08-03 21:02 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, bobby.prani,
	mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Sergey Fedorov, Peter Crosthwaite, Riku Voipio


Emilio G. Cota <cota@braap.org> writes:

> On Tue, Aug 02, 2016 at 18:27:42 +0100, Alex Bennée wrote:
>> From: Sergey Fedorov <serge.fdrv@gmail.com>
>>
>> This patch is based on the ideas found in work of KONRAD Frederic [1],
>> Alex Bennée [2], and Alvise Rigo [3].
>>
>> This mechanism allows to perform an operation safely in a quiescent
>> state. Quiescent state means: (1) no vCPU is running and (2) BQL in
>> system-mode or 'exclusive_lock' in user-mode emulation is held while
>> performing the operation. This functionality is required e.g. for
>> performing translation buffer flush safely in multi-threaded user-mode
>> emulation.
>>
>> The existing CPU work queue is used to schedule such safe operations. A
>> new 'safe' flag is added into struct qemu_work_item to designate the
>> special requirements of the safe work. An operation in a quiescent sate
>
> s/sate/state/
>
> (snip)
>> index a233f01..6d5da15 100644
>> --- a/cpu-exec-common.c
>> +++ b/cpu-exec-common.c
>> @@ -25,6 +25,7 @@
>>
>>  bool exit_request;
>>  CPUState *tcg_current_cpu;
>> +int tcg_pending_threads;
>>
>>  /* exit the current TB, but without causing any exception to be raised */
>>  void cpu_loop_exit_noexc(CPUState *cpu)
>> @@ -79,6 +80,35 @@ void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
>>  }
>>
>>  QemuCond qemu_work_cond;
>> +QemuCond qemu_safe_work_cond;
>> +QemuCond qemu_exclusive_cond;
>> +
>> +static int safe_work_pending;
>> +
>> +#ifdef CONFIG_USER_ONLY
>> +#define can_wait_for_safe() (1)
>> +#else
>> +/*
>> + * We never sleep in SoftMMU emulation because we would deadlock as
>> + * all vCPUs are in the same thread. This will change for MTTCG
>> + * however.
>> + */
>> +#define can_wait_for_safe() (0)
>> +#endif
>> +
>> +void wait_safe_cpu_work(void)
>> +{
>> +    while (can_wait_for_safe() && atomic_mb_read(&safe_work_pending) > 0) {
>
> The atomic here is puzzling, see below.
>
>> +        /*
>> +         * If there is pending safe work and no pending threads we
>> +         * need to signal another thread to start its work.
>> +         */
>> +        if (tcg_pending_threads == 0) {
>> +            qemu_cond_signal(&qemu_exclusive_cond);
>> +        }
>> +        qemu_cond_wait(&qemu_safe_work_cond, qemu_get_cpu_work_mutex());
>> +    }
>> +}
>>
>>  static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
>>  {
>> @@ -91,9 +121,18 @@ static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
>>      cpu->queued_work_last = wi;
>>      wi->next = NULL;
>>      wi->done = false;
>> +    if (wi->safe) {
>> +        atomic_inc(&safe_work_pending);
>> +    }
>
> This doesn't seem right. Operating on the condvar's shared 'state' variable
> should always be done with the condvar's mutex held. Otherwise, there's
> no guarantee that sleepers will always see a consistent state when they're
> woken up, which can easily lead to deadlock.

How so? Surely the barriers around the atomic accesses and the implicit
barriers of the mutexes ensure it is?

>
> I suspect this is what caused the deadlock you saw in the last iteration
> of the series.
>
> An additional requirement is the fact that new CPUs can come anytime in
> user-mode (imagine we're flushing the TB while a new pthread was just
> spawned). This is easily triggered by greatly reducing the size of the
> translation buffer, and spawning dozens of threads.

I don't suppose you have a test case written up for this already?

My kvm-unit-tests are fairly extensive for SoftMMU mode but for
user-mode I was only using pigz with the TB buffer scaled down.
Obviously I need to expand the user-mode testing.

> This patch, as it
> stands, won't catch the new threads coming in, because at the time
> "safe work" was assigned, the new threads might not be seen by
> CPU_FOREACH (btw, the CPU list should be converted to RCU, but a
> ppc machine might be affected, see [1])

That sounds like a separate patch to RCUify.

>
> A possible fix is to sched safe work after exiting the CPU loop, i.e.
> with qemu_get_cpu_work_mutex held. I tried this on v4 of this patchset
> and doesn't scale very well on 64 cores (too much contention
> on tb_lock), although at least it doesn't deadlock.

Where exactly? Surely tb_lock contention shouldn't be a problem as it is
only held for generation and patching now?

> An alternative is to have a separate lock for safe work, and check for
> safe work once there are no other locks held; a good place to do this is
> at the beginning of cpu_loop_exec. This scales better, and I'd argue
> it's simpler. In fact, I posted a patch that does this about a year
> ago (!):
>   https://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg02576.html

I'll have another look at this. One thing I prefer about this series is
it keeps all the work mechanisms together. I think that's worth striving
for if we can.

> Paolo didn't like condvars, but now I see them coming up again. I guess
> he still won't like the synchronize_rcu() call in there, and I don't like
> it either, but I don't think that's an essential part of that patch.
>
> Thanks,
>
> 		Emilio
>
> [1] https://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg02581.html


--
Alex Bennée

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 11/13] cpu-exec-common: Introduce async_safe_run_on_cpu()
  2016-08-03 21:02     ` Alex Bennée
@ 2016-08-03 23:17       ` Emilio G. Cota
  2016-08-04  6:44         ` Alex Bennée
  0 siblings, 1 reply; 25+ messages in thread
From: Emilio G. Cota @ 2016-08-03 23:17 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, bobby.prani,
	mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Sergey Fedorov, Peter Crosthwaite, Riku Voipio

On Wed, Aug 03, 2016 at 22:02:04 +0100, Alex Bennée wrote:
> Emilio G. Cota <cota@braap.org> writes:
> 
> > On Tue, Aug 02, 2016 at 18:27:42 +0100, Alex Bennée wrote:
(snip)
> >> +void wait_safe_cpu_work(void)
> >> +{
> >> +    while (can_wait_for_safe() && atomic_mb_read(&safe_work_pending) > 0) {
> >
> > The atomic here is puzzling, see below.
> >
> >> +        /*
> >> +         * If there is pending safe work and no pending threads we
> >> +         * need to signal another thread to start its work.
> >> +         */
> >> +        if (tcg_pending_threads == 0) {
> >> +            qemu_cond_signal(&qemu_exclusive_cond);
> >> +        }
> >> +        qemu_cond_wait(&qemu_safe_work_cond, qemu_get_cpu_work_mutex());
> >> +    }
> >> +}
> >>
> >>  static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
> >>  {
> >> @@ -91,9 +121,18 @@ static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
> >>      cpu->queued_work_last = wi;
> >>      wi->next = NULL;
> >>      wi->done = false;
> >> +    if (wi->safe) {
> >> +        atomic_inc(&safe_work_pending);
> >> +    }
> >
> > This doesn't seem right. Operating on the condvar's shared 'state' variable
> > should always be done with the condvar's mutex held. Otherwise, there's
> > no guarantee that sleepers will always see a consistent state when they're
> > woken up, which can easily lead to deadlock.
> 
> How so? Surely the barriers around the atomic accesses and the implicit
> barriers of the mutexes ensure it is?

Barriers guarantee that accesses will be perceived in the right order.
However, they do not determine *when* those accesses will be seen by
other CPUs. For that we need stronger primitives, i.e. atomics like
the ones embedded in locks. Otherwise we might end up with races that
are very hard to debug.
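
To make it concrete, the usual discipline looks something like the sketch
below. This is only an illustration against the helpers this series adds;
I'm assuming qemu_get_cpu_work_mutex() returns the QemuMutex that protects
the work queues:

/* Sketch only: the counter is only ever touched with the work mutex held,
 * so a waiter can never check it, decide to sleep, and then miss the
 * wake-up that happens in between. */
static int safe_work_pending;

static void safe_work_inc(void)
{
    /* what queue_work_on_cpu() would do instead of atomic_inc() */
    qemu_mutex_lock(qemu_get_cpu_work_mutex());
    safe_work_pending++;
    qemu_mutex_unlock(qemu_get_cpu_work_mutex());
}

static void safe_work_dec(void)
{
    qemu_mutex_lock(qemu_get_cpu_work_mutex());
    if (--safe_work_pending == 0) {
        qemu_cond_broadcast(&qemu_safe_work_cond);
    }
    qemu_mutex_unlock(qemu_get_cpu_work_mutex());
}

/* wait_safe_cpu_work() already runs with the mutex held, so its check of
 * safe_work_pending and the qemu_cond_wait() become one atomic step with
 * respect to the updates above. */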

> > I suspect this is what caused the deadlock you saw in the last iteration
> > of the series.
> >
> > An additional requirement is the fact that new CPUs can come anytime in
> > user-mode (imagine we're flushing the TB while a new pthread was just
> > spawned). This is easily triggered by greatly reducing the size of the
> > translation buffer, and spawning dozens of threads.
> 
> I don't suppose you have a test case written up for this already?
> 
> My kvm-unit-tests are fairly extensive for SoftMMU mode but for
> user-mode I was only using pigz with the TB buffer scaled down.
> Obviously I need to expand the user-mode testing.

A tiny TB buffer (redefining the constants in translate-all.c), plus
a program running under linux-user that spawns many threads that do actual
work is a good test.
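
Something like the program below is enough to provoke it - just an
illustrative sketch, to be built statically for the guest architecture and
run under the linux-user binary once the code buffer has been shrunk:

/* Spawn lots of short-lived worker threads so new CPUs keep appearing
 * while tb_flush() is being forced by the tiny translation buffer. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NTHREADS 64
#define ROUNDS   100

static void *worker(void *arg)
{
    /* burn some CPU so the threads overlap and keep the flush pressure up */
    volatile unsigned long acc = (unsigned long)arg;
    for (int i = 0; i < 1000000; i++) {
        acc = acc * 6364136223846793005UL + 1442695040888963407UL;
    }
    return NULL;
}

int main(void)
{
    pthread_t tids[NTHREADS];

    for (int r = 0; r < ROUNDS; r++) {
        for (int i = 0; i < NTHREADS; i++) {
            if (pthread_create(&tids[i], NULL, worker,
                               (void *)(unsigned long)i)) {
                perror("pthread_create");
                exit(1);
            }
        }
        for (int i = 0; i < NTHREADS; i++) {
            pthread_join(tids[i], NULL);
        }
    }
    printf("done\n");
    return 0;
}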

> > A possible fix is to sched safe work after exiting the CPU loop, i.e.
> > with qemu_get_cpu_work_mutex held. I tried this on v4 of this patchset
> > and doesn't scale very well on 64 cores (too much contention
> > on tb_lock), although at least it doesn't deadlock.
> 
> Where exactly? Surely tb_lock contention shouldn't be a problem as it is
> only held for generation and patching now?

Booting up 64 cores on x86_64 can show contention on a 64-core host,
since CPU kicks are frequent. Do you have this v5 + mttcg + cmpxchg in a
branch so that I can test?

> > An alternative is to have a separate lock for safe work, and check for
> > safe work once there are no other locks held; a good place to do this is
> > at the beginning of cpu_loop_exec. This scales better, and I'd argue
> > it's simpler. In fact, I posted a patch that does this about a year
> > ago (!):
> >   https://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg02576.html
> 
> I'll have another look at this. One thing I prefer about this series is
> it keeps all the work mechanisms together. I think that's worth striving
> for if we can.

Sure. I don't think that old patchset has much value apart from raising
some issues that aren't mentioned in this series.

By the way, before even considering merging this patchset, I think we
should look at first merging the cmpxchg work (at least for x86)
so that we can thoroughly test this set at least with linux-user. Otherwise
we'll see errors/segfaults and we won't know what caused them.

Thanks,

		Emilio

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 11/13] cpu-exec-common: Introduce async_safe_run_on_cpu()
  2016-08-03 23:17       ` Emilio G. Cota
@ 2016-08-04  6:44         ` Alex Bennée
  0 siblings, 0 replies; 25+ messages in thread
From: Alex Bennée @ 2016-08-04  6:44 UTC (permalink / raw)
  To: Emilio G. Cota
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, bobby.prani,
	mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Sergey Fedorov, Peter Crosthwaite, Riku Voipio


Emilio G. Cota <cota@braap.org> writes:

> On Wed, Aug 03, 2016 at 22:02:04 +0100, Alex Bennée wrote:
>> Emilio G. Cota <cota@braap.org> writes:
>>
>> > On Tue, Aug 02, 2016 at 18:27:42 +0100, Alex Bennée wrote:
> (snip)
>> >> +void wait_safe_cpu_work(void)
>> >> +{
>> >> +    while (can_wait_for_safe() && atomic_mb_read(&safe_work_pending) > 0) {
>> >
>> > The atomic here is puzzling, see below.
>> >
>> >> +        /*
>> >> +         * If there is pending safe work and no pending threads we
>> >> +         * need to signal another thread to start its work.
>> >> +         */
>> >> +        if (tcg_pending_threads == 0) {
>> >> +            qemu_cond_signal(&qemu_exclusive_cond);
>> >> +        }
>> >> +        qemu_cond_wait(&qemu_safe_work_cond, qemu_get_cpu_work_mutex());
>> >> +    }
>> >> +}
>> >>
>> >>  static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
>> >>  {
>> >> @@ -91,9 +121,18 @@ static void queue_work_on_cpu(CPUState *cpu, struct qemu_work_item *wi)
>> >>      cpu->queued_work_last = wi;
>> >>      wi->next = NULL;
>> >>      wi->done = false;
>> >> +    if (wi->safe) {
>> >> +        atomic_inc(&safe_work_pending);
>> >> +    }
>> >
>> > This doesn't seem right. Operating on the condvar's shared 'state' variable
>> > should always be done with the condvar's mutex held. Otherwise, there's
>> > no guarantee that sleepers will always see a consistent state when they're
>> > woken up, which can easily lead to deadlock.
>>
>> How so? Surely the barriers around the atomic accesses and the implicit
>> barriers of the mutexes ensure it is?
>
> Barriers guarantee that accesses will be perceived in the right order.
> However, they do not determine *when* those accesses will be seen by
> other CPUs. For that we need stronger primitives, i.e. atomics like
> the ones embedded in locks. Otherwise we might end up with races that
> are very hard to debug.

But that's what we have: atomic incs followed by a kick
(cpu_exit/signal conds) to wake up threads. FWIW I did fix a
sleep-related problem with the MTTCG logic last night:

    https://github.com/stsquad/qemu/commit/2f4f3ed149cfe87794a3bd4adfccf04b4cd81873

>
>> > I suspect this is what caused the deadlock you saw in the last iteration
>> > of the series.
>> >
>> > An additional requirement is the fact that new CPUs can come anytime in
>> > user-mode (imagine we're flushing the TB while a new pthread was just
>> > spawned). This is easily triggered by greatly reducing the size of the
>> > translation buffer, and spawning dozens of threads.
>>
>> I don't suppose you have a test case written up for this already?
>>
>> My kvm-unit-tests are fairly extensive for SoftMMU mode but for
>> user-mode I was only using pigz with the TB buffer scaled down.
>> Obviously I need to expand the user-mode testing.
>
> A tiny TB buffer (redefining the constants in translate-all.c), plus
> a program running under linux-user that spawns many threads that do actual
> work is a good test.

Yeah this is what the pigz test does. I made the buffer very small and
had several clashing flush events but it was all stable.

>
>> > A possible fix is to sched safe work after exiting the CPU loop, i.e.
>> > with qemu_get_cpu_work_mutex held. I tried this on v4 of this patchset
>> > and doesn't scale very well on 64 cores (too much contention
>> > on tb_lock), although at least it doesn't deadlock.
>>
>> Where exactly? Surely tb_lock contention shouldn't be a problem as it is
>> only held for generation and patching now?
>
> Booting up 64 cores on x86_64 can show contention on a 64-core host,
> since CPU kicks are frequent. Do you have this v5 + mttcg + cmpxchg in a
> branch so that I can test?

You can have async-work-v5 + base-patches-v4-wip:

  https://github.com/stsquad/qemu/commits/mttcg/base-patches-v4-04082016-for-emilio

Last time I experimented with merging your cmpxchg work it went in
without any conflicts so hopefully that is the same now.

This tree currently runs all of the kvm-unit-tests for ARMv7 and v8 in
-accel tcg,thread=single mode and runs the tcg and tlbflush test groups
in -accel tcg,thread=multi mode.

It looks like I have some silly compile errors to fix for Travis to be
happy though.

>> > An alternative is to have a separate lock for safe work, and check for
>> > safe work once there are no other locks held; a good place to do this is
>> > at the beginning of cpu_loop_exec. This scales better, and I'd argue
>> > it's simpler. In fact, I posted a patch that does this about a year
>> > ago (!):
>> >   https://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg02576.html
>>
>> I'll have another look at this. One thing I prefer about this series is
>> it keeps all the work mechanisms together. I think that's worth striving
>> for if we can.
>
> Sure. I don't think that old patchset has much value apart from raising
> some issues that aren't mentioned in this series.
>
> By the way, before even considering merging this patchset, I think we
> should look at first merging the cmpxchg work (at least for x86)
> so that we can thoroughly test this set at least with linux-user. Otherwise
> we'll see errors/segfaults and we won't know what caused them.

Yes, I reckon the cmpxchg and barrier work should get merged first as it
is both testable and worthwhile for the existing linux-user case. If we
are happy that async-work is fit for purpose in SoftMMU mode as well, I
would expect that to get merged ahead of the main base enabling set.

In fact the base patches have pulled in a bunch of the cputlb patches
from the ARM enabling series, so I think we are pretty close to
base-patches-v4 + any functioning atomic fix = support for all
platforms ;-)

There are a few wrinkles that need to be checked out for each platform
though, so I still expect we'll be enabling combos as we go. This mainly
has to do with things like booting up CPUs in an MTTCG-safe way.

--
Alex Bennée

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 11/13] cpu-exec-common: Introduce async_safe_run_on_cpu()
  2016-08-02 19:22   ` Emilio G. Cota
  2016-08-03 21:02     ` Alex Bennée
@ 2016-08-28  0:21     ` Paolo Bonzini
  2016-08-29 17:26       ` Paolo Bonzini
  1 sibling, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2016-08-28  0:21 UTC (permalink / raw)
  To: Emilio G. Cota, Alex Bennée
  Cc: mttcg, peter.maydell, claudio.fontana, Sergey Fedorov,
	Peter Crosthwaite, jan.kiszka, Riku Voipio, mark.burton, a.rigo,
	qemu-devel, serge.fdrv, bobby.prani, rth, fred.konrad



On 02/08/2016 21:22, Emilio G. Cota wrote:
> An alternative is to have a separate lock for safe work, and check for
> safe work once there are no other locks held; a good place to do this is
> at the beginning of cpu_loop_exec. This scales better, and I'd argue
> it's simpler. In fact, I posted a patch that does this about a year
> ago (!):
>   https://lists.nongnu.org/archive/html/qemu-devel/2015-08/msg02576.html

I like the idea.

> Paolo didn't like condvars, but now I see them coming up again. I guess
> he still won't like the synchronize_rcu() call in there, and I don't like
> it either, but I don't think that's an essential part of that patch.

The problem with CPUs coming up late is indeed present in this patch,
I'll review your patch on the flight. :)

synchronize_rcu() is actually relatively cheap with URCU, so I guess
that's fine.  An alternative to that could be a pthread_barrier_t, but
it can be added later.

Another way to fix the issue with a variable number of waiters could be
to wrap safe work with rcu_read_lock and rcu_read_unlock, and put a
synchronize_rcu() at the beginning of the CPU thread function.  But it
can be done later too.
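
Roughly like the sketch below - completely untested, and the function
names are only placeholders for wherever this would actually live (it
needs "qemu/rcu.h"):

/* Safe work counts as an RCU read-side critical section. */
static void run_safe_work(CPUState *cpu, void (*func)(CPUState *, void *),
                          void *data)
{
    rcu_read_lock();
    func(cpu, data);
    rcu_read_unlock();
}

static void *cpu_thread_fn(void *arg)
{
    /* A freshly spawned vCPU thread waits here for any safe work that was
     * already in flight before it starts executing guest code. */
    synchronize_rcu();

    /* ... existing vCPU loop ... */
    return NULL;
}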

Your patch from a year ago, right now, seems to be the best to me.  I'd
like to make it use regular work items instead of the special
cpu->tcg_work_func, but that's pretty much it.

Paolo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 11/13] cpu-exec-common: Introduce async_safe_run_on_cpu()
  2016-08-28  0:21     ` Paolo Bonzini
@ 2016-08-29 17:26       ` Paolo Bonzini
  2016-08-31 10:09         ` Alex Bennée
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2016-08-29 17:26 UTC (permalink / raw)
  To: Emilio G. Cota, Alex Bennée
  Cc: mttcg, peter.maydell, Sergey Fedorov, Peter Crosthwaite,
	jan.kiszka, Riku Voipio, claudio.fontana, a.rigo, qemu-devel,
	mark.burton, serge.fdrv, bobby.prani, fred.konrad, rth


> The problem with CPUs coming up late is indeed present in this patch,
> I'll review your patch on the flight. :)
> 
> synchronize_rcu() is actually relatively cheap with URCU, so I guess
> that's fine.  An alternative to that could be a pthread_barrier_t, but
> it can be added later.
> 
> Another way to fix the issue with a variable number of waiters could be
> to wrap safe work with rcu_read_lock and rcu_read_unlock, and put a
> synchronize_rcu() at the beginning of the CPU thread function.  But it
> can be done later too.
> 
> Your patch from a year ago, right now, seems to be the best to me.  I'd
> like to make it use regular work items instead of the special
> cpu->tcg_work_func, but that's pretty much it.

Ok, I think I have something.  It only uses condition variables when
there is safe work in flight, to enter and leave the function at the
right time.  It also makes linux-user's start_exclusive/end_exclusive
use the same synchronization logic.  I'll test it and post; most
preliminary patches are straight from this series.

Paolo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH v5 11/13] cpu-exec-common: Introduce async_safe_run_on_cpu()
  2016-08-29 17:26       ` Paolo Bonzini
@ 2016-08-31 10:09         ` Alex Bennée
  0 siblings, 0 replies; 25+ messages in thread
From: Alex Bennée @ 2016-08-31 10:09 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Emilio G. Cota, mttcg, peter.maydell, Sergey Fedorov,
	Peter Crosthwaite, jan.kiszka, Riku Voipio, claudio.fontana,
	a.rigo, qemu-devel, mark.burton, serge.fdrv, bobby.prani,
	fred.konrad, rth


Paolo Bonzini <pbonzini@redhat.com> writes:

>> The problem with CPUs coming up late is indeed present in this patch,
>> I'll review your patch on the flight. :)
>>
>> synchronize_rcu() is actually relatively cheap with URCU, so I guess
>> that's fine.  An alternative to that could be a pthread_barrier_t, but
>> it can be added later.
>>
>> Another way to fix the issue with a variable number of waiters could be
>> to wrap safe work with rcu_read_lock and rcu_read_unlock, and put a
>> synchronize_rcu() at the beginning of the CPU thread function.  But it
>> can be done later too.
>>
>> Your patch from a year ago, right now, seems to be the best to me.  I'd
>> like to make it use regular work items instead of the special
>> cpu->tcg_work_func, but that's pretty much it.
>
> Ok, I think I have something.  It only uses condition variables when
> there is safe work in flight, to enter and leave the function at the
> right time.  It also makes linux-user's start_exclusive/end_exclusive
> use the same synchronization logic.  I'll test it and post; most
> preliminary patches are straight from this series.

Good stuff, I look forward to seeing the patches. I'll see if I can come
up with some better stress tests for linux-user in the meantime.

>
> Paolo


--
Alex Bennée

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2016-08-31 10:09 UTC | newest]

Thread overview: 25+ messages
2016-08-02 17:27 [Qemu-devel] [PATCH v5 00/13] cpu-exec: Safe work in quiescent state Alex Bennée
2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 01/13] atomic: introduce atomic_dec_fetch Alex Bennée
2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 02/13] cpus: pass CPUState to run_on_cpu helpers Alex Bennée
2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 03/13] cpus: Move common code out of {async_, }run_on_cpu() Alex Bennée
2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 04/13] cpus: Wrap mutex used to protect CPU work Alex Bennée
2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 05/13] cpus: Rename flush_queued_work() Alex Bennée
2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 06/13] linux-user: Use QemuMutex and QemuCond Alex Bennée
2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 07/13] linux-user: Rework exclusive operation mechanism Alex Bennée
2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 08/13] linux-user: Add qemu_cpu_is_self() and qemu_cpu_kick() Alex Bennée
2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 09/13] linux-user: Support CPU work queue Alex Bennée
2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 10/13] bsd-user: " Alex Bennée
2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 11/13] cpu-exec-common: Introduce async_safe_run_on_cpu() Alex Bennée
2016-08-02 19:22   ` Emilio G. Cota
2016-08-03 21:02     ` Alex Bennée
2016-08-03 23:17       ` Emilio G. Cota
2016-08-04  6:44         ` Alex Bennée
2016-08-28  0:21     ` Paolo Bonzini
2016-08-29 17:26       ` Paolo Bonzini
2016-08-31 10:09         ` Alex Bennée
2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 12/13] tcg: Make tb_flush() thread safe Alex Bennée
2016-08-02 17:27 ` [Qemu-devel] [PATCH v5 13/13] cpu-exec: replace cpu->queued_work with GArray Alex Bennée
2016-08-02 17:36   ` Alex Bennée
2016-08-02 17:42   ` Alex Bennée
2016-08-02 18:53   ` Emilio G. Cota
2016-08-03  8:34     ` Alex Bennée
