[Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG
@ 2016-06-03 20:40 Alex Bennée
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 01/19] cpus: make all_vcpus_paused() return bool Alex Bennée
                   ` (19 more replies)
  0 siblings, 20 replies; 62+ messages in thread
From: Alex Bennée @ 2016-06-03 20:40 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Alex Bennée

This is the third iteration of the RFC patch set which aims to provide
the basic framework for MTTCG. There have been some considerable
changes since the last extensive review (thanks to all the reviewers).

   - many updates to docs/multi-thread-tcg.txt design document
   - added assertions for all the locking requirements
   - split apart the big enable thread-per-vCPU patch
   - removed locking from the hot-path

In general the main thread functions are a lot less messy (especially
the single thread variant). The splitting apart of the big enabling
patch was helped by removing tcg_current_cpu and the global
exit_request as separate patches. Finally the big performance boost of
a lockless hot-path is made possible by Emilio's QHT work which this
is based on.

The branch can be found at:

  https://github.com/stsquad/qemu/tree/mttcg/base-patches-v3

It has been built up on top of:

  https://github.com/stsquad/qemu/tree/review/qht-v6

What's next
===========

This series should now behave as normal in the default single threaded
mode and in multi-threaded mode where SMP==1. Please report any
regressions/bugs you find.

multi-threaded mode works for any guests that don't make assumptions
about atomicity, memory consistency or cache flushes. Basically simple
kvm-unit-test cases. I've been using the simple locking-test:

  ./arm/run ./arm/locking-test.flat -smp 4 -accel tcg,thread=[single|multi]

Default:
real    0m14.804s
user    0m14.752s
sys     0m0.032s

Multi-threaded:
real    0m6.307s
user    0m23.396s
sys     0m0.020s

With the lockless hot-path (thanks QHT ;-) we can now see a benefit in
running with multiple threads with a slight cost to overall CPU time.
I intend to run a more complete set of benchmarks once I'm re-based by
ARM specific series on top of this patch-set (and compared the LL/SC
work with the simple lock based exclusives in the original series.)

In the meantime any review comments gratefully received.

Cheers,

Alex

Alex Bennée (15):
  cpus: make all_vcpus_paused() return bool
  translate_all: DEBUG_FLUSH -> DEBUG_TB_FLUSH
  translate-all: add DEBUG_LOCKING asserts
  docs: new design document multi-thread-tcg.txt (DRAFTING)
  exec: add assert_debug_safe and notes on debug structures
  translate-all: Add assert_memory_lock annotations
  target-arm/arm-powerctl: wake up sleeping CPUs
  tcg: cpus rm tcg_exec_all()
  tcg: add kick timer for single-threaded vCPU emulation
  tcg: rename tcg_current_cpu to tcg_current_rr_cpu
  tcg: remove global exit_request
  tcg: drop global lock during TCG code execution
  tcg: move locking for tb_invalidate_phys_page_range up
  tcg: enable thread-per-vCPU
  cpu-exec: remove tb_lock from the hot-path

KONRAD Frederic (2):
  tcg: protect TBContext with tb_lock.
  tcg: add options for enabling MTTCG

Paolo Bonzini (1):
  tcg: comment on which functions have to be called with tb_lock held

Sergey Fedorov (1):
  tcg: Ensure safe TB lookup out of 'tb_lock'

 bsd-user/mmap.c           |   5 +
 cpu-exec-common.c         |   3 -
 cpu-exec.c                | 104 ++++++-----
 cpus.c                    | 448 ++++++++++++++++++++++++++++------------------
 cputlb.c                  |   1 +
 docs/multi-thread-tcg.txt | 225 +++++++++++++++++++++++
 exec.c                    |  87 ++++++++-
 hw/i386/kvmvapic.c        |   4 +
 include/exec/exec-all.h   |   6 +-
 include/qom/cpu.h         |  11 ++
 include/sysemu/cpus.h     |   2 +
 linux-user/elfload.c      |   4 +
 linux-user/main.c         |   1 +
 linux-user/mmap.c         |   5 +
 memory.c                  |   2 +
 qemu-options.hx           |  20 +++
 softmmu_template.h        |  17 ++
 target-arm/Makefile.objs  |   2 +-
 target-arm/arm-powerctl.c |   2 +
 target-i386/smm_helper.c  |   7 +
 tcg/tcg.h                 |   2 +
 translate-all.c           | 187 +++++++++++++++----
 vl.c                      |  45 ++++-
 23 files changed, 931 insertions(+), 259 deletions(-)
 create mode 100644 docs/multi-thread-tcg.txt

-- 
2.7.4

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [Qemu-devel] [RFC v3 01/19] cpus: make all_vcpus_paused() return bool
  2016-06-03 20:40 [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG Alex Bennée
@ 2016-06-03 20:40 ` Alex Bennée
  2016-06-07 15:05   ` Richard Henderson
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 02/19] translate_all: DEBUG_FLUSH -> DEBUG_TB_FLUSH Alex Bennée
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 62+ messages in thread
From: Alex Bennée @ 2016-06-03 20:40 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Alex Bennée, Peter Crosthwaite

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>

---
v3
  - add r-b tags
---
 cpus.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/cpus.c b/cpus.c
index 665f9bb..c404dd7 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1258,17 +1258,17 @@ void qemu_mutex_unlock_iothread(void)
     qemu_mutex_unlock(&qemu_global_mutex);
 }
 
-static int all_vcpus_paused(void)
+static bool all_vcpus_paused(void)
 {
     CPUState *cpu;
 
     CPU_FOREACH(cpu) {
         if (!cpu->stopped) {
-            return 0;
+            return false;
         }
     }
 
-    return 1;
+    return true;
 }
 
 void pause_all_vcpus(void)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 01/19] cpus: make all_vcpus_paused() return bool
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 01/19] cpus: make all_vcpus_paused() return bool Alex Bennée
@ 2016-06-07 15:05   ` Richard Henderson
  0 siblings, 0 replies; 62+ messages in thread
From: Richard Henderson @ 2016-06-07 15:05 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo,
	serge.fdrv, cota, bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, peter.maydell, claudio.fontana,
	Peter Crosthwaite

On 06/03/2016 01:40 PM, Alex Bennée wrote:
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
> 
> ---
> v3
>   - add r-b tags
> ---
>  cpus.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/cpus.c b/cpus.c
> index 665f9bb..c404dd7 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1258,17 +1258,17 @@ void qemu_mutex_unlock_iothread(void)
>      qemu_mutex_unlock(&qemu_global_mutex);
>  }
>  
> -static int all_vcpus_paused(void)
> +static bool all_vcpus_paused(void)
>  {
>      CPUState *cpu;
>  
>      CPU_FOREACH(cpu) {
>          if (!cpu->stopped) {
> -            return 0;
> +            return false;
>          }
>      }
>  
> -    return 1;
> +    return true;
>  }
>  
>  void pause_all_vcpus(void)
> 

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [Qemu-devel] [RFC v3 02/19] translate_all: DEBUG_FLUSH -> DEBUG_TB_FLUSH
  2016-06-03 20:40 [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG Alex Bennée
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 01/19] cpus: make all_vcpus_paused() return bool Alex Bennée
@ 2016-06-03 20:40 ` Alex Bennée
  2016-06-07 14:54   ` Richard Henderson
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 03/19] translate-all: add DEBUG_LOCKING asserts Alex Bennée
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 62+ messages in thread
From: Alex Bennée @ 2016-06-03 20:40 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Alex Bennée, Peter Crosthwaite

Make the debug define consistent with the others. The flush operation is
all about invalidating TranslationBlocks on flush events.

Also fix up the commenting on the other DEBUG for the benefit of
checkpatch.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 translate-all.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/translate-all.c b/translate-all.c
index 625411f..ec6fdbb 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -58,10 +58,10 @@
 #include "qemu/timer.h"
 #include "exec/log.h"
 
-//#define DEBUG_TB_INVALIDATE
-//#define DEBUG_FLUSH
+/* #define DEBUG_TB_INVALIDATE */
+/* #define DEBUG_TB_FLUSH */
 /* make various TB consistency checks */
-//#define DEBUG_TB_CHECK
+/* #define DEBUG_TB_CHECK */
 
 #if !defined(CONFIG_USER_ONLY)
 /* TB consistency checks only implemented for usermode emulation.  */
@@ -836,7 +836,7 @@ static void page_flush_tb(void)
 /* XXX: tb_flush is currently not thread safe */
 void tb_flush(CPUState *cpu)
 {
-#if defined(DEBUG_FLUSH)
+#if defined(DEBUG_TB_FLUSH)
     printf("qemu: flush code_size=%ld nb_tbs=%d avg_tb_size=%ld\n",
            (unsigned long)(tcg_ctx.code_gen_ptr - tcg_ctx.code_gen_buffer),
            tcg_ctx.tb_ctx.nb_tbs, tcg_ctx.tb_ctx.nb_tbs > 0 ?
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 02/19] translate_all: DEBUG_FLUSH -> DEBUG_TB_FLUSH
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 02/19] translate_all: DEBUG_FLUSH -> DEBUG_TB_FLUSH Alex Bennée
@ 2016-06-07 14:54   ` Richard Henderson
  0 siblings, 0 replies; 62+ messages in thread
From: Richard Henderson @ 2016-06-07 14:54 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo,
	serge.fdrv, cota, bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, peter.maydell, claudio.fontana,
	Peter Crosthwaite

On 06/03/2016 01:40 PM, Alex Bennée wrote:
> Make the debug define consistent with the others. The flush operation is
> all about invalidating TranslationBlocks on flush events.
> 
> Also fix up the commenting on the other DEBUG for the benefit of
> checkpatch.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  translate-all.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [Qemu-devel] [RFC v3 03/19] translate-all: add DEBUG_LOCKING asserts
  2016-06-03 20:40 [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG Alex Bennée
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 01/19] cpus: make all_vcpus_paused() return bool Alex Bennée
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 02/19] translate_all: DEBUG_FLUSH -> DEBUG_TB_FLUSH Alex Bennée
@ 2016-06-03 20:40 ` Alex Bennée
  2016-06-23 14:34   ` Sergey Fedorov
  2016-07-01 23:21   ` Richard Henderson
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 04/19] docs: new design document multi-thread-tcg.txt (DRAFTING) Alex Bennée
                   ` (16 subsequent siblings)
  19 siblings, 2 replies; 62+ messages in thread
From: Alex Bennée @ 2016-06-03 20:40 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Alex Bennée, Blue Swirl, Peter Crosthwaite,
	Riku Voipio

This adds asserts to check the locking on the various translation
engines structures. There are two sets of structures that are protected
by locks.

The first the l1map and PageDesc structures used to track which
translation blocks are associated with which physical addresses. In
user-mode this is covered by the mmap_lock.

The second case are TB context related structures which are protected by
tb_lock which is also user-mode only.

Currently the asserts do nothing in SoftMMU mode but this will change
for MTTCG.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 bsd-user/mmap.c         |  5 +++++
 include/exec/exec-all.h |  1 +
 linux-user/mmap.c       |  5 +++++
 translate-all.c         | 41 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 52 insertions(+)

diff --git a/bsd-user/mmap.c b/bsd-user/mmap.c
index 6ab5334..4a61ad0 100644
--- a/bsd-user/mmap.c
+++ b/bsd-user/mmap.c
@@ -43,6 +43,11 @@ void mmap_unlock(void)
     }
 }
 
+bool have_mmap_lock(void)
+{
+    return mmap_lock_count > 0 ? true : false;
+}
+
 /* Grab lock to make sure things are in a consistent state after fork().  */
 void mmap_fork_start(void)
 {
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 70ab361..ea1c925 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -369,6 +369,7 @@ void tlb_fill(CPUState *cpu, target_ulong addr, int is_write, int mmu_idx,
 #if defined(CONFIG_USER_ONLY)
 void mmap_lock(void);
 void mmap_unlock(void);
+bool have_mmap_lock(void);
 
 static inline tb_page_addr_t get_page_addr_code(CPUArchState *env1, target_ulong addr)
 {
diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 3519147..fd68249 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -44,6 +44,11 @@ void mmap_unlock(void)
     }
 }
 
+bool have_mmap_lock(void)
+{
+    return mmap_lock_count > 0 ? true : false;
+}
+
 /* Grab lock to make sure things are in a consistent state after fork().  */
 void mmap_fork_start(void)
 {
diff --git a/translate-all.c b/translate-all.c
index ec6fdbb..e3f44d9 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -33,6 +33,7 @@
 #include "tcg.h"
 #if defined(CONFIG_USER_ONLY)
 #include "qemu.h"
+#include "exec/exec-all.h"
 #if defined(__FreeBSD__) || defined(__FreeBSD_kernel__)
 #include <sys/param.h>
 #if __FreeBSD_version >= 700104
@@ -60,6 +61,7 @@
 
 /* #define DEBUG_TB_INVALIDATE */
 /* #define DEBUG_TB_FLUSH */
+/* #define DEBUG_LOCKING */
 /* make various TB consistency checks */
 /* #define DEBUG_TB_CHECK */
 
@@ -68,6 +70,28 @@
 #undef DEBUG_TB_CHECK
 #endif
 
+/* Access to the various translations structures need to be serialised via locks
+ * for consistency. This is automatic for SoftMMU based system
+ * emulation due to its single threaded nature. In user-mode emulation
+ * access to the memory related structures are protected with the
+ * mmap_lock.
+ */
+#ifdef DEBUG_LOCKING
+#define DEBUG_MEM_LOCKS 1
+#else
+#define DEBUG_MEM_LOCKS 0
+#endif
+
+#ifdef CONFIG_SOFTMMU
+#define assert_memory_lock() do { /* nothing */ } while (0)
+#else
+#define assert_memory_lock() do {               \
+        if (DEBUG_MEM_LOCKS) {                  \
+            g_assert(have_mmap_lock());         \
+        }                                       \
+    } while (0)
+#endif
+
 #define SMC_BITMAP_USE_THRESHOLD 10
 
 typedef struct PageDesc {
@@ -155,6 +179,23 @@ void tb_lock_reset(void)
 #endif
 }
 
+#ifdef DEBUG_LOCKING
+#define DEBUG_TB_LOCKS 1
+#else
+#define DEBUG_TB_LOCKS 0
+#endif
+
+#ifdef CONFIG_SOFTMMU
+#define assert_tb_lock() do { /* nothing */ } while (0)
+#else
+#define assert_tb_lock() do {               \
+        if (DEBUG_TB_LOCKS) {               \
+            g_assert(have_tb_lock);         \
+        }                                   \
+    } while (0)
+#endif
+
+
 static TranslationBlock *tb_find_pc(uintptr_t tc_ptr);
 
 void cpu_gen_init(void)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 03/19] translate-all: add DEBUG_LOCKING asserts
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 03/19] translate-all: add DEBUG_LOCKING asserts Alex Bennée
@ 2016-06-23 14:34   ` Sergey Fedorov
  2016-06-23 17:14     ` Alex Bennée
  2016-07-01 23:21   ` Richard Henderson
  1 sibling, 1 reply; 62+ messages in thread
From: Sergey Fedorov @ 2016-06-23 14:34 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Blue Swirl, Peter Crosthwaite, Riku Voipio

On 03/06/16 23:40, Alex Bennée wrote:
> diff --git a/translate-all.c b/translate-all.c
> index ec6fdbb..e3f44d9 100644
> --- a/translate-all.c
> +++ b/translate-all.c
(snip)
> @@ -60,6 +61,7 @@
>  
>  /* #define DEBUG_TB_INVALIDATE */
>  /* #define DEBUG_TB_FLUSH */
> +/* #define DEBUG_LOCKING */

A bit of bikeshedding: have you considered naming it 'DEBUG_LOCKS'. How
does it sound for a native English speaker? :)

>  /* make various TB consistency checks */
>  /* #define DEBUG_TB_CHECK */
>  
> @@ -68,6 +70,28 @@
>  #undef DEBUG_TB_CHECK
>  #endif
>  
> +/* Access to the various translations structures need to be serialised via locks
> + * for consistency. This is automatic for SoftMMU based system
> + * emulation due to its single threaded nature. In user-mode emulation
> + * access to the memory related structures are protected with the
> + * mmap_lock.
> + */
> +#ifdef DEBUG_LOCKING
> +#define DEBUG_MEM_LOCKS 1
> +#else
> +#define DEBUG_MEM_LOCKS 0
> +#endif
> +
> +#ifdef CONFIG_SOFTMMU
> +#define assert_memory_lock() do { /* nothing */ } while (0)
> +#else
> +#define assert_memory_lock() do {               \
> +        if (DEBUG_MEM_LOCKS) {                  \
> +            g_assert(have_mmap_lock());         \
> +        }                                       \
> +    } while (0)
> +#endif
> +

Why not simply:

#if !defined(DEBUG_LOCKING) || defined(CONFIG_SOFTMMU)
#    define assert_memory_lock() do { /* nothing */ } while (0)
#else
#    define assert_memory_lock() g_assert(have_mmap_lock())
#endif

One more nit: maybe it would be a bit more clear to name it after the
lock name, i.e. assert_mmap_lock(), or check_mmap_lock(), or
debug_mmap_lock() etc?

Thanks,
Sergey

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 03/19] translate-all: add DEBUG_LOCKING asserts
  2016-06-23 14:34   ` Sergey Fedorov
@ 2016-06-23 17:14     ` Alex Bennée
  2016-06-23 18:43       ` Sergey Fedorov
  0 siblings, 1 reply; 62+ messages in thread
From: Alex Bennée @ 2016-06-23 17:14 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani,
	mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Blue Swirl, Peter Crosthwaite, Riku Voipio


Sergey Fedorov <serge.fdrv@gmail.com> writes:

> On 03/06/16 23:40, Alex Bennée wrote:
>> diff --git a/translate-all.c b/translate-all.c
>> index ec6fdbb..e3f44d9 100644
>> --- a/translate-all.c
>> +++ b/translate-all.c
> (snip)
>> @@ -60,6 +61,7 @@
>>
>>  /* #define DEBUG_TB_INVALIDATE */
>>  /* #define DEBUG_TB_FLUSH */
>> +/* #define DEBUG_LOCKING */
>
> A bit of bikeshedding: have you considered naming it 'DEBUG_LOCKS'. How
> does it sound for a native English speaker? :)
>
>>  /* make various TB consistency checks */
>>  /* #define DEBUG_TB_CHECK */
>>
>> @@ -68,6 +70,28 @@
>>  #undef DEBUG_TB_CHECK
>>  #endif
>>
>> +/* Access to the various translations structures need to be serialised via locks
>> + * for consistency. This is automatic for SoftMMU based system
>> + * emulation due to its single threaded nature. In user-mode emulation
>> + * access to the memory related structures are protected with the
>> + * mmap_lock.
>> + */
>> +#ifdef DEBUG_LOCKING
>> +#define DEBUG_MEM_LOCKS 1
>> +#else
>> +#define DEBUG_MEM_LOCKS 0
>> +#endif
>> +
>> +#ifdef CONFIG_SOFTMMU
>> +#define assert_memory_lock() do { /* nothing */ } while (0)
>> +#else
>> +#define assert_memory_lock() do {               \
>> +        if (DEBUG_MEM_LOCKS) {                  \
>> +            g_assert(have_mmap_lock());         \
>> +        }                                       \
>> +    } while (0)
>> +#endif
>> +
>
> Why not simply:
>
> #if !defined(DEBUG_LOCKING) || defined(CONFIG_SOFTMMU)
> #    define assert_memory_lock() do { /* nothing */ } while (0)
> #else
> #    define assert_memory_lock() g_assert(have_mmap_lock())
> #endif
>
> One more nit: maybe it would be a bit more clear to name it after the
> lock name, i.e. assert_mmap_lock(), or check_mmap_lock(), or
> debug_mmap_lock() etc?

Yes I can do it that way around. The if (FOO) form makes more sense for
debug output to ensure the compiler checks format strings etc. The
resulting code should be the same either way.

>
> Thanks,
> Sergey


--
Alex Bennée

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 03/19] translate-all: add DEBUG_LOCKING asserts
  2016-06-23 17:14     ` Alex Bennée
@ 2016-06-23 18:43       ` Sergey Fedorov
  0 siblings, 0 replies; 62+ messages in thread
From: Sergey Fedorov @ 2016-06-23 18:43 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani,
	mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Blue Swirl, Peter Crosthwaite, Riku Voipio

On 23/06/16 20:14, Alex Bennée wrote:
> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>
>> On 03/06/16 23:40, Alex Bennée wrote:
>>> diff --git a/translate-all.c b/translate-all.c
>>> index ec6fdbb..e3f44d9 100644
>>> --- a/translate-all.c
>>> +++ b/translate-all.c
>> (snip)
>>> @@ -60,6 +61,7 @@
>>>
>>>  /* #define DEBUG_TB_INVALIDATE */
>>>  /* #define DEBUG_TB_FLUSH */
>>> +/* #define DEBUG_LOCKING */
>> A bit of bikeshedding: have you considered naming it 'DEBUG_LOCKS'. How
>> does it sound for a native English speaker? :)
>>
>>>  /* make various TB consistency checks */
>>>  /* #define DEBUG_TB_CHECK */
>>>
>>> @@ -68,6 +70,28 @@
>>>  #undef DEBUG_TB_CHECK
>>>  #endif
>>>
>>> +/* Access to the various translations structures need to be serialised via locks
>>> + * for consistency. This is automatic for SoftMMU based system
>>> + * emulation due to its single threaded nature. In user-mode emulation
>>> + * access to the memory related structures are protected with the
>>> + * mmap_lock.
>>> + */
>>> +#ifdef DEBUG_LOCKING
>>> +#define DEBUG_MEM_LOCKS 1
>>> +#else
>>> +#define DEBUG_MEM_LOCKS 0
>>> +#endif
>>> +
>>> +#ifdef CONFIG_SOFTMMU
>>> +#define assert_memory_lock() do { /* nothing */ } while (0)
>>> +#else
>>> +#define assert_memory_lock() do {               \
>>> +        if (DEBUG_MEM_LOCKS) {                  \
>>> +            g_assert(have_mmap_lock());         \
>>> +        }                                       \
>>> +    } while (0)
>>> +#endif
>>> +
>> Why not simply:
>>
>> #if !defined(DEBUG_LOCKING) || defined(CONFIG_SOFTMMU)
>> #    define assert_memory_lock() do { /* nothing */ } while (0)
>> #else
>> #    define assert_memory_lock() g_assert(have_mmap_lock())
>> #endif
>>
>> One more nit: maybe it would be a bit more clear to name it after the
>> lock name, i.e. assert_mmap_lock(), or check_mmap_lock(), or
>> debug_mmap_lock() etc?
> Yes I can do it that way around. The if (FOO) form makes more sense for
> debug output to ensure the compiler checks format strings etc. The
> resulting code should be the same either way.

You are right, this is a good thing, I just missed it.

Thanks,
Sergey

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 03/19] translate-all: add DEBUG_LOCKING asserts
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 03/19] translate-all: add DEBUG_LOCKING asserts Alex Bennée
  2016-06-23 14:34   ` Sergey Fedorov
@ 2016-07-01 23:21   ` Richard Henderson
  1 sibling, 0 replies; 62+ messages in thread
From: Richard Henderson @ 2016-07-01 23:21 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo,
	serge.fdrv, cota, bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, peter.maydell, claudio.fontana,
	Blue Swirl, Peter Crosthwaite, Riku Voipio

On 06/03/2016 01:40 PM, Alex Bennée wrote:
> This adds asserts to check the locking on the various translation
> engines structures. There are two sets of structures that are protected
> by locks.
>
> The first the l1map and PageDesc structures used to track which
> translation blocks are associated with which physical addresses. In
> user-mode this is covered by the mmap_lock.
>
> The second case are TB context related structures which are protected by
> tb_lock which is also user-mode only.
>
> Currently the asserts do nothing in SoftMMU mode but this will change
> for MTTCG.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  bsd-user/mmap.c         |  5 +++++
>  include/exec/exec-all.h |  1 +
>  linux-user/mmap.c       |  5 +++++
>  translate-all.c         | 41 +++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 52 insertions(+)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [Qemu-devel] [RFC v3 04/19] docs: new design document multi-thread-tcg.txt (DRAFTING)
  2016-06-03 20:40 [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG Alex Bennée
                   ` (2 preceding siblings ...)
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 03/19] translate-all: add DEBUG_LOCKING asserts Alex Bennée
@ 2016-06-03 20:40 ` Alex Bennée
  2016-06-23 21:33   ` Sergey Fedorov
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 05/19] exec: add assert_debug_safe and notes on debug structures Alex Bennée
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 62+ messages in thread
From: Alex Bennée @ 2016-06-03 20:40 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Alex Bennée

This is a current DRAFT of a design proposal for upgrading TCG emulation
to take advantage of modern CPUs by running a thread-per-CPU. The
document goes through the various areas of the code affected by such a
change and proposes design requirements for each part of the solution.

It has been written *without* explicit reference to the current ongoing
efforts to introduce this feature. The hope being we can review and
discuss the design choices without assuming the current choices taken by
the implementation are correct.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v1
  - initial version
v2
  - update discussion on locks
  - bit more detail on vCPU scheduling
  - explicitly mention Translation Blocks
  - emulated hardware state already covered by iomutex
  - a few minor rewords
v3
  - mention this covers system-mode
  - describe main main-loop and lookup hot-path
  - mention multi-concurrent-reader lookups
  - enumerate reasons for invalidation
  - add more details on lookup structures
  - describe the softmmu hot-path better
  - mention store-after-load barrier problem
---
 docs/multi-thread-tcg.txt | 225 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 225 insertions(+)
 create mode 100644 docs/multi-thread-tcg.txt

diff --git a/docs/multi-thread-tcg.txt b/docs/multi-thread-tcg.txt
new file mode 100644
index 0000000..5c88c99
--- /dev/null
+++ b/docs/multi-thread-tcg.txt
@@ -0,0 +1,225 @@
+Copyright (c) 2015 Linaro Ltd.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.  See
+the COPYING file in the top-level directory.
+
+STATUS: DRAFTING
+
+Introduction
+============
+
+This document outlines the design for multi-threaded TCG system-mode
+emulation. The current user-mode emulation mirrors the thread
+structure of the translated executable.
+
+The original system-mode TCG implementation was single threaded and
+dealt with multiple CPUs by with simple round-robin scheduling. This
+simplified a lot of things but became increasingly limited as systems
+being emulated gained additional cores and per-core performance gains
+for host systems started to level off.
+
+vCPU Scheduling
+===============
+
+We introduce a new running mode where each vCPU will run on its own
+user-space thread. This will be enabled by default for all
+FE/BE combinations that have had the required work done to support
+this safely.
+
+In the general case of running translated code there should be no
+inter-vCPU dependencies and all vCPUs should be able to run at full
+speed. Synchronisation will only be required while accessing internal
+shared data structures or when the emulated architecture requires a
+coherent representation of the emulated machine state.
+
+Shared Data Structures
+======================
+
+Main Run Loop
+-------------
+
+Even when there is no code being generated there are a number of
+structures associated with the hot-path through the main run-loop.
+These are associated with looking up the next translation block to
+execute. These include:
+
+    tb_jmp_cache (per-vCPU, cache of recent jumps)
+    tb_phys_hash (global, phys address->tb lookup)
+
+As TB linking only occurs when blocks are in the same page this code
+is critical to performance as looking up the next TB to execute is the
+most common reason to exit the generated code.
+
+DESIGN REQUIREMENT: Make access to lookup structures safe with
+multiple reader/writer threads. Minimise any lock contention to do it.
+
+Global TCG State
+----------------
+
+We need to protect the entire code generation cycle including any post
+generation patching of the translated code. This also implies a shared
+translation buffer which contains code running on all cores. Any
+execution path that comes to the main run loop will need to hold a
+mutex for code generation. This also includes times when we need flush
+code or entries from any shared lookups/caches. Structures held on a
+per-vCPU basis won't need locking unless other vCPUs will need to
+modify them.
+
+DESIGN REQUIREMENT: Add locking around all code generation and TB
+patching. If possible make shared lookup/caches able to handle multiple
+readers without locks otherwise protect them with locks as well.
+
+Translation Blocks
+------------------
+
+Currently the whole system shares a single code generation buffer
+which when full will force a flush of all translations and start from
+scratch again.
+
+Once a basic block has been translated it will continue to be used
+until it is invalidated. These invalidation events are typically due
+a change to the state of a physical page:
+  - code modification (self modify code, patching code)
+  - page changes (new mapping to physical page)
+  - debugging operations (breakpoint insertion/removal)
+
+There exist several places reference to TBs exist which need to be
+cleared in a safe way.
+
+The main reference is a global page table (l1_map) which provides a 2
+level look-up for PageDesc structures which contain pointers to the
+start of a linked list of all Translation Blocks in that page (see
+page_next).
+
+When a block is invalidated any blocks which directly jump to it need
+to have those jumps removed. This requires navigating the tb_jump_list
+linked list as well as patching the jump code in a safe way.
+
+Finally there are a number of look-up mechanisms for accelerating
+lookup of the next TB. These cache and hashed tables need to have
+references removed in a safe way.
+
+DESIGN REQUIREMENT: Safely handle invalidation of TBs
+                      - safely patch direct jumps
+                      - remove central PageDesc lookup entries
+                      - ensure lookup caches/hashes are safely updated
+
+Memory maps and TLBs
+--------------------
+
+The memory handling code is fairly critical to the speed of memory
+access in the emulated system. The SoftMMU code is designed so the
+hot-path can be handled entirely within translated code. This is
+handled with a per-vCPU TLB structure which once populated will allow
+a series of accesses to the page to occur without exiting the
+translated code. It is possible to set flags in the TLB address which
+will ensure the slow-path is taken for each access. This can be done
+to support:
+
+  - Memory regions (dividing up access to PIO, MMIO and RAM)
+  - Dirty page tracking (for code gen, migration and display)
+  - Virtual TLB (for translating guest address->real address)
+
+When the TLB tables are updated we need to ensure they are done in a
+safe way by bringing all executing threads to a halt before making the
+modifications.
+
+DESIGN REQUIREMENTS:
+
+  - TLB Flush All/Page
+    - can be across-CPUs
+    - will need all other CPUs brought to a halt
+  - TLB Update (update a CPUTLBEntry, via tlb_set_page_with_attrs)
+    - This is a per-CPU table - by definition can't race
+    - updated by its own thread when the slow-path is forced
+
+Emulated hardware state
+-----------------------
+
+Currently thanks to KVM work any access to IO memory is automatically
+protected by the global iothread mutex. Any IO region that doesn't use
+global mutex is expected to do its own locking.
+
+Memory Consistency
+==================
+
+Between emulated guests and host systems there are a range of memory
+consistency models. Even emulating weakly ordered systems on strongly
+ordered hosts needs to ensure things like store-after-load re-ordering
+can be prevented when the guest wants to.
+
+Memory Barriers
+---------------
+
+Barriers (sometimes known as fences) provide a mechanism for software
+to enforce a particular ordering of memory operations from the point
+of view of external observers (e.g. another processor core). They can
+apply to any memory operations as well as just loads or stores.
+
+The Linux kernel has an excellent write-up on the various forms of
+memory barrier and the guarantees they can provide [1].
+
+Barriers are often wrapped around synchronisation primitives to
+provide explicit memory ordering semantics. However they can be used
+by themselves to provide safe lockless access by ensuring for example
+a signal flag will always be set after a payload.
+
+DESIGN REQUIREMENT: Add a new tcg_memory_barrier op
+
+This would enforce a strong load/store ordering so all loads/stores
+complete at the memory barrier. On single-core non-SMP strongly
+ordered backends this could become a NOP.
+
+There may be a case for further refinement if this causes performance
+bottlenecks.
+
+Memory Control and Maintenance
+------------------------------
+
+This includes a class of instructions for controlling system cache
+behaviour. While QEMU doesn't model cache behaviour these instructions
+are often seen when code modification has taken place to ensure the
+changes take effect.
+
+Synchronisation Primitives
+--------------------------
+
+There are two broad types of synchronisation primitives found in
+modern ISAs: atomic instructions and exclusive regions.
+
+The first type offer a simple atomic instruction which will guarantee
+some sort of test and conditional store will be truly atomic w.r.t.
+other cores sharing access to the memory. The classic example is the
+x86 cmpxchg instruction.
+
+The second type offer a pair of load/store instructions which offer a
+guarantee that an region of memory has not been touched between the
+load and store instructions. An example of this is ARM's ldrex/strex
+pair where the strex instruction will return a flag indicating a
+successful store only if no other CPU has accessed the memory region
+since the ldrex.
+
+Traditionally TCG has generated a series of operations that work
+because they are within the context of a single translation block so
+will have completed before another CPU is scheduled. However with
+the ability to have multiple threads running to emulate multiple CPUs
+we will need to explicitly expose these semantics.
+
+DESIGN REQUIREMENTS:
+ - atomics
+   - Introduce some atomic TCG ops for the common semantics
+   - The default fallback helper function will use qemu_atomics
+   - Each backend can then add a more efficient implementation
+ - load/store exclusive
+   [AJB:
+        There are currently a number proposals of interest:
+     - Greensocs tweaks to ldst ex (using locks)
+     - Slow-path for atomic instruction translation [2]
+     - Helper-based Atomic Instruction Emulation (AIE) [3]
+    ]
+
+==========
+
+[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/plain/Documentation/memory-barriers.txt
+[2] http://thread.gmane.org/gmane.comp.emulators.qemu/334561
+[3] http://thread.gmane.org/gmane.comp.emulators.qemu/335297
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 04/19] docs: new design document multi-thread-tcg.txt (DRAFTING)
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 04/19] docs: new design document multi-thread-tcg.txt (DRAFTING) Alex Bennée
@ 2016-06-23 21:33   ` Sergey Fedorov
  0 siblings, 0 replies; 62+ messages in thread
From: Sergey Fedorov @ 2016-06-23 21:33 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana

On 03/06/16 23:40, Alex Bennée wrote:
> This is a current DRAFT of a design proposal for upgrading TCG emulation
> to take advantage of modern CPUs by running a thread-per-CPU. The
> document goes through the various areas of the code affected by such a
> change and proposes design requirements for each part of the solution.
>
> It has been written *without* explicit reference to the current ongoing
> efforts to introduce this feature. The hope being we can review and
> discuss the design choices without assuming the current choices taken by
> the implementation are correct.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>
> ---
> v1
>   - initial version
> v2
>   - update discussion on locks
>   - bit more detail on vCPU scheduling
>   - explicitly mention Translation Blocks
>   - emulated hardware state already covered by iomutex
>   - a few minor rewords
> v3
>   - mention this covers system-mode
>   - describe main main-loop and lookup hot-path
>   - mention multi-concurrent-reader lookups
>   - enumerate reasons for invalidation
>   - add more details on lookup structures
>   - describe the softmmu hot-path better
>   - mention store-after-load barrier problem
> ---
>  docs/multi-thread-tcg.txt | 225 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 225 insertions(+)
>  create mode 100644 docs/multi-thread-tcg.txt
>
> diff --git a/docs/multi-thread-tcg.txt b/docs/multi-thread-tcg.txt
> new file mode 100644
> index 0000000..5c88c99
> --- /dev/null
> +++ b/docs/multi-thread-tcg.txt
> @@ -0,0 +1,225 @@
> +Copyright (c) 2015 Linaro Ltd.
> +
> +This work is licensed under the terms of the GNU GPL, version 2 or later.  See
> +the COPYING file in the top-level directory.
> +
> +STATUS: DRAFTING
> +
> +Introduction
> +============
> +
> +This document outlines the design for multi-threaded TCG system-mode
> +emulation. The current user-mode emulation mirrors the thread
> +structure of the translated executable.
> +
> +The original system-mode TCG implementation was single threaded and
> +dealt with multiple CPUs by with simple round-robin scheduling. This
> +simplified a lot of things but became increasingly limited as systems
> +being emulated gained additional cores and per-core performance gains
> +for host systems started to level off.
> +
> +vCPU Scheduling
> +===============
> +
> +We introduce a new running mode where each vCPU will run on its own
> +user-space thread. This will be enabled by default for all
> +FE/BE combinations that have had the required work done to support
> +this safely.
> +
> +In the general case of running translated code there should be no
> +inter-vCPU dependencies and all vCPUs should be able to run at full
> +speed. Synchronisation will only be required while accessing internal
> +shared data structures or when the emulated architecture requires a
> +coherent representation of the emulated machine state.
> +
> +Shared Data Structures
> +======================
> +
> +Main Run Loop
> +-------------
> +
> +Even when there is no code being generated there are a number of
> +structures associated with the hot-path through the main run-loop.
> +These are associated with looking up the next translation block to
> +execute. These include:
> +
> +    tb_jmp_cache (per-vCPU, cache of recent jumps)
> +    tb_phys_hash (global, phys address->tb lookup)
> +
> +As TB linking only occurs when blocks are in the same page this code
> +is critical to performance as looking up the next TB to execute is the
> +most common reason to exit the generated code.
> +
> +DESIGN REQUIREMENT: Make access to lookup structures safe with
> +multiple reader/writer threads. Minimise any lock contention to do it.
> +
> +Global TCG State
> +----------------
> +
> +We need to protect the entire code generation cycle including any post
> +generation patching of the translated code. This also implies a shared
> +translation buffer which contains code running on all cores. Any
> +execution path that comes to the main run loop will need to hold a
> +mutex for code generation. This also includes times when we need flush
> +code or entries from any shared lookups/caches. Structures held on a
> +per-vCPU basis won't need locking unless other vCPUs will need to
> +modify them.
> +
> +DESIGN REQUIREMENT: Add locking around all code generation and TB
> +patching. If possible make shared lookup/caches able to handle multiple
> +readers without locks otherwise protect them with locks as well.
> +
> +Translation Blocks
> +------------------
> + 
> +Currently the whole system shares a single code generation buffer
> +which when full will force a flush of all translations and start from
> +scratch again.
> +
> +Once a basic block has been translated it will continue to be used
> +until it is invalidated. These invalidation events are typically due
> +a change to the state of a physical page:
> +  - code modification (self modify code, patching code)
> +  - page changes (new mapping to physical page)

Mapping changes does invalidate translation blocks in user-mode
emulation only. In system-mode emulation, we just do TLB flush and clean
CPU's 'tb_jmp_cache' but don't invalidate any translation block, just
follow tlb_flush(). That is why in system-mode emulation we can't do
direct jumps to another page or to a TB which span a page boundary.

> +  - debugging operations (breakpoint insertion/removal)
> +
> +There exist several places reference to TBs exist which need to be
> +cleared in a safe way.

You mean: "There are several places where references to TBs exist which
..."?

> +
> +The main reference is a global page table (l1_map) which provides a 2
> +level look-up for PageDesc structures which contain pointers to the
> +start of a linked list of all Translation Blocks in that page (see
> +page_next).

Actually, 'l1_map' is multi-level, see the comment above 'V_L2_BITS'
macro definition.

> +
> +When a block is invalidated any blocks which directly jump to it need
> +to have those jumps removed. This requires navigating the tb_jump_list
> +linked list as well as patching the jump code in a safe way.
> +
> +Finally there are a number of look-up mechanisms for accelerating
> +lookup of the next TB. These cache and hashed tables need to have
> +references removed in a safe way.
> +
> +DESIGN REQUIREMENT: Safely handle invalidation of TBs
> +                      - safely patch direct jumps
> +                      - remove central PageDesc lookup entries
> +                      - ensure lookup caches/hashes are safely updated
> +
> +Memory maps and TLBs
> +--------------------
> +
> +The memory handling code is fairly critical to the speed of memory
> +access in the emulated system. The SoftMMU code is designed so the
> +hot-path can be handled entirely within translated code. This is
> +handled with a per-vCPU TLB structure which once populated will allow
> +a series of accesses to the page to occur without exiting the
> +translated code. It is possible to set flags in the TLB address which
> +will ensure the slow-path is taken for each access. This can be done
> +to support:
> +
> +  - Memory regions (dividing up access to PIO, MMIO and RAM)
> +  - Dirty page tracking (for code gen, migration and display)
> +  - Virtual TLB (for translating guest address->real address)
> +
> +When the TLB tables are updated we need to ensure they are done in a
> +safe way by bringing all executing threads to a halt before making the
> +modifications.

Actually, we just need to be thread-safe when modifying vCPU TLB entries
from other thread. If it is possible to do using (relaxed) atomic memory
access, we could obviously benefit from it.

> +
> +DESIGN REQUIREMENTS:
> +
> +  - TLB Flush All/Page
> +    - can be across-CPUs
> +    - will need all other CPUs brought to a halt

Only *cross-CPU* TLB flush *may* need other CPU brought to halt.

> +  - TLB Update (update a CPUTLBEntry, via tlb_set_page_with_attrs)
> +    - This is a per-CPU table - by definition can't race
> +    - updated by its own thread when the slow-path is forced
> +
> +Emulated hardware state
> +-----------------------
> +
> +Currently thanks to KVM work any access to IO memory is automatically
> +protected by the global iothread mutex. Any IO region that doesn't use
> +global mutex is expected to do its own locking.

Worth it mentioning that we're going to push iothread locking down in
TCG as much as possible?

> +
> +Memory Consistency
> +==================
> +
> +Between emulated guests and host systems there are a range of memory
> +consistency models. Even emulating weakly ordered systems on strongly
> +ordered hosts needs to ensure things like store-after-load re-ordering
> +can be prevented when the guest wants to.
> +
> +Memory Barriers
> +---------------
> +
> +Barriers (sometimes known as fences) provide a mechanism for software
> +to enforce a particular ordering of memory operations from the point
> +of view of external observers (e.g. another processor core). They can
> +apply to any memory operations as well as just loads or stores.
> +
> +The Linux kernel has an excellent write-up on the various forms of
> +memory barrier and the guarantees they can provide [1].
> +
> +Barriers are often wrapped around synchronisation primitives to
> +provide explicit memory ordering semantics. However they can be used
> +by themselves to provide safe lockless access by ensuring for example
> +a signal flag will always be set after a payload.
> +
> +DESIGN REQUIREMENT: Add a new tcg_memory_barrier op
> +
> +This would enforce a strong load/store ordering so all loads/stores
> +complete at the memory barrier. On single-core non-SMP strongly
> +ordered backends this could become a NOP.
> +
> +There may be a case for further refinement if this causes performance
> +bottlenecks.

I think it's worth mentioning that aside from explicit standalone memory
barrier instructions there's also implicit memory ordering semantics
which comes with each guest memory access instruction, e.g. relaxed,
acquire/release, sequentially consistent.

> +
> +Memory Control and Maintenance
> +------------------------------
> +
> +This includes a class of instructions for controlling system cache
> +behaviour. While QEMU doesn't model cache behaviour these instructions
> +are often seen when code modification has taken place to ensure the
> +changes take effect.
> +
> +Synchronisation Primitives
> +--------------------------
> +
> +There are two broad types of synchronisation primitives found in
> +modern ISAs: atomic instructions and exclusive regions.
> +
> +The first type offer a simple atomic instruction which will guarantee
> +some sort of test and conditional store will be truly atomic w.r.t.
> +other cores sharing access to the memory. The classic example is the
> +x86 cmpxchg instruction.
> +
> +The second type offer a pair of load/store instructions which offer a
> +guarantee that an region of memory has not been touched between the
> +load and store instructions. An example of this is ARM's ldrex/strex
> +pair where the strex instruction will return a flag indicating a
> +successful store only if no other CPU has accessed the memory region
> +since the ldrex.
> +
> +Traditionally TCG has generated a series of operations that work
> +because they are within the context of a single translation block so
> +will have completed before another CPU is scheduled. However with
> +the ability to have multiple threads running to emulate multiple CPUs
> +we will need to explicitly expose these semantics.
> +
> +DESIGN REQUIREMENTS:
> + - atomics

What "atomics" means here ...

> +   - Introduce some atomic TCG ops for the common semantics
> +   - The default fallback helper function will use qemu_atomics
> +   - Each backend can then add a more efficient implementation
> + - load/store exclusive

... and how they relates to "load/store exclusive"?

> +   [AJB:
> +        There are currently a number proposals of interest:
> +     - Greensocs tweaks to ldst ex (using locks)
> +     - Slow-path for atomic instruction translation [2]
> +     - Helper-based Atomic Instruction Emulation (AIE) [3]
> +    ]

We also have a problem emulating 64-bit guest on 32-bit host: while a
naturally aligned machine-word memory access is usually atomic, a single
64-bit guest memory access instruction is translated into a series of
two 32-bit host memory access instruction. That can (will) break guest
code assumptions about memory access atomicity.

Kind regards,
Sergey

> +
> +==========
> +
> +[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/plain/Documentation/memory-barriers.txt
> +[2] http://thread.gmane.org/gmane.comp.emulators.qemu/334561
> +[3] http://thread.gmane.org/gmane.comp.emulators.qemu/335297

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [Qemu-devel] [RFC v3 05/19] exec: add assert_debug_safe and notes on debug structures
  2016-06-03 20:40 [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG Alex Bennée
                   ` (3 preceding siblings ...)
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 04/19] docs: new design document multi-thread-tcg.txt (DRAFTING) Alex Bennée
@ 2016-06-03 20:40 ` Alex Bennée
  2016-06-24 15:28   ` Sergey Fedorov
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 06/19] tcg: comment on which functions have to be called with tb_lock held Alex Bennée
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 62+ messages in thread
From: Alex Bennée @ 2016-06-03 20:40 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Alex Bennée, Peter Crosthwaite

The debug structures are different from many of the other shared ones as
they are modified from two places:

  - architectural debug registers
  - the gdb stub

The first is usually in the context of vCPU currently running so
modifications are safe. The second is generally only changed while the
vCPUs are halted and therefore not going to clash.

I fixed the commenting on DEBUG_SUBPAGE when I added the new DEBUG_DEBUG
flag.

KNOWN ISSUES: arm powerctl resets will trigger the asserts in
multi-thread mode with smp > 1 as the reset operation is cross-cpu and
clears all watchpoints.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 exec.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/exec.c b/exec.c
index a3a93ae..b225282 100644
--- a/exec.c
+++ b/exec.c
@@ -25,6 +25,7 @@
 #include "qemu/cutils.h"
 #include "cpu.h"
 #include "exec/exec-all.h"
+#include "qom/cpu.h"
 #include "tcg.h"
 #include "hw/qdev-core.h"
 #if !defined(CONFIG_USER_ONLY)
@@ -62,7 +63,46 @@
 #include "qemu/mmap-alloc.h"
 #endif
 
-//#define DEBUG_SUBPAGE
+/* #define DEBUG_SUBPAGE */
+/* #define DEBUG_DEBUG */
+
+#ifdef DEBUG_DEBUG
+#define CHECK_DEBUG_SAFE 1
+#else
+#define CHECK_DEBUG_SAFE 0
+#endif
+
+/*
+ * Safe access to debugging structures.
+ *
+ * Breakpoints and Watchpoints are kept in the vCPU structures. There
+ * are two ways they are manipulated:
+ *
+ *   - Outside of the context of the vCPU thread (e.g. gdbstub)
+ *   - Inside the context of the vCPU (architectural debug registers)
+ *
+ * In system emulation mode the chance of corruption is usually
+ * mitigated by the fact the vCPUs is usually suspended whenever these
+ * changes are made.
+ *
+ * In user emulation mode it is less clear (XXX: work this out)
+ */
+
+#ifdef CONFIG_SOFTMMU
+#define assert_debug_safe(cpu) do {                                     \
+        if (CHECK_DEBUG_SAFE) {                                         \
+            g_assert(!cpu->created ||                                   \
+                     (cpu_is_stopped(cpu) || cpu == current_cpu));      \
+        }                                                               \
+    } while (0)
+#else
+#define assert_debug_safe(cpu) do {                                     \
+        if (CHECK_DEBUG_SAFE) {                                         \
+            g_assert(false);                                            \
+        }                                                               \
+    } while (0)
+#endif
+
 
 #if !defined(CONFIG_USER_ONLY)
 /* ram_list is read under rcu_read_lock()/rcu_read_unlock().  Writes
@@ -737,6 +777,8 @@ int cpu_watchpoint_insert(CPUState *cpu, vaddr addr, vaddr len,
 {
     CPUWatchpoint *wp;
 
+    assert_debug_safe(cpu);
+
     /* forbid ranges which are empty or run off the end of the address space */
     if (len == 0 || (addr + len - 1) < addr) {
         error_report("tried to set invalid watchpoint at %"
@@ -769,6 +811,8 @@ int cpu_watchpoint_remove(CPUState *cpu, vaddr addr, vaddr len,
 {
     CPUWatchpoint *wp;
 
+    assert_debug_safe(cpu);
+
     QTAILQ_FOREACH(wp, &cpu->watchpoints, entry) {
         if (addr == wp->vaddr && len == wp->len
                 && flags == (wp->flags & ~BP_WATCHPOINT_HIT)) {
@@ -782,6 +826,8 @@ int cpu_watchpoint_remove(CPUState *cpu, vaddr addr, vaddr len,
 /* Remove a specific watchpoint by reference.  */
 void cpu_watchpoint_remove_by_ref(CPUState *cpu, CPUWatchpoint *watchpoint)
 {
+    assert_debug_safe(cpu);
+
     QTAILQ_REMOVE(&cpu->watchpoints, watchpoint, entry);
 
     tlb_flush_page(cpu, watchpoint->vaddr);
@@ -794,6 +840,8 @@ void cpu_watchpoint_remove_all(CPUState *cpu, int mask)
 {
     CPUWatchpoint *wp, *next;
 
+    assert_debug_safe(cpu);
+
     QTAILQ_FOREACH_SAFE(wp, &cpu->watchpoints, entry, next) {
         if (wp->flags & mask) {
             cpu_watchpoint_remove_by_ref(cpu, wp);
@@ -829,6 +877,8 @@ int cpu_breakpoint_insert(CPUState *cpu, vaddr pc, int flags,
 {
     CPUBreakpoint *bp;
 
+    assert_debug_safe(cpu);
+
     bp = g_malloc(sizeof(*bp));
 
     bp->pc = pc;
@@ -849,11 +899,16 @@ int cpu_breakpoint_insert(CPUState *cpu, vaddr pc, int flags,
     return 0;
 }
 
-/* Remove a specific breakpoint.  */
+/* Remove a specific breakpoint.
+ *
+ * See cpu_breakpoint_insert notes for information about locking.
+ */
 int cpu_breakpoint_remove(CPUState *cpu, vaddr pc, int flags)
 {
     CPUBreakpoint *bp;
 
+    assert_debug_safe(cpu);
+
     QTAILQ_FOREACH(bp, &cpu->breakpoints, entry) {
         if (bp->pc == pc && bp->flags == flags) {
             cpu_breakpoint_remove_by_ref(cpu, bp);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 05/19] exec: add assert_debug_safe and notes on debug structures
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 05/19] exec: add assert_debug_safe and notes on debug structures Alex Bennée
@ 2016-06-24 15:28   ` Sergey Fedorov
  0 siblings, 0 replies; 62+ messages in thread
From: Sergey Fedorov @ 2016-06-24 15:28 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite

On 03/06/16 23:40, Alex Bennée wrote:
> diff --git a/exec.c b/exec.c
> index a3a93ae..b225282 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -25,6 +25,7 @@
>  #include "qemu/cutils.h"
>  #include "cpu.h"
>  #include "exec/exec-all.h"
> +#include "qom/cpu.h"
>  #include "tcg.h"
>  #include "hw/qdev-core.h"
>  #if !defined(CONFIG_USER_ONLY)
> @@ -62,7 +63,46 @@
>  #include "qemu/mmap-alloc.h"
>  #endif
>  
> -//#define DEBUG_SUBPAGE
> +/* #define DEBUG_SUBPAGE */
> +/* #define DEBUG_DEBUG */
> +
> +#ifdef DEBUG_DEBUG
> +#define CHECK_DEBUG_SAFE 1
> +#else
> +#define CHECK_DEBUG_SAFE 0
> +#endif
> +
> +/*
> + * Safe access to debugging structures.
> + *
> + * Breakpoints and Watchpoints are kept in the vCPU structures. There
> + * are two ways they are manipulated:
> + *
> + *   - Outside of the context of the vCPU thread (e.g. gdbstub)
> + *   - Inside the context of the vCPU (architectural debug registers)
> + *
> + * In system emulation mode the chance of corruption is usually
> + * mitigated by the fact the vCPUs is usually suspended whenever these
> + * changes are made.
> + *
> + * In user emulation mode it is less clear (XXX: work this out)
> + */
> +
> +#ifdef CONFIG_SOFTMMU
> +#define assert_debug_safe(cpu) do {                                     \
> +        if (CHECK_DEBUG_SAFE) {                                         \
> +            g_assert(!cpu->created ||                                   \
> +                     (cpu_is_stopped(cpu) || cpu == current_cpu));      \

There's no need in parentheses around "cpu_is_stopped(cpu) || cpu ==
current_cpu".

> +        }                                                               \
> +    } while (0)
> +#else
> +#define assert_debug_safe(cpu) do {                                     \
> +        if (CHECK_DEBUG_SAFE) {                                         \
> +            g_assert(false);                                            \
> +        }                                                               \

Can be simply:

#define assert_debug_safe(cpu)

Kind regards,
Sergey

> +    } while (0)
> +#endif
> +
>  
>  #if !defined(CONFIG_USER_ONLY)
>  /* ram_list is read under rcu_read_lock()/rcu_read_unlock().  Writes
>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [Qemu-devel] [RFC v3 06/19] tcg: comment on which functions have to be called with tb_lock held
  2016-06-03 20:40 [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG Alex Bennée
                   ` (4 preceding siblings ...)
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 05/19] exec: add assert_debug_safe and notes on debug structures Alex Bennée
@ 2016-06-03 20:40 ` Alex Bennée
  2016-06-24 15:38   ` Sergey Fedorov
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 07/19] translate-all: Add assert_memory_lock annotations Alex Bennée
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 62+ messages in thread
From: Alex Bennée @ 2016-06-03 20:40 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Alex Bennée, Peter Crosthwaite

From: Paolo Bonzini <pbonzini@redhat.com>

softmmu requires more functions to be thread-safe, because translation
blocks can be invalidated from e.g. notdirty callbacks.  Probably the
same holds for user-mode emulation, it's just that no one has ever
tried to produce a coherent locking there.

This patch will guide the introduction of more tb_lock and tb_unlock
calls for system emulation.

Note that after this patch some (most) of the mentioned functions are
still called outside tb_lock/tb_unlock.  The next one will rectify this.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v1(ajb):
  - just s-o-b
v2
  - clarify write lock on tb_jump_cache
v3
  - drop RCU comment for debug stuff (separate commit now)
---
 include/exec/exec-all.h |  1 +
 include/qom/cpu.h       |  3 +++
 tcg/tcg.h               |  2 ++
 translate-all.c         | 32 ++++++++++++++++++++++++++------
 4 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index ea1c925..e30f02b 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -313,6 +313,7 @@ static inline void tb_set_jmp_target(TranslationBlock *tb,
 
 #endif
 
+/* Called with tb_lock held.  */
 static inline void tb_add_jump(TranslationBlock *tb, int n,
                                TranslationBlock *tb_next)
 {
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index c9ba16c..b82625f 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -313,7 +313,10 @@ struct CPUState {
     MemoryRegion *memory;
 
     void *env_ptr; /* CPUArchState */
+
+    /* Writes protected by tb_lock, reads not thread-safe  */
     struct TranslationBlock *tb_jmp_cache[TB_JMP_CACHE_SIZE];
+
     struct GDBRegisterState *gdb_regs;
     int gdb_num_regs;
     int gdb_num_g_regs;
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 909db3f..db6a062 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -638,6 +638,7 @@ static inline bool tcg_op_buf_full(void)
 
 /* pool based memory allocation */
 
+/* tb_lock must be held for tcg_malloc_internal. */
 void *tcg_malloc_internal(TCGContext *s, int size);
 void tcg_pool_reset(TCGContext *s);
 void tcg_pool_delete(TCGContext *s);
@@ -646,6 +647,7 @@ void tb_lock(void);
 void tb_unlock(void);
 void tb_lock_reset(void);
 
+/* Called with tb_lock held.  */
 static inline void *tcg_malloc(int size)
 {
     TCGContext *s = &tcg_ctx;
diff --git a/translate-all.c b/translate-all.c
index e3f44d9..8b162ff 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -290,7 +290,9 @@ static int encode_search(TranslationBlock *tb, uint8_t *block)
     return p - block;
 }
 
-/* The cpu state corresponding to 'searched_pc' is restored.  */
+/* The cpu state corresponding to 'searched_pc' is restored.
+ * Called with tb_lock held.
+ */
 static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
                                      uintptr_t searched_pc)
 {
@@ -347,8 +349,10 @@ bool cpu_restore_state(CPUState *cpu, uintptr_t retaddr)
         cpu_restore_state_from_tb(cpu, tb, retaddr);
         if (tb->cflags & CF_NOCACHE) {
             /* one-shot translation, invalidate it immediately */
+            tb_lock();
             tb_phys_invalidate(tb, -1);
             tb_free(tb);
+            tb_unlock();
         }
         return true;
     }
@@ -440,6 +444,7 @@ static void page_init(void)
 }
 
 /* If alloc=1:
+ * Called with tb_lock held for system emulation.
  * Called with mmap_lock held for user-mode emulation.
  */
 static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc)
@@ -804,8 +809,12 @@ bool tcg_enabled(void)
     return tcg_ctx.code_gen_buffer != NULL;
 }
 
-/* Allocate a new translation block. Flush the translation buffer if
-   too many translation blocks or too much generated code. */
+/*
+ * Allocate a new translation block. Flush the translation buffer if
+ * too many translation blocks or too much generated code.
+ *
+ * Called with tb_lock held.
+ */
 static TranslationBlock *tb_alloc(target_ulong pc)
 {
     TranslationBlock *tb;
@@ -819,6 +828,7 @@ static TranslationBlock *tb_alloc(target_ulong pc)
     return tb;
 }
 
+/* Called with tb_lock held.  */
 void tb_free(TranslationBlock *tb)
 {
     /* In practice this is mostly used for single use temporary TB
@@ -918,7 +928,11 @@ do_tb_invalidate_check(struct qht *ht, void *p, uint32_t hash, void *userp)
     }
 }
 
-static void tb_invalidate_check(target_ulong address)
+/* verify that all the pages have correct rights for code
+ *
+ * Called with tb_lock held.
+ */
+static void tb_page_check(void)
 {
     address &= TARGET_PAGE_MASK;
     qht_iter(&tcg_ctx.tb_ctx.htable, do_tb_invalidate_check, &address);
@@ -1022,7 +1036,10 @@ static inline void tb_jmp_unlink(TranslationBlock *tb)
     }
 }
 
-/* invalidate one TB */
+/* invalidate one TB
+ *
+ * Called with tb_lock held.
+ */
 void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
 {
     CPUState *cpu;
@@ -1453,7 +1470,9 @@ void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
     }
     if (!p->code_bitmap &&
         ++p->code_write_count >= SMC_BITMAP_USE_THRESHOLD) {
-        /* build code bitmap */
+        /* build code bitmap.  FIXME: writes should be protected by
+         * tb_lock, reads by tb_lock or RCU.
+         */
         build_page_bitmap(p);
     }
     if (p->code_bitmap) {
@@ -1593,6 +1612,7 @@ void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr)
 }
 #endif /* !defined(CONFIG_USER_ONLY) */
 
+/* Called with tb_lock held.  */
 void tb_check_watchpoint(CPUState *cpu)
 {
     TranslationBlock *tb;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 06/19] tcg: comment on which functions have to be called with tb_lock held
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 06/19] tcg: comment on which functions have to be called with tb_lock held Alex Bennée
@ 2016-06-24 15:38   ` Sergey Fedorov
  0 siblings, 0 replies; 62+ messages in thread
From: Sergey Fedorov @ 2016-06-24 15:38 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite

On 03/06/16 23:40, Alex Bennée wrote:
> diff --git a/translate-all.c b/translate-all.c
> index e3f44d9..8b162ff 100644
> --- a/translate-all.c
> +++ b/translate-all.c
(snip)
> @@ -347,8 +349,10 @@ bool cpu_restore_state(CPUState *cpu, uintptr_t retaddr)
>          cpu_restore_state_from_tb(cpu, tb, retaddr);
>          if (tb->cflags & CF_NOCACHE) {
>              /* one-shot translation, invalidate it immediately */
> +            tb_lock();
>              tb_phys_invalidate(tb, -1);
>              tb_free(tb);
> +            tb_unlock();

Looks like this belongs to another patch ;-)

>          }
>          return true;
>      }
>

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [Qemu-devel] [RFC v3 07/19] translate-all: Add assert_memory_lock annotations
  2016-06-03 20:40 [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG Alex Bennée
                   ` (5 preceding siblings ...)
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 06/19] tcg: comment on which functions have to be called with tb_lock held Alex Bennée
@ 2016-06-03 20:40 ` Alex Bennée
  2016-06-24 15:48   ` Sergey Fedorov
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 08/19] tcg: protect TBContext with tb_lock Alex Bennée
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 62+ messages in thread
From: Alex Bennée @ 2016-06-03 20:40 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Alex Bennée, Riku Voipio, Peter Crosthwaite

This adds calls to the assert_memory_lock for all public APIs which are
documented as holding the mmap_lock for user-mode.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 linux-user/elfload.c |  4 ++++
 translate-all.c      | 20 ++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index bb2558f..f72c275 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -1839,6 +1839,8 @@ static void load_elf_image(const char *image_name, int image_fd,
     info->pt_dynamic_addr = 0;
 #endif
 
+    mmap_lock();
+
     /* Find the maximum size of the image and allocate an appropriate
        amount of memory to handle that.  */
     loaddr = -1, hiaddr = 0;
@@ -1999,6 +2001,8 @@ static void load_elf_image(const char *image_name, int image_fd,
         load_symbols(ehdr, image_fd, load_bias);
     }
 
+    mmap_unlock();
+
     close(image_fd);
     return;
 
diff --git a/translate-all.c b/translate-all.c
index 8b162ff..aba6cb6 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -453,6 +453,10 @@ static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc)
     void **lp;
     int i;
 
+    if (alloc) {
+        assert_memory_lock();
+    }
+
     /* Level 1.  Always allocated.  */
     lp = l1_map + ((index >> V_L1_SHIFT) & (V_L1_SIZE - 1));
 
@@ -819,6 +823,8 @@ static TranslationBlock *tb_alloc(target_ulong pc)
 {
     TranslationBlock *tb;
 
+    assert_tb_lock();
+
     if (tcg_ctx.tb_ctx.nb_tbs >= tcg_ctx.code_gen_max_blocks) {
         return NULL;
     }
@@ -831,6 +837,8 @@ static TranslationBlock *tb_alloc(target_ulong pc)
 /* Called with tb_lock held.  */
 void tb_free(TranslationBlock *tb)
 {
+    assert_tb_lock();
+
     /* In practice this is mostly used for single use temporary TB
        Ignore the hard cases and just back up if this TB happens to
        be the last one generated.  */
@@ -1047,6 +1055,8 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
     uint32_t h;
     tb_page_addr_t phys_pc;
 
+    assert_tb_lock();
+
     /* remove the TB from the hash list */
     phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
     h = tb_hash_func(phys_pc, tb->pc, tb->flags);
@@ -1125,6 +1135,8 @@ static inline void tb_alloc_page(TranslationBlock *tb,
     bool page_already_protected;
 #endif
 
+    assert_memory_lock();
+
     tb->page_addr[n] = page_addr;
     p = page_find_alloc(page_addr >> TARGET_PAGE_BITS, 1);
     tb->page_next[n] = p->first_tb;
@@ -1181,6 +1193,8 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
 {
     uint32_t h;
 
+    assert_memory_lock();
+
     /* add in the hash table */
     h = tb_hash_func(phys_pc, tb->pc, tb->flags);
     qht_insert(&tcg_ctx.tb_ctx.htable, tb, h);
@@ -1212,6 +1226,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
 #ifdef CONFIG_PROFILER
     int64_t ti;
 #endif
+    assert_memory_lock();
 
     phys_pc = get_page_addr_code(env, pc);
     if (use_icount && !(cflags & CF_IGNORE_ICOUNT)) {
@@ -1339,6 +1354,8 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
  */
 void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
 {
+    assert_memory_lock();
+
     while (start < end) {
         tb_invalidate_phys_page_range(start, end, 0);
         start &= TARGET_PAGE_MASK;
@@ -1375,6 +1392,8 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
     uint32_t current_flags = 0;
 #endif /* TARGET_HAS_PRECISE_SMC */
 
+    assert_memory_lock();
+
     p = page_find(start >> TARGET_PAGE_BITS);
     if (!p) {
         return;
@@ -1972,6 +1991,7 @@ void page_set_flags(target_ulong start, target_ulong end, int flags)
     assert(end < ((target_ulong)1 << L1_MAP_ADDR_SPACE_BITS));
 #endif
     assert(start < end);
+    assert_memory_lock();
 
     start = start & TARGET_PAGE_MASK;
     end = TARGET_PAGE_ALIGN(end);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 07/19] translate-all: Add assert_memory_lock annotations
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 07/19] translate-all: Add assert_memory_lock annotations Alex Bennée
@ 2016-06-24 15:48   ` Sergey Fedorov
  0 siblings, 0 replies; 62+ messages in thread
From: Sergey Fedorov @ 2016-06-24 15:48 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Riku Voipio, Peter Crosthwaite

On 03/06/16 23:40, Alex Bennée wrote:
> This adds calls to the assert_memory_lock for all public APIs which are
> documented as holding the mmap_lock for user-mode.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  linux-user/elfload.c |  4 ++++
>  translate-all.c      | 20 ++++++++++++++++++++
>  2 files changed, 24 insertions(+)
>
> diff --git a/linux-user/elfload.c b/linux-user/elfload.c
> index bb2558f..f72c275 100644
> --- a/linux-user/elfload.c
> +++ b/linux-user/elfload.c
> @@ -1839,6 +1839,8 @@ static void load_elf_image(const char *image_name, int image_fd,
>      info->pt_dynamic_addr = 0;
>  #endif
>  
> +    mmap_lock();
> +
>      /* Find the maximum size of the image and allocate an appropriate
>         amount of memory to handle that.  */
>      loaddr = -1, hiaddr = 0;
> @@ -1999,6 +2001,8 @@ static void load_elf_image(const char *image_name, int image_fd,
>          load_symbols(ehdr, image_fd, load_bias);
>      }
>  
> +    mmap_unlock();
> +

This change is out of scope of this patch, I think.

>      close(image_fd);
>      return;
>  
> diff --git a/translate-all.c b/translate-all.c
> index 8b162ff..aba6cb6 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -453,6 +453,10 @@ static PageDesc *page_find_alloc(tb_page_addr_t index, int alloc)
>      void **lp;
>      int i;
>  
> +    if (alloc) {
> +        assert_memory_lock();
> +    }
> +
>      /* Level 1.  Always allocated.  */
>      lp = l1_map + ((index >> V_L1_SHIFT) & (V_L1_SIZE - 1));
>  
> @@ -819,6 +823,8 @@ static TranslationBlock *tb_alloc(target_ulong pc)
>  {
>      TranslationBlock *tb;
>  
> +    assert_tb_lock();
> +

Hmm, the patch subject doesn't mention tb_lock...

>      if (tcg_ctx.tb_ctx.nb_tbs >= tcg_ctx.code_gen_max_blocks) {
>          return NULL;
>      }

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [Qemu-devel] [RFC v3 08/19] tcg: protect TBContext with tb_lock.
  2016-06-03 20:40 [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG Alex Bennée
                   ` (6 preceding siblings ...)
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 07/19] translate-all: Add assert_memory_lock annotations Alex Bennée
@ 2016-06-03 20:40 ` Alex Bennée
  2016-07-01 23:40   ` Richard Henderson
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 09/19] target-arm/arm-powerctl: wake up sleeping CPUs Alex Bennée
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 62+ messages in thread
From: Alex Bennée @ 2016-06-03 20:40 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Alex Bennée, Peter Crosthwaite,
	Eduardo Habkost, Michael S. Tsirkin

From: KONRAD Frederic <fred.konrad@greensocs.com>

This protects TBContext with tb_lock to make tb_* thread safe.

We can still have issue with tb_flush in case of multithread TCG:
another CPU can be executing code during a flush.

This can be fixed later by making all other TCG thread exiting before calling
tb_flush().

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Message-Id: <1439220437-23957-8-git-send-email-fred.konrad@greensocs.com>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[AJB: moved into tree, clean-up history]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v3 (base-patches, ajb):
  - more explicit comments on resetting tb_lock
  - more explicit comments about thread safety of user-mode tb_flush
v2 (base-patches, ajb):
  - re-base fixes
v7 (FK, MTTCG):
  - Drop a tb_lock in already locked restore_state_to_opc.
v6 (FK, MTTCG):
  - Drop a tb_lock arround tb_find_fast in cpu-exec.c.
---
 cpu-exec.c         |  6 ++++++
 exec.c             |  6 ++++++
 hw/i386/kvmvapic.c |  4 ++++
 translate-all.c    | 35 +++++++++++++++++++++++++++++------
 4 files changed, 45 insertions(+), 6 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index b840e1d..ae81e8e 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -212,16 +212,22 @@ static void cpu_exec_nocache(CPUState *cpu, int max_cycles,
 
     old_tb_flushed = cpu->tb_flushed;
     cpu->tb_flushed = false;
+    tb_lock();
     tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags,
                      max_cycles | CF_NOCACHE
                          | (ignore_icount ? CF_IGNORE_ICOUNT : 0));
     tb->orig_tb = cpu->tb_flushed ? NULL : orig_tb;
     cpu->tb_flushed |= old_tb_flushed;
+
+    tb_unlock();
     /* execute the generated code */
     trace_exec_tb_nocache(tb, tb->pc);
     cpu_tb_exec(cpu, tb);
+    tb_lock();
+
     tb_phys_invalidate(tb, -1);
     tb_free(tb);
+    tb_unlock();
 }
 #endif
 
diff --git a/exec.c b/exec.c
index b225282..e23039c 100644
--- a/exec.c
+++ b/exec.c
@@ -2148,6 +2148,12 @@ static void check_watchpoint(int offset, int len, MemTxAttrs attrs, int flags)
                     continue;
                 }
                 cpu->watchpoint_hit = wp;
+
+                /* The tb_lock will be reset when cpu_loop_exit or
+                 * cpu_resume_from_signal longjmp back into the cpu_exec
+                 * main loop.
+                 */
+                tb_lock();
                 tb_check_watchpoint(cpu);
                 if (wp->flags & BP_STOP_BEFORE_ACCESS) {
                     cpu->exception_index = EXCP_DEBUG;
diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
index 5b71b1b..d98fe2a 100644
--- a/hw/i386/kvmvapic.c
+++ b/hw/i386/kvmvapic.c
@@ -17,6 +17,7 @@
 #include "sysemu/kvm.h"
 #include "hw/i386/apic_internal.h"
 #include "hw/sysbus.h"
+#include "tcg/tcg.h"
 
 #define VAPIC_IO_PORT           0x7e
 
@@ -449,6 +450,9 @@ static void patch_instruction(VAPICROMState *s, X86CPU *cpu, target_ulong ip)
     resume_all_vcpus();
 
     if (!kvm_enabled()) {
+        /* tb_lock will be reset when cpu_resume_from_signal longjmps
+         * back into the cpu_exec loop. */
+        tb_lock();
         tb_gen_code(cs, current_pc, current_cs_base, current_flags, 1);
         cpu_resume_from_signal(cs, NULL);
     }
diff --git a/translate-all.c b/translate-all.c
index aba6cb6..f54ab3e 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -891,8 +891,13 @@ static void page_flush_tb(void)
     }
 }
 
-/* flush all the translation blocks */
-/* XXX: tb_flush is currently not thread safe */
+/* Flush all the translation blocks:
+ *
+ * System emulation calls it only with tb_lock taken or from
+ * safe_work, so no need to take tb_lock here.
+ *
+ * User-mode tb_flush is currently not thread safe (FIXME).
+ */
 void tb_flush(CPUState *cpu)
 {
 #if defined(DEBUG_TB_FLUSH)
@@ -1407,6 +1412,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
     /* we remove all the TBs in the range [start, end[ */
     /* XXX: see if in some cases it could be faster to invalidate all
        the code */
+    tb_lock();
     tb = p->first_tb;
     while (tb != NULL) {
         n = (uintptr_t)tb & 3;
@@ -1466,6 +1472,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
         cpu_resume_from_signal(cpu, NULL);
     }
 #endif
+    tb_unlock();
 }
 
 #ifdef CONFIG_SOFTMMU
@@ -1532,6 +1539,8 @@ static void tb_invalidate_phys_page(tb_page_addr_t addr,
     if (!p) {
         return;
     }
+
+    tb_lock();
     tb = p->first_tb;
 #ifdef TARGET_HAS_PRECISE_SMC
     if (tb && pc != 0) {
@@ -1572,9 +1581,13 @@ static void tb_invalidate_phys_page(tb_page_addr_t addr,
         if (locked) {
             mmap_unlock();
         }
+
+        /* tb_lock will be reset after cpu_resume_from_signal longjmps
+         * back into the cpu_exec loop. */
         cpu_resume_from_signal(cpu, puc);
     }
 #endif
+    tb_unlock();
 }
 #endif
 
@@ -1668,6 +1681,7 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
     target_ulong pc, cs_base;
     uint32_t flags;
 
+    tb_lock();
     tb = tb_find_pc(retaddr);
     if (!tb) {
         cpu_abort(cpu, "cpu_io_recompile: could not find TB for pc=%p",
@@ -1719,11 +1733,16 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
     /* FIXME: In theory this could raise an exception.  In practice
        we have already translated the block once so it's probably ok.  */
     tb_gen_code(cpu, pc, cs_base, flags, cflags);
+
     /* TODO: If env->pc != tb->pc (i.e. the faulting instruction was not
-       the first in the TB) then we end up generating a whole new TB and
-       repeating the fault, which is horribly inefficient.
-       Better would be to execute just this insn uncached, or generate a
-       second new TB.  */
+     * the first in the TB) then we end up generating a whole new TB and
+     * repeating the fault, which is horribly inefficient.
+     * Better would be to execute just this insn uncached, or generate a
+     * second new TB.
+     *
+     * cpu_resume_from_signal will longjmp back to cpu_exec where the
+     * tb_lock gets reset.
+     */
     cpu_resume_from_signal(cpu, NULL);
 }
 
@@ -1752,6 +1771,8 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
     size_t hgram_bins;
     char *hgram;
 
+    tb_lock();
+
     target_code_size = 0;
     max_target_code_size = 0;
     cross_page = 0;
@@ -1839,6 +1860,8 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf)
             tcg_ctx.tb_ctx.tb_phys_invalidate_count);
     cpu_fprintf(f, "TLB flush count     %d\n", tlb_flush_count);
     tcg_dump_info(f, cpu_fprintf);
+
+    tb_unlock();
 }
 
 void dump_opcount_info(FILE *f, fprintf_function cpu_fprintf)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 08/19] tcg: protect TBContext with tb_lock.
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 08/19] tcg: protect TBContext with tb_lock Alex Bennée
@ 2016-07-01 23:40   ` Richard Henderson
  0 siblings, 0 replies; 62+ messages in thread
From: Richard Henderson @ 2016-07-01 23:40 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo,
	serge.fdrv, cota, bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, peter.maydell, claudio.fontana,
	Peter Crosthwaite, Eduardo Habkost, Michael S. Tsirkin

On 06/03/2016 01:40 PM, Alex Bennée wrote:
> From: KONRAD Frederic <fred.konrad@greensocs.com>
>
> This protects TBContext with tb_lock to make tb_* thread safe.
>
> We can still have issue with tb_flush in case of multithread TCG:
> another CPU can be executing code during a flush.
>
> This can be fixed later by making all other TCG thread exiting before calling
> tb_flush().
>
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> Message-Id: <1439220437-23957-8-git-send-email-fred.konrad@greensocs.com>
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> [AJB: moved into tree, clean-up history]
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>
> ---

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [Qemu-devel] [RFC v3 09/19] target-arm/arm-powerctl: wake up sleeping CPUs
  2016-06-03 20:40 [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG Alex Bennée
                   ` (7 preceding siblings ...)
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 08/19] tcg: protect TBContext with tb_lock Alex Bennée
@ 2016-06-03 20:40 ` Alex Bennée
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 10/19] tcg: cpus rm tcg_exec_all() Alex Bennée
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 62+ messages in thread
From: Alex Bennée @ 2016-06-03 20:40 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Alex Bennée, Alexander Spyridakis,
	open list:ARM

Testing with Alexander's bare metal syncronisation tests fails in MTTCG
leaving one CPU spinning forever waiting for the second CPU to wake up.
We simply need to kick the vCPU once we have processed the PSCI power on
call.

As the power control API is for system emulation only as is the
qemu_kick_cpu function we also ensure we only build arm-powerctl for
SoftMMU builds.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
CC: Alexander Spyridakis <a.spyridakis@virtualopensystems.com>
Message-Id: <1439220437-23957-20-git-send-email-fred.konrad@greensocs.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

---
v3
  - re-base caught arm_powerctl re-factor
  - include cpu.h header for kick definition
  - fix Makefile.objs to only build for softmmu
---
 target-arm/Makefile.objs  | 2 +-
 target-arm/arm-powerctl.c | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/target-arm/Makefile.objs b/target-arm/Makefile.objs
index f206411..847fb52 100644
--- a/target-arm/Makefile.objs
+++ b/target-arm/Makefile.objs
@@ -9,4 +9,4 @@ obj-y += neon_helper.o iwmmxt_helper.o
 obj-y += gdbstub.o
 obj-$(TARGET_AARCH64) += cpu64.o translate-a64.o helper-a64.o gdbstub64.o
 obj-y += crypto_helper.o
-obj-y += arm-powerctl.o
+obj-$(CONFIG_SOFTMMU) += arm-powerctl.o
diff --git a/target-arm/arm-powerctl.c b/target-arm/arm-powerctl.c
index d452230..4baa051 100644
--- a/target-arm/arm-powerctl.c
+++ b/target-arm/arm-powerctl.c
@@ -166,6 +166,8 @@ int arm_set_cpu_on(uint64_t cpuid, uint64_t entry, uint64_t context_id,
     /* Start the new CPU at the requested address */
     cpu_set_pc(target_cpu_state, entry);
 
+    qemu_cpu_kick(target_cpu_state);
+
     /* We are good to go */
     return QEMU_ARM_POWERCTL_RET_SUCCESS;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [Qemu-devel] [RFC v3 10/19] tcg: cpus rm tcg_exec_all()
  2016-06-03 20:40 [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG Alex Bennée
                   ` (8 preceding siblings ...)
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 09/19] target-arm/arm-powerctl: wake up sleeping CPUs Alex Bennée
@ 2016-06-03 20:40 ` Alex Bennée
  2016-06-24 17:09   ` Sergey Fedorov
  2016-07-01 23:50   ` Richard Henderson
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 11/19] tcg: add options for enabling MTTCG Alex Bennée
                   ` (9 subsequent siblings)
  19 siblings, 2 replies; 62+ messages in thread
From: Alex Bennée @ 2016-06-03 20:40 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Alex Bennée, Peter Crosthwaite

In preparation for multi-threaded TCG we remove tcg_exec_all and move
all the CPU cycling into the main thread function. When MTTCG is enabled
we shall use a separate thread function which only handles one vCPU.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v2
  - update timer calls to new API on rebase
v3
  - move tcg_cpu_exec above thread function, drop static fwd declaration
---
 cpus.c | 185 +++++++++++++++++++++++++++++++----------------------------------
 1 file changed, 89 insertions(+), 96 deletions(-)

diff --git a/cpus.c b/cpus.c
index c404dd7..4cc2ce6 100644
--- a/cpus.c
+++ b/cpus.c
@@ -69,7 +69,6 @@
 
 #endif /* CONFIG_LINUX */
 
-static CPUState *next_cpu;
 int64_t max_delay;
 int64_t max_advance;
 
@@ -1119,7 +1118,67 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
 #endif
 }
 
-static void tcg_exec_all(void);
+static int64_t tcg_get_icount_limit(void)
+{
+    int64_t deadline;
+
+    if (replay_mode != REPLAY_MODE_PLAY) {
+        deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
+
+        /* Maintain prior (possibly buggy) behaviour where if no deadline
+         * was set (as there is no QEMU_CLOCK_VIRTUAL timer) or it is more than
+         * INT32_MAX nanoseconds ahead, we still use INT32_MAX
+         * nanoseconds.
+         */
+        if ((deadline < 0) || (deadline > INT32_MAX)) {
+            deadline = INT32_MAX;
+        }
+
+        return qemu_icount_round(deadline);
+    } else {
+        return replay_get_instructions();
+    }
+}
+
+static int tcg_cpu_exec(CPUState *cpu)
+{
+    int ret;
+#ifdef CONFIG_PROFILER
+    int64_t ti;
+#endif
+
+#ifdef CONFIG_PROFILER
+    ti = profile_getclock();
+#endif
+    if (use_icount) {
+        int64_t count;
+        int decr;
+        timers_state.qemu_icount -= (cpu->icount_decr.u16.low
+                                    + cpu->icount_extra);
+        cpu->icount_decr.u16.low = 0;
+        cpu->icount_extra = 0;
+        count = tcg_get_icount_limit();
+        timers_state.qemu_icount += count;
+        decr = (count > 0xffff) ? 0xffff : count;
+        count -= decr;
+        cpu->icount_decr.u16.low = decr;
+        cpu->icount_extra = count;
+    }
+    ret = cpu_exec(cpu);
+#ifdef CONFIG_PROFILER
+    tcg_time += profile_getclock() - ti;
+#endif
+    if (use_icount) {
+        /* Fold pending instructions back into the
+           instruction counter, and clear the interrupt flag.  */
+        timers_state.qemu_icount -= (cpu->icount_decr.u16.low
+                        + cpu->icount_extra);
+        cpu->icount_decr.u32 = 0;
+        cpu->icount_extra = 0;
+        replay_account_executed_instructions();
+    }
+    return ret;
+}
 
 static void *qemu_tcg_cpu_thread_fn(void *arg)
 {
@@ -1150,8 +1209,35 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
     /* process any pending work */
     atomic_mb_set(&exit_request, 1);
 
+    cpu = first_cpu;
+
     while (1) {
-        tcg_exec_all();
+        /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
+        qemu_account_warp_timer();
+
+        if (!cpu) {
+            cpu = first_cpu;
+        }
+
+        for (; cpu != NULL && !exit_request; cpu = CPU_NEXT(cpu)) {
+
+            qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
+                              (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
+
+            if (cpu_can_run(cpu)) {
+                int r = tcg_cpu_exec(cpu);
+                if (r == EXCP_DEBUG) {
+                    cpu_handle_guest_debug(cpu);
+                    break;
+                }
+            } else if (cpu->stop || cpu->stopped) {
+                break;
+            }
+
+        } /* for cpu.. */
+
+        /* Pairs with smp_wmb in qemu_cpu_kick.  */
+        atomic_mb_set(&exit_request, 0);
 
         if (use_icount) {
             int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
@@ -1448,99 +1534,6 @@ int vm_stop_force_state(RunState state)
     }
 }
 
-static int64_t tcg_get_icount_limit(void)
-{
-    int64_t deadline;
-
-    if (replay_mode != REPLAY_MODE_PLAY) {
-        deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
-
-        /* Maintain prior (possibly buggy) behaviour where if no deadline
-         * was set (as there is no QEMU_CLOCK_VIRTUAL timer) or it is more than
-         * INT32_MAX nanoseconds ahead, we still use INT32_MAX
-         * nanoseconds.
-         */
-        if ((deadline < 0) || (deadline > INT32_MAX)) {
-            deadline = INT32_MAX;
-        }
-
-        return qemu_icount_round(deadline);
-    } else {
-        return replay_get_instructions();
-    }
-}
-
-static int tcg_cpu_exec(CPUState *cpu)
-{
-    int ret;
-#ifdef CONFIG_PROFILER
-    int64_t ti;
-#endif
-
-#ifdef CONFIG_PROFILER
-    ti = profile_getclock();
-#endif
-    if (use_icount) {
-        int64_t count;
-        int decr;
-        timers_state.qemu_icount -= (cpu->icount_decr.u16.low
-                                    + cpu->icount_extra);
-        cpu->icount_decr.u16.low = 0;
-        cpu->icount_extra = 0;
-        count = tcg_get_icount_limit();
-        timers_state.qemu_icount += count;
-        decr = (count > 0xffff) ? 0xffff : count;
-        count -= decr;
-        cpu->icount_decr.u16.low = decr;
-        cpu->icount_extra = count;
-    }
-    ret = cpu_exec(cpu);
-#ifdef CONFIG_PROFILER
-    tcg_time += profile_getclock() - ti;
-#endif
-    if (use_icount) {
-        /* Fold pending instructions back into the
-           instruction counter, and clear the interrupt flag.  */
-        timers_state.qemu_icount -= (cpu->icount_decr.u16.low
-                        + cpu->icount_extra);
-        cpu->icount_decr.u32 = 0;
-        cpu->icount_extra = 0;
-        replay_account_executed_instructions();
-    }
-    return ret;
-}
-
-static void tcg_exec_all(void)
-{
-    int r;
-
-    /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
-    qemu_account_warp_timer();
-
-    if (next_cpu == NULL) {
-        next_cpu = first_cpu;
-    }
-    for (; next_cpu != NULL && !exit_request; next_cpu = CPU_NEXT(next_cpu)) {
-        CPUState *cpu = next_cpu;
-
-        qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
-                          (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
-
-        if (cpu_can_run(cpu)) {
-            r = tcg_cpu_exec(cpu);
-            if (r == EXCP_DEBUG) {
-                cpu_handle_guest_debug(cpu);
-                break;
-            }
-        } else if (cpu->stop || cpu->stopped) {
-            break;
-        }
-    }
-
-    /* Pairs with smp_wmb in qemu_cpu_kick.  */
-    atomic_mb_set(&exit_request, 0);
-}
-
 void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg)
 {
     /* XXX: implement xxx_cpu_list for targets that still miss it */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 10/19] tcg: cpus rm tcg_exec_all()
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 10/19] tcg: cpus rm tcg_exec_all() Alex Bennée
@ 2016-06-24 17:09   ` Sergey Fedorov
  2016-07-01 23:50   ` Richard Henderson
  1 sibling, 0 replies; 62+ messages in thread
From: Sergey Fedorov @ 2016-06-24 17:09 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite

On 03/06/16 23:40, Alex Bennée wrote:
> In preparation for multi-threaded TCG we remove tcg_exec_all and move
> all the CPU cycling into the main thread function. When MTTCG is enabled
> we shall use a separate thread function which only handles one vCPU.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>

>
> ---
> v2
>   - update timer calls to new API on rebase
> v3
>   - move tcg_cpu_exec above thread function, drop static fwd declaration
> ---
>  cpus.c | 185 +++++++++++++++++++++++++++++++----------------------------------
>  1 file changed, 89 insertions(+), 96 deletions(-)
>
> diff --git a/cpus.c b/cpus.c
> index c404dd7..4cc2ce6 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -69,7 +69,6 @@
>  
>  #endif /* CONFIG_LINUX */
>  
> -static CPUState *next_cpu;
>  int64_t max_delay;
>  int64_t max_advance;
>  
> @@ -1119,7 +1118,67 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
>  #endif
>  }
>  
> -static void tcg_exec_all(void);
> +static int64_t tcg_get_icount_limit(void)
> +{
> +    int64_t deadline;
> +
> +    if (replay_mode != REPLAY_MODE_PLAY) {
> +        deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
> +
> +        /* Maintain prior (possibly buggy) behaviour where if no deadline
> +         * was set (as there is no QEMU_CLOCK_VIRTUAL timer) or it is more than
> +         * INT32_MAX nanoseconds ahead, we still use INT32_MAX
> +         * nanoseconds.
> +         */
> +        if ((deadline < 0) || (deadline > INT32_MAX)) {
> +            deadline = INT32_MAX;
> +        }
> +
> +        return qemu_icount_round(deadline);
> +    } else {
> +        return replay_get_instructions();
> +    }
> +}
> +
> +static int tcg_cpu_exec(CPUState *cpu)
> +{
> +    int ret;
> +#ifdef CONFIG_PROFILER
> +    int64_t ti;
> +#endif
> +
> +#ifdef CONFIG_PROFILER
> +    ti = profile_getclock();
> +#endif
> +    if (use_icount) {
> +        int64_t count;
> +        int decr;
> +        timers_state.qemu_icount -= (cpu->icount_decr.u16.low
> +                                    + cpu->icount_extra);
> +        cpu->icount_decr.u16.low = 0;
> +        cpu->icount_extra = 0;
> +        count = tcg_get_icount_limit();
> +        timers_state.qemu_icount += count;
> +        decr = (count > 0xffff) ? 0xffff : count;
> +        count -= decr;
> +        cpu->icount_decr.u16.low = decr;
> +        cpu->icount_extra = count;
> +    }
> +    ret = cpu_exec(cpu);
> +#ifdef CONFIG_PROFILER
> +    tcg_time += profile_getclock() - ti;
> +#endif
> +    if (use_icount) {
> +        /* Fold pending instructions back into the
> +           instruction counter, and clear the interrupt flag.  */
> +        timers_state.qemu_icount -= (cpu->icount_decr.u16.low
> +                        + cpu->icount_extra);
> +        cpu->icount_decr.u32 = 0;
> +        cpu->icount_extra = 0;
> +        replay_account_executed_instructions();
> +    }
> +    return ret;
> +}
>  
>  static void *qemu_tcg_cpu_thread_fn(void *arg)
>  {
> @@ -1150,8 +1209,35 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>      /* process any pending work */
>      atomic_mb_set(&exit_request, 1);
>  
> +    cpu = first_cpu;
> +
>      while (1) {
> -        tcg_exec_all();
> +        /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
> +        qemu_account_warp_timer();
> +
> +        if (!cpu) {
> +            cpu = first_cpu;
> +        }
> +
> +        for (; cpu != NULL && !exit_request; cpu = CPU_NEXT(cpu)) {
> +
> +            qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
> +                              (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
> +
> +            if (cpu_can_run(cpu)) {
> +                int r = tcg_cpu_exec(cpu);
> +                if (r == EXCP_DEBUG) {
> +                    cpu_handle_guest_debug(cpu);
> +                    break;
> +                }
> +            } else if (cpu->stop || cpu->stopped) {
> +                break;
> +            }
> +
> +        } /* for cpu.. */
> +
> +        /* Pairs with smp_wmb in qemu_cpu_kick.  */
> +        atomic_mb_set(&exit_request, 0);
>  
>          if (use_icount) {
>              int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
> @@ -1448,99 +1534,6 @@ int vm_stop_force_state(RunState state)
>      }
>  }
>  
> -static int64_t tcg_get_icount_limit(void)
> -{
> -    int64_t deadline;
> -
> -    if (replay_mode != REPLAY_MODE_PLAY) {
> -        deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
> -
> -        /* Maintain prior (possibly buggy) behaviour where if no deadline
> -         * was set (as there is no QEMU_CLOCK_VIRTUAL timer) or it is more than
> -         * INT32_MAX nanoseconds ahead, we still use INT32_MAX
> -         * nanoseconds.
> -         */
> -        if ((deadline < 0) || (deadline > INT32_MAX)) {
> -            deadline = INT32_MAX;
> -        }
> -
> -        return qemu_icount_round(deadline);
> -    } else {
> -        return replay_get_instructions();
> -    }
> -}
> -
> -static int tcg_cpu_exec(CPUState *cpu)
> -{
> -    int ret;
> -#ifdef CONFIG_PROFILER
> -    int64_t ti;
> -#endif
> -
> -#ifdef CONFIG_PROFILER
> -    ti = profile_getclock();
> -#endif
> -    if (use_icount) {
> -        int64_t count;
> -        int decr;
> -        timers_state.qemu_icount -= (cpu->icount_decr.u16.low
> -                                    + cpu->icount_extra);
> -        cpu->icount_decr.u16.low = 0;
> -        cpu->icount_extra = 0;
> -        count = tcg_get_icount_limit();
> -        timers_state.qemu_icount += count;
> -        decr = (count > 0xffff) ? 0xffff : count;
> -        count -= decr;
> -        cpu->icount_decr.u16.low = decr;
> -        cpu->icount_extra = count;
> -    }
> -    ret = cpu_exec(cpu);
> -#ifdef CONFIG_PROFILER
> -    tcg_time += profile_getclock() - ti;
> -#endif
> -    if (use_icount) {
> -        /* Fold pending instructions back into the
> -           instruction counter, and clear the interrupt flag.  */
> -        timers_state.qemu_icount -= (cpu->icount_decr.u16.low
> -                        + cpu->icount_extra);
> -        cpu->icount_decr.u32 = 0;
> -        cpu->icount_extra = 0;
> -        replay_account_executed_instructions();
> -    }
> -    return ret;
> -}
> -
> -static void tcg_exec_all(void)
> -{
> -    int r;
> -
> -    /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
> -    qemu_account_warp_timer();
> -
> -    if (next_cpu == NULL) {
> -        next_cpu = first_cpu;
> -    }
> -    for (; next_cpu != NULL && !exit_request; next_cpu = CPU_NEXT(next_cpu)) {
> -        CPUState *cpu = next_cpu;
> -
> -        qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
> -                          (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
> -
> -        if (cpu_can_run(cpu)) {
> -            r = tcg_cpu_exec(cpu);
> -            if (r == EXCP_DEBUG) {
> -                cpu_handle_guest_debug(cpu);
> -                break;
> -            }
> -        } else if (cpu->stop || cpu->stopped) {
> -            break;
> -        }
> -    }
> -
> -    /* Pairs with smp_wmb in qemu_cpu_kick.  */
> -    atomic_mb_set(&exit_request, 0);
> -}
> -
>  void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg)
>  {
>      /* XXX: implement xxx_cpu_list for targets that still miss it */

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 10/19] tcg: cpus rm tcg_exec_all()
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 10/19] tcg: cpus rm tcg_exec_all() Alex Bennée
  2016-06-24 17:09   ` Sergey Fedorov
@ 2016-07-01 23:50   ` Richard Henderson
  1 sibling, 0 replies; 62+ messages in thread
From: Richard Henderson @ 2016-07-01 23:50 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo,
	serge.fdrv, cota, bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, peter.maydell, claudio.fontana,
	Peter Crosthwaite

On 06/03/2016 01:40 PM, Alex Bennée wrote:
> In preparation for multi-threaded TCG we remove tcg_exec_all and move
> all the CPU cycling into the main thread function. When MTTCG is enabled
> we shall use a separate thread function which only handles one vCPU.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [Qemu-devel] [RFC v3 11/19] tcg: add options for enabling MTTCG
  2016-06-03 20:40 [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG Alex Bennée
                   ` (9 preceding siblings ...)
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 10/19] tcg: cpus rm tcg_exec_all() Alex Bennée
@ 2016-06-03 20:40 ` Alex Bennée
  2016-06-27 21:07   ` Sergey Fedorov
  2016-07-01 23:53   ` Richard Henderson
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 12/19] tcg: add kick timer for single-threaded vCPU emulation Alex Bennée
                   ` (8 subsequent siblings)
  19 siblings, 2 replies; 62+ messages in thread
From: Alex Bennée @ 2016-06-03 20:40 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Alex Bennée, Peter Crosthwaite

From: KONRAD Frederic <fred.konrad@greensocs.com>

We know there will be cases where MTTCG won't work until additional work
is done in the front/back ends to support. It will however be useful to
be able to turn it on.

As a result MTTCG will default to off unless the combination is
supported. However the user can turn it on for the sake of testing.

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
[AJB: move to -accel tcg,thread=multi|single, defaults]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v1:
  - merge with add mttcg option.
  - update commit message
v2:
  - machine_init->opts_init
v3:
  - moved from -tcg to -accel tcg,thread=single|multi
  - fix checkpatch warnings
---
 cpus.c                | 28 ++++++++++++++++++++++++++++
 include/qom/cpu.h     |  8 ++++++++
 include/sysemu/cpus.h |  2 ++
 qemu-options.hx       | 20 ++++++++++++++++++++
 vl.c                  | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 5 files changed, 102 insertions(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index 4cc2ce6..1694ce9 100644
--- a/cpus.c
+++ b/cpus.c
@@ -25,6 +25,7 @@
 /* Needed early for CONFIG_BSD etc. */
 #include "qemu/osdep.h"
 #include "qemu-common.h"
+#include "qemu/config-file.h"
 #include "cpu.h"
 #include "monitor/monitor.h"
 #include "qapi/qmp/qerror.h"
@@ -148,6 +149,33 @@ typedef struct TimersState {
 } TimersState;
 
 static TimersState timers_state;
+static bool mttcg_enabled;
+
+static bool default_mttcg_enabled(void)
+{
+    /*
+     * TODO: Check if we have a chance to have MTTCG working on this guest/host.
+     *       Basically is the atomic instruction implemented? Is there any
+     *       memory ordering issue?
+     */
+    return false;
+}
+
+void qemu_tcg_configure(QemuOpts *opts)
+{
+    const char *t = qemu_opt_get(opts, "thread");
+    if (t) {
+        mttcg_enabled = (strcmp(t, "multi") == 0);
+    } else {
+        mttcg_enabled = default_mttcg_enabled();
+    }
+}
+
+bool qemu_tcg_mttcg_enabled(void)
+{
+    return mttcg_enabled;
+}
+
 
 int64_t cpu_get_icount_raw(void)
 {
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index b82625f..40823bf 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -376,6 +376,14 @@ extern struct CPUTailQ cpus;
 extern __thread CPUState *current_cpu;
 
 /**
+ * qemu_tcg_mttcg_enabled:
+ * Check whether we are running MultiThread TCG or not.
+ *
+ * Returns: %true if we are in MTTCG mode %false otherwise.
+ */
+bool qemu_tcg_mttcg_enabled(void);
+
+/**
  * cpu_paging_enabled:
  * @cpu: The CPU whose state is to be inspected.
  *
diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
index fe992a8..4948c40 100644
--- a/include/sysemu/cpus.h
+++ b/include/sysemu/cpus.h
@@ -39,4 +39,6 @@ extern int smp_threads;
 
 void list_cpus(FILE *f, fprintf_function cpu_fprintf, const char *optarg);
 
+void qemu_tcg_configure(QemuOpts *opts);
+
 #endif
diff --git a/qemu-options.hx b/qemu-options.hx
index 6106520..dc865ec 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -99,6 +99,26 @@ STEXI
 Select CPU model (@code{-cpu help} for list and additional feature selection)
 ETEXI
 
+DEF("accel", HAS_ARG, QEMU_OPTION_accel,
+    "-accel [accel=]accelerator[,thread=single|multi]\n"
+    "               select accelerator ('-accel help for list')\n"
+    "               thread=single|multi (enable multi-threaded TCG)", QEMU_ARCH_ALL)
+STEXI
+@item -accel @var{name}[,prop=@var{value}[,...]]
+@findex -accel
+This is used to enable an accelerator. Depending on the target architecture,
+kvm, xen, or tcg can be available. By default, tcg is used. If there is more
+than one accelerator specified, the next one is used if the previous one fails
+to initialize.
+@table @option
+@item thread=single|multi
+Controls number of TCG threads. When the TCG is multi-threaded there will be one
+thread per vCPU therefor taking advantage of additional host cores. The default
+is to enable multi-threading where both the back-end and front-ends support it and
+no incompatible TCG features have been enabled (e.g. icount/replay).
+@end table
+ETEXI
+
 DEF("smp", HAS_ARG, QEMU_OPTION_smp,
     "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,sockets=sockets]\n"
     "                set the number of CPUs to 'n' [default=1]\n"
diff --git a/vl.c b/vl.c
index 18d1423..b1224f9 100644
--- a/vl.c
+++ b/vl.c
@@ -312,6 +312,26 @@ static QemuOptsList qemu_machine_opts = {
     },
 };
 
+static QemuOptsList qemu_accel_opts = {
+    .name = "accel",
+    .implied_opt_name = "accel",
+    .head = QTAILQ_HEAD_INITIALIZER(qemu_accel_opts.head),
+    .merge_lists = true,
+    .desc = {
+        {
+            .name = "accel",
+            .type = QEMU_OPT_STRING,
+            .help = "Select the type of accelerator",
+        },
+        {
+            .name = "thread",
+            .type = QEMU_OPT_STRING,
+            .help = "Enable/disable multi-threaded TCG",
+        },
+        { /* end of list */ }
+    },
+};
+
 static QemuOptsList qemu_boot_opts = {
     .name = "boot-opts",
     .implied_opt_name = "order",
@@ -2946,7 +2966,8 @@ int main(int argc, char **argv, char **envp)
     const char *boot_once = NULL;
     DisplayState *ds;
     int cyls, heads, secs, translation;
-    QemuOpts *hda_opts = NULL, *opts, *machine_opts, *icount_opts = NULL;
+    QemuOpts *opts, *machine_opts;
+    QemuOpts *hda_opts = NULL, *icount_opts = NULL;
     QemuOptsList *olist;
     int optind;
     const char *optarg;
@@ -2996,6 +3017,7 @@ int main(int argc, char **argv, char **envp)
     qemu_add_opts(&qemu_trace_opts);
     qemu_add_opts(&qemu_option_rom_opts);
     qemu_add_opts(&qemu_machine_opts);
+    qemu_add_opts(&qemu_accel_opts);
     qemu_add_opts(&qemu_mem_opts);
     qemu_add_opts(&qemu_smp_opts);
     qemu_add_opts(&qemu_boot_opts);
@@ -3682,6 +3704,27 @@ int main(int argc, char **argv, char **envp)
                 qdev_prop_register_global(&kvm_pit_lost_tick_policy);
                 break;
             }
+            case QEMU_OPTION_accel:
+                opts = qemu_opts_parse_noisily(qemu_find_opts("accel"),
+                                               optarg, true);
+                optarg = qemu_opt_get(opts, "accel");
+
+                olist = qemu_find_opts("machine");
+                if (strcmp("kvm", optarg) == 0) {
+                    qemu_opts_parse_noisily(olist, "accel=kvm", false);
+                } else if (strcmp("xen", optarg) == 0) {
+                    qemu_opts_parse_noisily(olist, "accel=xen", false);
+                } else if (strcmp("tcg", optarg) == 0) {
+                    qemu_opts_parse_noisily(olist, "accel=tcg", false);
+                    qemu_tcg_configure(opts);
+                } else {
+                    if (!is_help_option(optarg)) {
+                        error_printf("Unknown accelerator: %s", optarg);
+                    }
+                    error_printf("Supported accelerators: kvm, xen, tcg\n");
+                    exit(1);
+                }
+                break;
             case QEMU_OPTION_usb:
                 olist = qemu_find_opts("machine");
                 qemu_opts_parse_noisily(olist, "usb=on", false);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 11/19] tcg: add options for enabling MTTCG
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 11/19] tcg: add options for enabling MTTCG Alex Bennée
@ 2016-06-27 21:07   ` Sergey Fedorov
  2016-07-22 16:17     ` Alex Bennée
  2016-07-01 23:53   ` Richard Henderson
  1 sibling, 1 reply; 62+ messages in thread
From: Sergey Fedorov @ 2016-06-27 21:07 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite

On 03/06/16 23:40, Alex Bennée wrote:
> diff --git a/cpus.c b/cpus.c
> index 4cc2ce6..1694ce9 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -25,6 +25,7 @@
>  /* Needed early for CONFIG_BSD etc. */
>  #include "qemu/osdep.h"
>  #include "qemu-common.h"
> +#include "qemu/config-file.h"
>  #include "cpu.h"
>  #include "monitor/monitor.h"
>  #include "qapi/qmp/qerror.h" @@ -148,6 +149,33 @@ typedef struct TimersState { } TimersState;
> static TimersState timers_state; +static bool mttcg_enabled; + +static
> bool default_mttcg_enabled(void) +{ + /* + * TODO: Check if we have a
> chance to have MTTCG working on this guest/host. + * Basically is the
> atomic instruction implemented? Is there any + * memory ordering
> issue? + */ + return false; +} + +void qemu_tcg_configure(QemuOpts
> *opts) +{ + const char *t = qemu_opt_get(opts, "thread");
> +    if (t) {
> +        mttcg_enabled = (strcmp(t, "multi") == 0);
> +    } else {
> +        mttcg_enabled = default_mttcg_enabled();

Wouldn't we like to check for user errors here? Suppose a user have
passed "multy" or "mult" and think they've enabled MTTCG whereas such
values are silently ignored.

> +    }
> +}
> +
> +bool qemu_tcg_mttcg_enabled(void)
> +{
> +    return mttcg_enabled;
> +}
> +
>  
>  int64_t cpu_get_icount_raw(void)
>  {
(snip)
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 6106520..dc865ec 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -99,6 +99,26 @@ STEXI
>  Select CPU model (@code{-cpu help} for list and additional feature selection)
>  ETEXI
>  
> +DEF("accel", HAS_ARG, QEMU_OPTION_accel,
> +    "-accel [accel=]accelerator[,thread=single|multi]\n"
> +    "               select accelerator ('-accel help for list')\n"
> +    "               thread=single|multi (enable multi-threaded TCG)", QEMU_ARCH_ALL)

"mttcg=on|off" or "multithread=on|off", instead of
"thread=single|multi", are another possible variants. The former has
less letters :)

> +STEXI
> +@item -accel @var{name}[,prop=@var{value}[,...]]
> +@findex -accel
> +This is used to enable an accelerator. Depending on the target architecture,
> +kvm, xen, or tcg can be available. By default, tcg is used. If there is more
> +than one accelerator specified, the next one is used if the previous one fails
> +to initialize.
> +@table @option
> +@item thread=single|multi
> +Controls number of TCG threads. When the TCG is multi-threaded there will be one
> +thread per vCPU therefor taking advantage of additional host cores. The default
> +is to enable multi-threading where both the back-end and front-ends support it and
> +no incompatible TCG features have been enabled (e.g. icount/replay).

We could raise and error on attempts to use icount with MTTCG enabled
modifying this piece of code in vl.c:

    if (icount_opts) {
        if (kvm_enabled() || xen_enabled()) {
            error_report("-icount is not allowed with kvm or xen");
            exit(1);
        }
 

> +@end table
> +ETEXI
> +
>  DEF("smp", HAS_ARG, QEMU_OPTION_smp,
>      "-smp [cpus=]n[,maxcpus=cpus][,cores=cores][,threads=threads][,sockets=sockets]\n"
>      "                set the number of CPUs to 'n' [default=1]\n"
> diff --git a/vl.c b/vl.c
> index 18d1423..b1224f9 100644
> --- a/vl.c
> +++ b/vl.c
(snip)
> @@ -3682,6 +3704,27 @@ int main(int argc, char **argv, char **envp)
>                  qdev_prop_register_global(&kvm_pit_lost_tick_policy);
>                  break;
>              }
> +            case QEMU_OPTION_accel:
> +                opts = qemu_opts_parse_noisily(qemu_find_opts("accel"),
> +                                               optarg, true);
> +                optarg = qemu_opt_get(opts, "accel");
> +
> +                olist = qemu_find_opts("machine");
> +                if (strcmp("kvm", optarg) == 0) {
> +                    qemu_opts_parse_noisily(olist, "accel=kvm", false);
> +                } else if (strcmp("xen", optarg) == 0) {
> +                    qemu_opts_parse_noisily(olist, "accel=xen", false);
> +                } else if (strcmp("tcg", optarg) == 0) {
> +                    qemu_opts_parse_noisily(olist, "accel=tcg", false);
> +                    qemu_tcg_configure(opts);
> +                } else {
> +                    if (!is_help_option(optarg)) {
> +                        error_printf("Unknown accelerator: %s", optarg);
> +                    }
> +                    error_printf("Supported accelerators: kvm, xen, tcg\n");
> +                    exit(1);
> +                }

I am wondering if we should use accel_find() here like in
configure_accelerator() and probably also make checks with
AccelClass::available().

Please consider wrapping this code in a separate function.


> +                break;
>              case QEMU_OPTION_usb:
>                  olist = qemu_find_opts("machine");
>                  qemu_opts_parse_noisily(olist, "usb=on", false);

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 11/19] tcg: add options for enabling MTTCG
  2016-06-27 21:07   ` Sergey Fedorov
@ 2016-07-22 16:17     ` Alex Bennée
  0 siblings, 0 replies; 62+ messages in thread
From: Alex Bennée @ 2016-07-22 16:17 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani,
	mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite


I've taken all the suggestions apart from the bellow (comment inline):

Sergey Fedorov <serge.fdrv@gmail.com> writes:

<snip>
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -99,6 +99,26 @@ STEXI
>>  Select CPU model (@code{-cpu help} for list and additional feature selection)
>>  ETEXI
>>
>> +DEF("accel", HAS_ARG, QEMU_OPTION_accel,
>> +    "-accel [accel=]accelerator[,thread=single|multi]\n"
>> +    "               select accelerator ('-accel help for list')\n"
>> +    "               thread=single|multi (enable multi-threaded TCG)", QEMU_ARCH_ALL)
>
> "mttcg=on|off" or "multithread=on|off", instead of
> "thread=single|multi", are another possible variants. The former has
> less letters :)

I went with the previous suggestion during the last round. mttcg could
be a little cryptic, multithread=on|off is much of a muchness.

<snip>
>>              }
>> +            case QEMU_OPTION_accel:
>> +                opts = qemu_opts_parse_noisily(qemu_find_opts("accel"),
>> +                                               optarg, true);
>> +                optarg = qemu_opt_get(opts, "accel");
>> +
>> +                olist = qemu_find_opts("machine");
>> +                if (strcmp("kvm", optarg) == 0) {
>> +                    qemu_opts_parse_noisily(olist, "accel=kvm", false);
>> +                } else if (strcmp("xen", optarg) == 0) {
>> +                    qemu_opts_parse_noisily(olist, "accel=xen", false);
>> +                } else if (strcmp("tcg", optarg) == 0) {
>> +                    qemu_opts_parse_noisily(olist, "accel=tcg", false);
>> +                    qemu_tcg_configure(opts);
>> +                } else {
>> +                    if (!is_help_option(optarg)) {
>> +                        error_printf("Unknown accelerator: %s", optarg);
>> +                    }
>> +                    error_printf("Supported accelerators: kvm, xen, tcg\n");
>> +                    exit(1);
>> +                }
>
> I am wondering if we should use accel_find() here like in
> configure_accelerator() and probably also make checks with
> AccelClass::available().

I'm not sure exactly what magic is going on here in the pseudo-class
stuff.

>
> Please consider wrapping this code in a separate function.

It seems broadly in line with other snippets in vl.c so I've left it for now.

>
>
>> +                break;
>>              case QEMU_OPTION_usb:
>>                  olist = qemu_find_opts("machine");
>>                  qemu_opts_parse_noisily(olist, "usb=on", false);
>
> Kind regards,
> Sergey


--
Alex Bennée

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 11/19] tcg: add options for enabling MTTCG
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 11/19] tcg: add options for enabling MTTCG Alex Bennée
  2016-06-27 21:07   ` Sergey Fedorov
@ 2016-07-01 23:53   ` Richard Henderson
  2016-07-02  7:11     ` Alex Bennée
  1 sibling, 1 reply; 62+ messages in thread
From: Richard Henderson @ 2016-07-01 23:53 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo,
	serge.fdrv, cota, bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, peter.maydell, claudio.fontana,
	Peter Crosthwaite

On 06/03/2016 01:40 PM, Alex Bennée wrote:
> +bool qemu_tcg_mttcg_enabled(void)
> +{
> +    return mttcg_enabled;
> +}

Is there a good reason to expose this via function call, rather than just test 
the variable?


r~

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 11/19] tcg: add options for enabling MTTCG
  2016-07-01 23:53   ` Richard Henderson
@ 2016-07-02  7:11     ` Alex Bennée
  2016-07-02  7:38       ` Sergey Fedorov
  0 siblings, 1 reply; 62+ messages in thread
From: Alex Bennée @ 2016-07-02  7:11 UTC (permalink / raw)
  To: Richard Henderson
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani, mark.burton, pbonzini, jan.kiszka, peter.maydell,
	claudio.fontana, Peter Crosthwaite


Richard Henderson <rth@twiddle.net> writes:

> On 06/03/2016 01:40 PM, Alex Bennée wrote:
>> +bool qemu_tcg_mttcg_enabled(void)
>> +{
>> +    return mttcg_enabled;
>> +}
>
> Is there a good reason to expose this via function call, rather than just test
> the variable?

It was in Fred's original patch. I guess there could be an argument
versus a global variable but as mttcg_enabled is set during option
parsing we don't need to do anything dynamic.

>
>
> r~


--
Alex Bennée

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 11/19] tcg: add options for enabling MTTCG
  2016-07-02  7:11     ` Alex Bennée
@ 2016-07-02  7:38       ` Sergey Fedorov
  2016-07-04 10:10         ` Paolo Bonzini
  0 siblings, 1 reply; 62+ messages in thread
From: Sergey Fedorov @ 2016-07-02  7:38 UTC (permalink / raw)
  To: Alex Bennée, Richard Henderson
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani,
	mark.burton, pbonzini, jan.kiszka, peter.maydell, claudio.fontana,
	Peter Crosthwaite

On 02/07/16 10:11, Alex Bennée wrote:
> Richard Henderson <rth@twiddle.net> writes:
>
>> On 06/03/2016 01:40 PM, Alex Bennée wrote:
>>> +bool qemu_tcg_mttcg_enabled(void)
>>> +{
>>> +    return mttcg_enabled;
>>> +}
>> Is there a good reason to expose this via function call, rather than just test
>> the variable?
> It was in Fred's original patch. I guess there could be an argument
> versus a global variable but as mttcg_enabled is set during option
> parsing we don't need to do anything dynamic.

I think it just resembles tcg_enabled(), kvm_enabled() etc.

Regards,
Sergey

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 11/19] tcg: add options for enabling MTTCG
  2016-07-02  7:38       ` Sergey Fedorov
@ 2016-07-04 10:10         ` Paolo Bonzini
  0 siblings, 0 replies; 62+ messages in thread
From: Paolo Bonzini @ 2016-07-04 10:10 UTC (permalink / raw)
  To: Sergey Fedorov, Alex Bennée, Richard Henderson
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani,
	mark.burton, jan.kiszka, peter.maydell, claudio.fontana,
	Peter Crosthwaite



On 02/07/2016 09:38, Sergey Fedorov wrote:
>>>> >>> +bool qemu_tcg_mttcg_enabled(void)
>>>> >>> +{
>>>> >>> +    return mttcg_enabled;
>>>> >>> +}
>>> >> Is there a good reason to expose this via function call, rather than just test
>>> >> the variable?
>> > It was in Fred's original patch. I guess there could be an argument
>> > versus a global variable but as mttcg_enabled is set during option
>> > parsing we don't need to do anything dynamic.
> I think it just resembles tcg_enabled(), kvm_enabled() etc.

Those are macros however, not functions.  They are macros in order to
cull KVM code for files that are compiled per-target.

Paolo

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [Qemu-devel] [RFC v3 12/19] tcg: add kick timer for single-threaded vCPU emulation
  2016-06-03 20:40 [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG Alex Bennée
                   ` (10 preceding siblings ...)
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 11/19] tcg: add options for enabling MTTCG Alex Bennée
@ 2016-06-03 20:40 ` Alex Bennée
  2016-06-27 21:20   ` Sergey Fedorov
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 13/19] tcg: rename tcg_current_cpu to tcg_current_rr_cpu Alex Bennée
                   ` (7 subsequent siblings)
  19 siblings, 1 reply; 62+ messages in thread
From: Alex Bennée @ 2016-06-03 20:40 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Alex Bennée, Peter Crosthwaite

Currently we rely on the side effect of the main loop grabbing the
iothread_mutex to give any long running basic block chains a kick to
ensure the next vCPU is scheduled. As this code is being re-factored and
rationalised we now do it explicitly here.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v2
  - re-base fixes
  - get_ticks_per_sec() -> NANOSECONDS_PER_SEC
v3
  - add define for TCG_KICK_FREQ
  - fix checkpatch warning
---
 cpus.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/cpus.c b/cpus.c
index 1694ce9..12e04c9 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1208,9 +1208,29 @@ static int tcg_cpu_exec(CPUState *cpu)
     return ret;
 }
 
+/* Single-threaded TCG
+ *
+ * In the single-threaded case each vCPU is simulated in turn. If
+ * there is more than a single vCPU we create a simple timer to kick
+ * the vCPU and ensure we don't get stuck in a tight loop in one vCPU.
+ * This is done explicitly rather than relying on side-effects
+ * elsewhere.
+ */
+static void qemu_cpu_kick_no_halt(void);
+#define TCG_KICK_FREQ (qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + \
+                       NANOSECONDS_PER_SECOND / 10)
+
+static void kick_tcg_thread(void *opaque)
+{
+    QEMUTimer *self = *(QEMUTimer **) opaque;
+    timer_mod(self, TCG_KICK_FREQ);
+    qemu_cpu_kick_no_halt();
+}
+
 static void *qemu_tcg_cpu_thread_fn(void *arg)
 {
     CPUState *cpu = arg;
+    QEMUTimer *kick_timer;
 
     rcu_register_thread();
 
@@ -1234,6 +1254,13 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
         }
     }
 
+    /* Set to kick if we have to do more than one vCPU */
+    if (CPU_NEXT(first_cpu)) {
+        kick_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,  kick_tcg_thread,
+                                  &kick_timer);
+        timer_mod(kick_timer, TCG_KICK_FREQ);
+    }
+
     /* process any pending work */
     atomic_mb_set(&exit_request, 1);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 12/19] tcg: add kick timer for single-threaded vCPU emulation
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 12/19] tcg: add kick timer for single-threaded vCPU emulation Alex Bennée
@ 2016-06-27 21:20   ` Sergey Fedorov
  2016-07-02  0:17     ` Richard Henderson
  0 siblings, 1 reply; 62+ messages in thread
From: Sergey Fedorov @ 2016-06-27 21:20 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite

On 03/06/16 23:40, Alex Bennée wrote:
> diff --git a/cpus.c b/cpus.c
> index 1694ce9..12e04c9 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1208,9 +1208,29 @@ static int tcg_cpu_exec(CPUState *cpu)
>      return ret;
>  }
>  
> +/* Single-threaded TCG
> + *
> + * In the single-threaded case each vCPU is simulated in turn. If
> + * there is more than a single vCPU we create a simple timer to kick
> + * the vCPU and ensure we don't get stuck in a tight loop in one vCPU.
> + * This is done explicitly rather than relying on side-effects
> + * elsewhere.
> + */
> +static void qemu_cpu_kick_no_halt(void);
> +#define TCG_KICK_FREQ (qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + \
> +                       NANOSECONDS_PER_SECOND / 10)

Hmm, it doesn't look nice to wrap calculation of the next timeout in a
macro and name it '*_FREQ'. I think we'd better do like this:

#define TCG_KICK_PERIOD (NANOSECONDS_PER_SECOND / 10)

static inline int64_t qemu_tcg_next_kick(void)
{
    return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + TCG_KICK_PERIOD;
}

and use it like this:

timer_mod(kick_timer, qemu_tcg_next_kick());

Kind regards,
Sergey

> +
> +static void kick_tcg_thread(void *opaque)
> +{
> +    QEMUTimer *self = *(QEMUTimer **) opaque;
> +    timer_mod(self, TCG_KICK_FREQ);
> +    qemu_cpu_kick_no_halt();
> +}
> +
>  static void *qemu_tcg_cpu_thread_fn(void *arg)
>  {
>      CPUState *cpu = arg;
> +    QEMUTimer *kick_timer;
>  
>      rcu_register_thread();
>  
> @@ -1234,6 +1254,13 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>          }
>      }
>  
> +    /* Set to kick if we have to do more than one vCPU */
> +    if (CPU_NEXT(first_cpu)) {
> +        kick_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,  kick_tcg_thread,
> +                                  &kick_timer);
> +        timer_mod(kick_timer, TCG_KICK_FREQ);
> +    }
> +
>      /* process any pending work */
>      atomic_mb_set(&exit_request, 1);
>  

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 12/19] tcg: add kick timer for single-threaded vCPU emulation
  2016-06-27 21:20   ` Sergey Fedorov
@ 2016-07-02  0:17     ` Richard Henderson
  2016-07-02  7:36       ` Sergey Fedorov
  0 siblings, 1 reply; 62+ messages in thread
From: Richard Henderson @ 2016-07-02  0:17 UTC (permalink / raw)
  To: Sergey Fedorov, Alex Bennée, mttcg, qemu-devel, fred.konrad,
	a.rigo, cota, bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, peter.maydell, claudio.fontana,
	Peter Crosthwaite

On 06/27/2016 02:20 PM, Sergey Fedorov wrote:
> On 03/06/16 23:40, Alex Bennée wrote:
>> diff --git a/cpus.c b/cpus.c
>> index 1694ce9..12e04c9 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -1208,9 +1208,29 @@ static int tcg_cpu_exec(CPUState *cpu)
>>      return ret;
>>  }
>>
>> +/* Single-threaded TCG
>> + *
>> + * In the single-threaded case each vCPU is simulated in turn. If
>> + * there is more than a single vCPU we create a simple timer to kick
>> + * the vCPU and ensure we don't get stuck in a tight loop in one vCPU.
>> + * This is done explicitly rather than relying on side-effects
>> + * elsewhere.
>> + */
>> +static void qemu_cpu_kick_no_halt(void);
>> +#define TCG_KICK_FREQ (qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + \
>> +                       NANOSECONDS_PER_SECOND / 10)
>
> Hmm, it doesn't look nice to wrap calculation of the next timeout in a
> macro and name it '*_FREQ'. I think we'd better do like this:
>
> #define TCG_KICK_PERIOD (NANOSECONDS_PER_SECOND / 10)
>
> static inline int64_t qemu_tcg_next_kick(void)
> {
>     return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + TCG_KICK_PERIOD;
> }
>
> and use it like this:
>
> timer_mod(kick_timer, qemu_tcg_next_kick());

Agreed.

As an aside, surely a period of 10ns is too small.
That's on the order of 20-50 host instructions.


r~

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 12/19] tcg: add kick timer for single-threaded vCPU emulation
  2016-07-02  0:17     ` Richard Henderson
@ 2016-07-02  7:36       ` Sergey Fedorov
  0 siblings, 0 replies; 62+ messages in thread
From: Sergey Fedorov @ 2016-07-02  7:36 UTC (permalink / raw)
  To: Richard Henderson, Alex Bennée, mttcg, qemu-devel,
	fred.konrad, a.rigo, cota, bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, peter.maydell, claudio.fontana,
	Peter Crosthwaite

On 02/07/16 03:17, Richard Henderson wrote:
> On 06/27/2016 02:20 PM, Sergey Fedorov wrote:
>> On 03/06/16 23:40, Alex Bennée wrote:
>>> diff --git a/cpus.c b/cpus.c
>>> index 1694ce9..12e04c9 100644
>>> --- a/cpus.c
>>> +++ b/cpus.c
>>> @@ -1208,9 +1208,29 @@ static int tcg_cpu_exec(CPUState *cpu)
>>>      return ret;
>>>  }
>>>
>>> +/* Single-threaded TCG
>>> + *
>>> + * In the single-threaded case each vCPU is simulated in turn. If
>>> + * there is more than a single vCPU we create a simple timer to kick
>>> + * the vCPU and ensure we don't get stuck in a tight loop in one vCPU.
>>> + * This is done explicitly rather than relying on side-effects
>>> + * elsewhere.
>>> + */
>>> +static void qemu_cpu_kick_no_halt(void);
>>> +#define TCG_KICK_FREQ (qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + \
>>> +                       NANOSECONDS_PER_SECOND / 10)
>>
>> Hmm, it doesn't look nice to wrap calculation of the next timeout in a
>> macro and name it '*_FREQ'. I think we'd better do like this:
>>
>> #define TCG_KICK_PERIOD (NANOSECONDS_PER_SECOND / 10)
>>
>> static inline int64_t qemu_tcg_next_kick(void)
>> {
>>     return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + TCG_KICK_PERIOD;
>> }
>>
>> and use it like this:
>>
>> timer_mod(kick_timer, qemu_tcg_next_kick());
>
> Agreed.
>
> As an aside, surely a period of 10ns is too small.
> That's on the order of 20-50 host instructions.

I think the period is 10 times in a second which is 100 ms.

Regards,
Sergey

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [Qemu-devel] [RFC v3 13/19] tcg: rename tcg_current_cpu to tcg_current_rr_cpu
  2016-06-03 20:40 [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG Alex Bennée
                   ` (11 preceding siblings ...)
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 12/19] tcg: add kick timer for single-threaded vCPU emulation Alex Bennée
@ 2016-06-03 20:40 ` Alex Bennée
  2016-06-06 15:30   ` Paolo Bonzini
  2016-06-07 12:59   ` Alex Bennée
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 14/19] tcg: remove global exit_request Alex Bennée
                   ` (6 subsequent siblings)
  19 siblings, 2 replies; 62+ messages in thread
From: Alex Bennée @ 2016-06-03 20:40 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Alex Bennée, Peter Crosthwaite

..and make the definition local to cpus. In preparation for MTTCG the
concept of a global tcg_current_cpu will no longer make sense. However
we still need to keep track of it in the single-threaded case to be able
to exit quickly when required.

qemu_cpu_kick_no_halt() moves and becomes qemu_cpu_kick_rr_cpu() to
emphasise its use-case. qemu_cpu_kick now kicks the relevant cpu as
well as qemu_kick_rr_cpu() which will become a no-op in MTTCG.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 cpu-exec-common.c       |  1 -
 cpu-exec.c              |  3 ---
 cpus.c                  | 41 ++++++++++++++++++++++++-----------------
 include/exec/exec-all.h |  2 --
 4 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/cpu-exec-common.c b/cpu-exec-common.c
index 132cd03..f42d24a 100644
--- a/cpu-exec-common.c
+++ b/cpu-exec-common.c
@@ -24,7 +24,6 @@
 #include "exec/memory-internal.h"
 
 bool exit_request;
-CPUState *tcg_current_cpu;
 
 /* exit the current TB from a signal handler. The host registers are
    restored in a state compatible with the CPU emulator
diff --git a/cpu-exec.c b/cpu-exec.c
index ae81e8e..68e804b 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -597,7 +597,6 @@ int cpu_exec(CPUState *cpu)
         return EXCP_HALTED;
     }
 
-    atomic_mb_set(&tcg_current_cpu, cpu);
     rcu_read_lock();
 
     if (unlikely(atomic_mb_read(&exit_request))) {
@@ -658,7 +657,5 @@ int cpu_exec(CPUState *cpu)
     /* fail safe : never use current_cpu outside cpu_exec() */
     current_cpu = NULL;
 
-    /* Does not need atomic_mb_set because a spurious wakeup is okay.  */
-    atomic_set(&tcg_current_cpu, NULL);
     return ret;
 }
diff --git a/cpus.c b/cpus.c
index 12e04c9..a139ad3 100644
--- a/cpus.c
+++ b/cpus.c
@@ -942,6 +942,7 @@ void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
 {
     struct qemu_work_item wi;
 
+    /* Always true when using tcg RR scheduling from a vCPU context */
     if (qemu_cpu_is_self(cpu)) {
         func(data);
         return;
@@ -1216,15 +1217,29 @@ static int tcg_cpu_exec(CPUState *cpu)
  * This is done explicitly rather than relying on side-effects
  * elsewhere.
  */
-static void qemu_cpu_kick_no_halt(void);
 #define TCG_KICK_FREQ (qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + \
                        NANOSECONDS_PER_SECOND / 10)
 
+/* only used in single-thread tcg mode */
+static CPUState *tcg_current_rr_cpu;
+
+/* Kick the currently round-robin scheduled vCPU */
+static void qemu_cpu_kick_rr_cpu(void)
+{
+    CPUState *cpu;
+    do {
+        cpu = atomic_mb_read(&tcg_current_rr_cpu);
+        if (cpu) {
+            cpu_exit(cpu);
+        }
+    } while (cpu != atomic_mb_read(&tcg_current_rr_cpu));
+}
+
 static void kick_tcg_thread(void *opaque)
 {
     QEMUTimer *self = *(QEMUTimer **) opaque;
     timer_mod(self, TCG_KICK_FREQ);
-    qemu_cpu_kick_no_halt();
+    qemu_cpu_kick_rr_cpu();
 }
 
 static void *qemu_tcg_cpu_thread_fn(void *arg)
@@ -1275,6 +1290,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
         }
 
         for (; cpu != NULL && !exit_request; cpu = CPU_NEXT(cpu)) {
+            atomic_mb_set(&tcg_current_rr_cpu, cpu);
 
             qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
                               (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
@@ -1290,6 +1306,8 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
             }
 
         } /* for cpu.. */
+        /* Does not need atomic_mb_set because a spurious wakeup is okay.  */
+        atomic_set(&tcg_current_rr_cpu, NULL);
 
         /* Pairs with smp_wmb in qemu_cpu_kick.  */
         atomic_mb_set(&exit_request, 0);
@@ -1326,24 +1344,13 @@ static void qemu_cpu_kick_thread(CPUState *cpu)
 #endif
 }
 
-static void qemu_cpu_kick_no_halt(void)
-{
-    CPUState *cpu;
-    /* Ensure whatever caused the exit has reached the CPU threads before
-     * writing exit_request.
-     */
-    atomic_mb_set(&exit_request, 1);
-    cpu = atomic_mb_read(&tcg_current_cpu);
-    if (cpu) {
-        cpu_exit(cpu);
-    }
-}
-
 void qemu_cpu_kick(CPUState *cpu)
 {
     qemu_cond_broadcast(cpu->halt_cond);
     if (tcg_enabled()) {
-        qemu_cpu_kick_no_halt();
+        cpu_exit(cpu);
+        /* Also ensure current RR cpu is kicked */
+        qemu_cpu_kick_rr_cpu();
     } else {
         qemu_cpu_kick_thread(cpu);
     }
@@ -1384,7 +1391,7 @@ void qemu_mutex_lock_iothread(void)
         atomic_dec(&iothread_requesting_mutex);
     } else {
         if (qemu_mutex_trylock(&qemu_global_mutex)) {
-            qemu_cpu_kick_no_halt();
+            qemu_cpu_kick_rr_cpu();
             qemu_mutex_lock(&qemu_global_mutex);
         }
         atomic_dec(&iothread_requesting_mutex);
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index e30f02b..31f4c38 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -405,8 +405,6 @@ bool memory_region_is_unassigned(MemoryRegion *mr);
 /* vl.c */
 extern int singlestep;
 
-/* cpu-exec.c, accessed with atomic_mb_read/atomic_mb_set */
-extern CPUState *tcg_current_cpu;
 extern bool exit_request;
 
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 13/19] tcg: rename tcg_current_cpu to tcg_current_rr_cpu
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 13/19] tcg: rename tcg_current_cpu to tcg_current_rr_cpu Alex Bennée
@ 2016-06-06 15:30   ` Paolo Bonzini
  2016-06-06 16:05     ` Alex Bennée
  2016-06-07 12:59   ` Alex Bennée
  1 sibling, 1 reply; 62+ messages in thread
From: Paolo Bonzini @ 2016-06-06 15:30 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo,
	serge.fdrv, cota, bobby.prani
  Cc: mark.burton, jan.kiszka, rth, peter.maydell, claudio.fontana,
	Peter Crosthwaite



On 03/06/2016 22:40, Alex Bennée wrote:
> +/* Kick the currently round-robin scheduled vCPU */
> +static void qemu_cpu_kick_rr_cpu(void)
> +{
> +    CPUState *cpu;
> +    do {
> +        cpu = atomic_mb_read(&tcg_current_rr_cpu);
> +        if (cpu) {
> +            cpu_exit(cpu);
> +        }
> +    } while (cpu != atomic_mb_read(&tcg_current_rr_cpu));
> +}

Interesting way to get rid of the global exit_request.  I like it, but
could you get stuck on NULL tcg_current_rr_cpu now?

I think you should redo these two patches like this:

- rename qemu_cpu_kick_no_halt to qemu_cpu_kick_rr_cpu and
tcg_current_cpu to tcg_current_rr_cpu; possibly move functions around

- extract handle_icount_deadline

- then everything else in patches 13 and 14, squashed

Paolo

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 13/19] tcg: rename tcg_current_cpu to tcg_current_rr_cpu
  2016-06-06 15:30   ` Paolo Bonzini
@ 2016-06-06 16:05     ` Alex Bennée
  2016-06-06 17:05       ` Paolo Bonzini
  0 siblings, 1 reply; 62+ messages in thread
From: Alex Bennée @ 2016-06-06 16:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani, mark.burton, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 03/06/2016 22:40, Alex Bennée wrote:
>> +/* Kick the currently round-robin scheduled vCPU */
>> +static void qemu_cpu_kick_rr_cpu(void)
>> +{
>> +    CPUState *cpu;
>> +    do {
>> +        cpu = atomic_mb_read(&tcg_current_rr_cpu);
>> +        if (cpu) {
>> +            cpu_exit(cpu);
>> +        }
>> +    } while (cpu != atomic_mb_read(&tcg_current_rr_cpu));
>> +}
>
> Interesting way to get rid of the global exit_request.  I like it, but
> could you get stuck on NULL tcg_current_rr_cpu now?

If tcg_current_rr_cpu is still NULL when you read the second time (i.e.
hasn't changed) then we'll exit the loop and your don't really need to
do anything as your outside the vCPU TCG code.

>
> I think you should redo these two patches like this:
>
> - rename qemu_cpu_kick_no_halt to qemu_cpu_kick_rr_cpu and
> tcg_current_cpu to tcg_current_rr_cpu; possibly move functions around
>

That is this patch isn't it?


> - extract handle_icount_deadline

Sure I can do that separately.

>
> - then everything else in patches 13 and 14, squashed

I was wondering after I posted if I should split all the current_cpu
stuff out as I don't think I need it straight away...

>
> Paolo


--
Alex Bennée

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 13/19] tcg: rename tcg_current_cpu to tcg_current_rr_cpu
  2016-06-06 16:05     ` Alex Bennée
@ 2016-06-06 17:05       ` Paolo Bonzini
  2016-06-06 17:26         ` Alex Bennée
  0 siblings, 1 reply; 62+ messages in thread
From: Paolo Bonzini @ 2016-06-06 17:05 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, claudio.fontana, Peter Crosthwaite,
	jan.kiszka, mark.burton, a.rigo, qemu-devel, cota, serge.fdrv,
	bobby.prani, rth, fred.konrad



On 06/06/2016 18:05, Alex Bennée wrote:
> > - rename qemu_cpu_kick_no_halt to qemu_cpu_kick_rr_cpu and
> > tcg_current_cpu to tcg_current_rr_cpu; possibly move functions around
> 
> That is this patch isn't it?

There's some renaming left in patch 14 which complicates things.

Paolo

>> > - extract handle_icount_deadline
> Sure I can do that separately.
> 
>> >
>> > - then everything else in patches 13 and 14, squashed
> I was wondering after I posted if I should split all the current_cpu
> stuff out as I don't think I need it straight away...
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 13/19] tcg: rename tcg_current_cpu to tcg_current_rr_cpu
  2016-06-06 17:05       ` Paolo Bonzini
@ 2016-06-06 17:26         ` Alex Bennée
  2016-06-06 18:25           ` Paolo Bonzini
  0 siblings, 1 reply; 62+ messages in thread
From: Alex Bennée @ 2016-06-06 17:26 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: mttcg, peter.maydell, claudio.fontana, Peter Crosthwaite,
	jan.kiszka, mark.burton, a.rigo, qemu-devel, cota, serge.fdrv,
	bobby.prani, rth, fred.konrad


Paolo Bonzini <pbonzini@redhat.com> writes:

> On 06/06/2016 18:05, Alex Bennée wrote:
>> > - rename qemu_cpu_kick_no_halt to qemu_cpu_kick_rr_cpu and
>> > tcg_current_cpu to tcg_current_rr_cpu; possibly move functions around
>>
>> That is this patch isn't it?
>
> There's some renaming left in patch 14 which complicates things.

Is there? It looks all exit_request related to me.

>
> Paolo
>
>>> > - extract handle_icount_deadline
>> Sure I can do that separately.
>>
>>> >
>>> > - then everything else in patches 13 and 14, squashed
>> I was wondering after I posted if I should split all the current_cpu
>> stuff out as I don't think I need it straight away...
>>


--
Alex Bennée

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 13/19] tcg: rename tcg_current_cpu to tcg_current_rr_cpu
  2016-06-06 17:26         ` Alex Bennée
@ 2016-06-06 18:25           ` Paolo Bonzini
  0 siblings, 0 replies; 62+ messages in thread
From: Paolo Bonzini @ 2016-06-06 18:25 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, peter.maydell, Peter Crosthwaite, mark.burton,
	claudio.fontana, a.rigo, qemu-devel, cota, jan.kiszka, serge.fdrv,
	bobby.prani, fred.konrad, rth



On 06/06/2016 19:26, Alex Bennée wrote:
> 
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
>> On 06/06/2016 18:05, Alex Bennée wrote:
>>>> - rename qemu_cpu_kick_no_halt to qemu_cpu_kick_rr_cpu and
>>>> tcg_current_cpu to tcg_current_rr_cpu; possibly move functions around
>>>
>>> That is this patch isn't it?
>>
>> There's some renaming left in patch 14 which complicates things.
> 
> Is there? It looks all exit_request related to me.

You're right sorry, it's the other way round.  It's not that there's 
some renaming in patch 14, it's that there are semantic changes in this 
patch.  I.e. this:


 
-    atomic_mb_set(&tcg_current_cpu, cpu);
     rcu_read_lock();
 
     if (unlikely(atomic_mb_read(&exit_request))) {
@@ -658,7 +657,5 @@ int cpu_exec(CPUState *cpu)
     /* fail safe : never use current_cpu outside cpu_exec() */
     current_cpu = NULL;
 
-    /* Does not need atomic_mb_set because a spurious wakeup is okay.  */
-    atomic_set(&tcg_current_cpu, NULL);

and the while-loop in qemu_cpu_kick_rr_cpu.

Paolo

>>
>> Paolo
>>
>>>>> - extract handle_icount_deadline
>>> Sure I can do that separately.
>>>
>>>>>
>>>>> - then everything else in patches 13 and 14, squashed
>>> I was wondering after I posted if I should split all the current_cpu
>>> stuff out as I don't think I need it straight away...
>>>
> 
> 
> --
> Alex Bennée
> 
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 13/19] tcg: rename tcg_current_cpu to tcg_current_rr_cpu
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 13/19] tcg: rename tcg_current_cpu to tcg_current_rr_cpu Alex Bennée
  2016-06-06 15:30   ` Paolo Bonzini
@ 2016-06-07 12:59   ` Alex Bennée
  1 sibling, 0 replies; 62+ messages in thread
From: Alex Bennée @ 2016-06-07 12:59 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite


Alex Bennée <alex.bennee@linaro.org> writes:

> ..and make the definition local to cpus. In preparation for MTTCG the
> concept of a global tcg_current_cpu will no longer make sense. However
> we still need to keep track of it in the single-threaded case to be able
> to exit quickly when required.
>
> qemu_cpu_kick_no_halt() moves and becomes qemu_cpu_kick_rr_cpu() to
> emphasise its use-case. qemu_cpu_kick now kicks the relevant cpu as
> well as qemu_kick_rr_cpu() which will become a no-op in MTTCG.

I detected a failure case that a -smp 2 guest doesn't cleanly shutdown
in RR mode. My command line:

running command: ['/home/alex/lsrc/qemu/qemu.git/arm-softmmu/qemu-system-arm', '-machine', 'type=virt', '-display', 'none', '-smp', '1', '-m', '4096', '-cpu', 'cortex-a15', '-serial', 'telnet:127.0.0.1:4444', '-monitor', 'stdio', '-netdev', 'user,id=unet,hostfwd=tcp::2222-:22', '-device', 'virtio-net-device,netdev=unet', '-drive', 'file=/home/alex/lsrc/qemu/images/jessie-arm32.qcow2,id=myblock,index=0,if=none', '-device', 'virtio-blk-device,drive=myblock', '-append', 'console=ttyAMA0 systemd.unit=benchmark.service root=/dev/vda1', '-kernel', '/home/alex/lsrc/qemu/images/aarch32-current-linux-kernel-only.img', '-smp', '2'] (notty=False, with timeout)
command timed out!
run 1: ret=-1 (FALSE), time=20.000228 (0/1)
1 fails

(N.B. the duplicate -smp is just an artefact of my shell script glue.
The -smp 2 is the final smp that gets acted on. The benchmark.service is
just a simple systemd target that shuts the image down again.)

>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  cpu-exec-common.c       |  1 -
>  cpu-exec.c              |  3 ---
>  cpus.c                  | 41 ++++++++++++++++++++++++-----------------
>  include/exec/exec-all.h |  2 --
>  4 files changed, 24 insertions(+), 23 deletions(-)
>
> diff --git a/cpu-exec-common.c b/cpu-exec-common.c
> index 132cd03..f42d24a 100644
> --- a/cpu-exec-common.c
> +++ b/cpu-exec-common.c
> @@ -24,7 +24,6 @@
>  #include "exec/memory-internal.h"
>
>  bool exit_request;
> -CPUState *tcg_current_cpu;
>
>  /* exit the current TB from a signal handler. The host registers are
>     restored in a state compatible with the CPU emulator
> diff --git a/cpu-exec.c b/cpu-exec.c
> index ae81e8e..68e804b 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -597,7 +597,6 @@ int cpu_exec(CPUState *cpu)
>          return EXCP_HALTED;
>      }
>
> -    atomic_mb_set(&tcg_current_cpu, cpu);
>      rcu_read_lock();
>
>      if (unlikely(atomic_mb_read(&exit_request))) {
> @@ -658,7 +657,5 @@ int cpu_exec(CPUState *cpu)
>      /* fail safe : never use current_cpu outside cpu_exec() */
>      current_cpu = NULL;
>
> -    /* Does not need atomic_mb_set because a spurious wakeup is okay.  */
> -    atomic_set(&tcg_current_cpu, NULL);
>      return ret;
>  }
> diff --git a/cpus.c b/cpus.c
> index 12e04c9..a139ad3 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -942,6 +942,7 @@ void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
>  {
>      struct qemu_work_item wi;
>
> +    /* Always true when using tcg RR scheduling from a vCPU context */
>      if (qemu_cpu_is_self(cpu)) {
>          func(data);
>          return;
> @@ -1216,15 +1217,29 @@ static int tcg_cpu_exec(CPUState *cpu)
>   * This is done explicitly rather than relying on side-effects
>   * elsewhere.
>   */
> -static void qemu_cpu_kick_no_halt(void);
>  #define TCG_KICK_FREQ (qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + \
>                         NANOSECONDS_PER_SECOND / 10)
>
> +/* only used in single-thread tcg mode */
> +static CPUState *tcg_current_rr_cpu;
> +
> +/* Kick the currently round-robin scheduled vCPU */
> +static void qemu_cpu_kick_rr_cpu(void)
> +{
> +    CPUState *cpu;
> +    do {
> +        cpu = atomic_mb_read(&tcg_current_rr_cpu);
> +        if (cpu) {
> +            cpu_exit(cpu);
> +        }
> +    } while (cpu != atomic_mb_read(&tcg_current_rr_cpu));
> +}
> +
>  static void kick_tcg_thread(void *opaque)
>  {
>      QEMUTimer *self = *(QEMUTimer **) opaque;
>      timer_mod(self, TCG_KICK_FREQ);
> -    qemu_cpu_kick_no_halt();
> +    qemu_cpu_kick_rr_cpu();
>  }
>
>  static void *qemu_tcg_cpu_thread_fn(void *arg)
> @@ -1275,6 +1290,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>          }
>
>          for (; cpu != NULL && !exit_request; cpu = CPU_NEXT(cpu)) {
> +            atomic_mb_set(&tcg_current_rr_cpu, cpu);
>
>              qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
>                                (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
> @@ -1290,6 +1306,8 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>              }
>
>          } /* for cpu.. */
> +        /* Does not need atomic_mb_set because a spurious wakeup is okay.  */
> +        atomic_set(&tcg_current_rr_cpu, NULL);
>
>          /* Pairs with smp_wmb in qemu_cpu_kick.  */
>          atomic_mb_set(&exit_request, 0);
> @@ -1326,24 +1344,13 @@ static void qemu_cpu_kick_thread(CPUState *cpu)
>  #endif
>  }
>
> -static void qemu_cpu_kick_no_halt(void)
> -{
> -    CPUState *cpu;
> -    /* Ensure whatever caused the exit has reached the CPU threads before
> -     * writing exit_request.
> -     */
> -    atomic_mb_set(&exit_request, 1);
> -    cpu = atomic_mb_read(&tcg_current_cpu);
> -    if (cpu) {
> -        cpu_exit(cpu);
> -    }
> -}
> -
>  void qemu_cpu_kick(CPUState *cpu)
>  {
>      qemu_cond_broadcast(cpu->halt_cond);
>      if (tcg_enabled()) {
> -        qemu_cpu_kick_no_halt();
> +        cpu_exit(cpu);
> +        /* Also ensure current RR cpu is kicked */
> +        qemu_cpu_kick_rr_cpu();
>      } else {
>          qemu_cpu_kick_thread(cpu);
>      }
> @@ -1384,7 +1391,7 @@ void qemu_mutex_lock_iothread(void)
>          atomic_dec(&iothread_requesting_mutex);
>      } else {
>          if (qemu_mutex_trylock(&qemu_global_mutex)) {
> -            qemu_cpu_kick_no_halt();
> +            qemu_cpu_kick_rr_cpu();
>              qemu_mutex_lock(&qemu_global_mutex);
>          }
>          atomic_dec(&iothread_requesting_mutex);
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index e30f02b..31f4c38 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -405,8 +405,6 @@ bool memory_region_is_unassigned(MemoryRegion *mr);
>  /* vl.c */
>  extern int singlestep;
>
> -/* cpu-exec.c, accessed with atomic_mb_read/atomic_mb_set */
> -extern CPUState *tcg_current_cpu;
>  extern bool exit_request;
>
>  #endif


--
Alex Bennée

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [Qemu-devel] [RFC v3 14/19] tcg: remove global exit_request
  2016-06-03 20:40 [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG Alex Bennée
                   ` (12 preceding siblings ...)
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 13/19] tcg: rename tcg_current_cpu to tcg_current_rr_cpu Alex Bennée
@ 2016-06-03 20:40 ` Alex Bennée
  2016-06-28 16:20   ` Sergey Fedorov
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 15/19] tcg: drop global lock during TCG code execution Alex Bennée
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 62+ messages in thread
From: Alex Bennée @ 2016-06-03 20:40 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Alex Bennée, Peter Crosthwaite

The only remaining use of the global exit_request flag is now to ensure
we exit the run_loop when we first start to process pending work. This
is just as easily done by setting the first_cpu->exit_request flag.

We lightly re-factor the main vCPU thread to ensure cpu->exit_requests
cause us to exit the main loop and process any IO requests that might
come along.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 cpu-exec-common.c       |  2 --
 cpu-exec.c              | 10 +++-------
 cpus.c                  | 35 +++++++++++++++++++++++------------
 include/exec/exec-all.h |  2 --
 4 files changed, 26 insertions(+), 23 deletions(-)

diff --git a/cpu-exec-common.c b/cpu-exec-common.c
index f42d24a..1b4fb53 100644
--- a/cpu-exec-common.c
+++ b/cpu-exec-common.c
@@ -23,8 +23,6 @@
 #include "exec/exec-all.h"
 #include "exec/memory-internal.h"
 
-bool exit_request;
-
 /* exit the current TB from a signal handler. The host registers are
    restored in a state compatible with the CPU emulator
  */
diff --git a/cpu-exec.c b/cpu-exec.c
index 68e804b..1613c63 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -541,9 +541,8 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
         /* Something asked us to stop executing
          * chained TBs; just continue round the main
          * loop. Whatever requested the exit will also
-         * have set something else (eg exit_request or
-         * interrupt_request) which we will handle
-         * next time around the loop.  But we need to
+         * have set something else (eg interrupt_request) which we
+         * will handle next time around the loop.  But we need to
          * ensure the tcg_exit_req read in generated code
          * comes before the next read of cpu->exit_request
          * or cpu->interrupt_request.
@@ -594,15 +593,12 @@ int cpu_exec(CPUState *cpu)
     current_cpu = cpu;
 
     if (cpu_handle_halt(cpu)) {
+        cpu->exit_request = true;
         return EXCP_HALTED;
     }
 
     rcu_read_lock();
 
-    if (unlikely(atomic_mb_read(&exit_request))) {
-        cpu->exit_request = 1;
-    }
-
     cc->cpu_exec_enter(cpu);
 
     /* Calculate difference between guest clock and host clock.
diff --git a/cpus.c b/cpus.c
index a139ad3..a84f02c 100644
--- a/cpus.c
+++ b/cpus.c
@@ -1169,6 +1169,18 @@ static int64_t tcg_get_icount_limit(void)
     }
 }
 
+static void handle_icount_deadline(void)
+{
+    if (use_icount) {
+        int64_t deadline =
+            qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
+
+        if (deadline == 0) {
+            qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
+        }
+    }
+}
+
 static int tcg_cpu_exec(CPUState *cpu)
 {
     int ret;
@@ -1276,11 +1288,11 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
         timer_mod(kick_timer, TCG_KICK_FREQ);
     }
 
-    /* process any pending work */
-    atomic_mb_set(&exit_request, 1);
-
     cpu = first_cpu;
 
+    /* process any pending work */
+    atomic_mb_set(&cpu->exit_request, 1);
+
     while (1) {
         /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
         qemu_account_warp_timer();
@@ -1289,7 +1301,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
             cpu = first_cpu;
         }
 
-        for (; cpu != NULL && !exit_request; cpu = CPU_NEXT(cpu)) {
+        while (cpu && !cpu->exit_request) {
             atomic_mb_set(&tcg_current_rr_cpu, cpu);
 
             qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
@@ -1303,22 +1315,21 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
                 }
             } else if (cpu->stop || cpu->stopped) {
                 break;
+            } else if (cpu->exit_request) {
+                break;
             }
 
+            cpu = CPU_NEXT(cpu);
         } /* for cpu.. */
         /* Does not need atomic_mb_set because a spurious wakeup is okay.  */
         atomic_set(&tcg_current_rr_cpu, NULL);
 
-        /* Pairs with smp_wmb in qemu_cpu_kick.  */
-        atomic_mb_set(&exit_request, 0);
+        if (cpu && cpu->exit_request) {
+            atomic_mb_set(&cpu->exit_request, 0);
+        }
 
-        if (use_icount) {
-            int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
+        handle_icount_deadline();
 
-            if (deadline == 0) {
-                qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
-            }
-        }
         qemu_tcg_wait_io_event(QTAILQ_FIRST(&cpus));
     }
 
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 31f4c38..7919aac 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -405,6 +405,4 @@ bool memory_region_is_unassigned(MemoryRegion *mr);
 /* vl.c */
 extern int singlestep;
 
-extern bool exit_request;
-
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 14/19] tcg: remove global exit_request
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 14/19] tcg: remove global exit_request Alex Bennée
@ 2016-06-28 16:20   ` Sergey Fedorov
  2016-08-03 11:42     ` Alex Bennée
  0 siblings, 1 reply; 62+ messages in thread
From: Sergey Fedorov @ 2016-06-28 16:20 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite

On 03/06/16 23:40, Alex Bennée wrote:
> The only remaining use of the global exit_request flag is now to ensure
> we exit the run_loop when we first start to process pending work. This
> is just as easily done by setting the first_cpu->exit_request flag.
>
> We lightly re-factor the main vCPU thread to ensure cpu->exit_requests
> cause us to exit the main loop and process any IO requests that might
> come along.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  cpu-exec-common.c       |  2 --
>  cpu-exec.c              | 10 +++-------
>  cpus.c                  | 35 +++++++++++++++++++++++------------
>  include/exec/exec-all.h |  2 --
>  4 files changed, 26 insertions(+), 23 deletions(-)
>
> diff --git a/cpu-exec-common.c b/cpu-exec-common.c
> index f42d24a..1b4fb53 100644
> --- a/cpu-exec-common.c
> +++ b/cpu-exec-common.c
> @@ -23,8 +23,6 @@
>  #include "exec/exec-all.h"
>  #include "exec/memory-internal.h"
>  
> -bool exit_request;
> -
>  /* exit the current TB from a signal handler. The host registers are
>     restored in a state compatible with the CPU emulator
>   */
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 68e804b..1613c63 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -541,9 +541,8 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
>          /* Something asked us to stop executing
>           * chained TBs; just continue round the main
>           * loop. Whatever requested the exit will also
> -         * have set something else (eg exit_request or
> -         * interrupt_request) which we will handle
> -         * next time around the loop.  But we need to
> +         * have set something else (eg interrupt_request) which we
> +         * will handle next time around the loop.  But we need to
>           * ensure the tcg_exit_req read in generated code
>           * comes before the next read of cpu->exit_request
>           * or cpu->interrupt_request.
> @@ -594,15 +593,12 @@ int cpu_exec(CPUState *cpu)
>      current_cpu = cpu;
>  
>      if (cpu_handle_halt(cpu)) {
> +        cpu->exit_request = true;

Why do we need to do this here?

>          return EXCP_HALTED;
>      }
>  
>      rcu_read_lock();
>  
> -    if (unlikely(atomic_mb_read(&exit_request))) {
> -        cpu->exit_request = 1;
> -    }
> -
>      cc->cpu_exec_enter(cpu);
>  
>      /* Calculate difference between guest clock and host clock.
> diff --git a/cpus.c b/cpus.c
> index a139ad3..a84f02c 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1169,6 +1169,18 @@ static int64_t tcg_get_icount_limit(void)
>      }
>  }
>  
> +static void handle_icount_deadline(void)
> +{
> +    if (use_icount) {
> +        int64_t deadline =
> +            qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
> +
> +        if (deadline == 0) {
> +            qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
> +        }
> +    }
> +}
> +
>  static int tcg_cpu_exec(CPUState *cpu)
>  {
>      int ret;
> @@ -1276,11 +1288,11 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>          timer_mod(kick_timer, TCG_KICK_FREQ);
>      }
>  
> -    /* process any pending work */
> -    atomic_mb_set(&exit_request, 1);
> -
>      cpu = first_cpu;
>  
> +    /* process any pending work */
> +    atomic_mb_set(&cpu->exit_request, 1);

Sometimes we use atomic_*() to operate on 'cpu->exit_request', sometimes
not. I am wondering if there are some rules about that?

Kind regards,
Sergey

> +
>      while (1) {
>          /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
>          qemu_account_warp_timer();
> @@ -1289,7 +1301,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>              cpu = first_cpu;
>          }
>  
> -        for (; cpu != NULL && !exit_request; cpu = CPU_NEXT(cpu)) {
> +        while (cpu && !cpu->exit_request) {
>              atomic_mb_set(&tcg_current_rr_cpu, cpu);
>  
>              qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
> @@ -1303,22 +1315,21 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>                  }
>              } else if (cpu->stop || cpu->stopped) {
>                  break;
> +            } else if (cpu->exit_request) {
> +                break;
>              }
>  
> +            cpu = CPU_NEXT(cpu);
>          } /* for cpu.. */
>          /* Does not need atomic_mb_set because a spurious wakeup is okay.  */
>          atomic_set(&tcg_current_rr_cpu, NULL);
>  
> -        /* Pairs with smp_wmb in qemu_cpu_kick.  */
> -        atomic_mb_set(&exit_request, 0);
> +        if (cpu && cpu->exit_request) {
> +            atomic_mb_set(&cpu->exit_request, 0);
> +        }
>  
> -        if (use_icount) {
> -            int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
> +        handle_icount_deadline();
>  
> -            if (deadline == 0) {
> -                qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
> -            }
> -        }
>          qemu_tcg_wait_io_event(QTAILQ_FIRST(&cpus));
>      }
>  
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 31f4c38..7919aac 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -405,6 +405,4 @@ bool memory_region_is_unassigned(MemoryRegion *mr);
>  /* vl.c */
>  extern int singlestep;
>  
> -extern bool exit_request;
> -
>  #endif

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 14/19] tcg: remove global exit_request
  2016-06-28 16:20   ` Sergey Fedorov
@ 2016-08-03 11:42     ` Alex Bennée
  0 siblings, 0 replies; 62+ messages in thread
From: Alex Bennée @ 2016-08-03 11:42 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani,
	mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite


Sergey Fedorov <serge.fdrv@gmail.com> writes:

> On 03/06/16 23:40, Alex Bennée wrote:
>> The only remaining use of the global exit_request flag is now to ensure
>> we exit the run_loop when we first start to process pending work. This
>> is just as easily done by setting the first_cpu->exit_request flag.
>>
>> We lightly re-factor the main vCPU thread to ensure cpu->exit_requests
>> cause us to exit the main loop and process any IO requests that might
>> come along.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> ---
>>  cpu-exec-common.c       |  2 --
>>  cpu-exec.c              | 10 +++-------
>>  cpus.c                  | 35 +++++++++++++++++++++++------------
>>  include/exec/exec-all.h |  2 --
>>  4 files changed, 26 insertions(+), 23 deletions(-)
>>
>> diff --git a/cpu-exec-common.c b/cpu-exec-common.c
>> index f42d24a..1b4fb53 100644
>> --- a/cpu-exec-common.c
>> +++ b/cpu-exec-common.c
>> @@ -23,8 +23,6 @@
>>  #include "exec/exec-all.h"
>>  #include "exec/memory-internal.h"
>>
>> -bool exit_request;
>> -
>>  /* exit the current TB from a signal handler. The host registers are
>>     restored in a state compatible with the CPU emulator
>>   */
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index 68e804b..1613c63 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -541,9 +541,8 @@ static inline void cpu_loop_exec_tb(CPUState *cpu, TranslationBlock *tb,
>>          /* Something asked us to stop executing
>>           * chained TBs; just continue round the main
>>           * loop. Whatever requested the exit will also
>> -         * have set something else (eg exit_request or
>> -         * interrupt_request) which we will handle
>> -         * next time around the loop.  But we need to
>> +         * have set something else (eg interrupt_request) which we
>> +         * will handle next time around the loop.  But we need to
>>           * ensure the tcg_exit_req read in generated code
>>           * comes before the next read of cpu->exit_request
>>           * or cpu->interrupt_request.
>> @@ -594,15 +593,12 @@ int cpu_exec(CPUState *cpu)
>>      current_cpu = cpu;
>>
>>      if (cpu_handle_halt(cpu)) {
>> +        cpu->exit_request = true;
>
> Why do we need to do this here?

Yeah this seems to be a stray.

>
>>          return EXCP_HALTED;
>>      }
>>
>>      rcu_read_lock();
>>
>> -    if (unlikely(atomic_mb_read(&exit_request))) {
>> -        cpu->exit_request = 1;
>> -    }
>> -
>>      cc->cpu_exec_enter(cpu);
>>
>>      /* Calculate difference between guest clock and host clock.
>> diff --git a/cpus.c b/cpus.c
>> index a139ad3..a84f02c 100644
>> --- a/cpus.c
>> +++ b/cpus.c
>> @@ -1169,6 +1169,18 @@ static int64_t tcg_get_icount_limit(void)
>>      }
>>  }
>>
>> +static void handle_icount_deadline(void)
>> +{
>> +    if (use_icount) {
>> +        int64_t deadline =
>> +            qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
>> +
>> +        if (deadline == 0) {
>> +            qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
>> +        }
>> +    }
>> +}
>> +
>>  static int tcg_cpu_exec(CPUState *cpu)
>>  {
>>      int ret;
>> @@ -1276,11 +1288,11 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>>          timer_mod(kick_timer, TCG_KICK_FREQ);
>>      }
>>
>> -    /* process any pending work */
>> -    atomic_mb_set(&exit_request, 1);
>> -
>>      cpu = first_cpu;
>>
>> +    /* process any pending work */
>> +    atomic_mb_set(&cpu->exit_request, 1);
>
> Sometimes we use atomic_*() to operate on 'cpu->exit_request', sometimes
> not. I am wondering if there are some rules about that?

We need to ensure the exit_request becomes visible to the guest vCPU in
the right order so the out of run-loop code will process whatever needs
to be done. This is done in cpu_exit with an smp_wmb to ensure
cpu->exit_request is always set when the TCG code exits ong
tcg_exit_request. This is paired with smp_rmb() in the cpu_loop_exec_tb.

I think most of the other sets are safe to do in the context of the vCPU
they are running in. The only other case is ensuring any races between
resetting the flag and setting it are handled cleanly. As cpu_exit
guarantee of a write barrier anything set up for the vCPU to execute
before cpu_exit will be visible as we come out the run loop.

>
> Kind regards,
> Sergey
>
>> +
>>      while (1) {
>>          /* Account partial waits to QEMU_CLOCK_VIRTUAL.  */
>>          qemu_account_warp_timer();
>> @@ -1289,7 +1301,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>>              cpu = first_cpu;
>>          }
>>
>> -        for (; cpu != NULL && !exit_request; cpu = CPU_NEXT(cpu)) {
>> +        while (cpu && !cpu->exit_request) {
>>              atomic_mb_set(&tcg_current_rr_cpu, cpu);
>>
>>              qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
>> @@ -1303,22 +1315,21 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>>                  }
>>              } else if (cpu->stop || cpu->stopped) {
>>                  break;
>> +            } else if (cpu->exit_request) {
>> +                break;
>>              }
>>
>> +            cpu = CPU_NEXT(cpu);
>>          } /* for cpu.. */
>>          /* Does not need atomic_mb_set because a spurious wakeup is okay.  */
>>          atomic_set(&tcg_current_rr_cpu, NULL);
>>
>> -        /* Pairs with smp_wmb in qemu_cpu_kick.  */
>> -        atomic_mb_set(&exit_request, 0);
>> +        if (cpu && cpu->exit_request) {
>> +            atomic_mb_set(&cpu->exit_request, 0);
>> +        }
>>
>> -        if (use_icount) {
>> -            int64_t deadline = qemu_clock_deadline_ns_all(QEMU_CLOCK_VIRTUAL);
>> +        handle_icount_deadline();
>>
>> -            if (deadline == 0) {
>> -                qemu_clock_notify(QEMU_CLOCK_VIRTUAL);
>> -            }
>> -        }
>>          qemu_tcg_wait_io_event(QTAILQ_FIRST(&cpus));
>>      }
>>
>> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
>> index 31f4c38..7919aac 100644
>> --- a/include/exec/exec-all.h
>> +++ b/include/exec/exec-all.h
>> @@ -405,6 +405,4 @@ bool memory_region_is_unassigned(MemoryRegion *mr);
>>  /* vl.c */
>>  extern int singlestep;
>>
>> -extern bool exit_request;
>> -
>>  #endif


--
Alex Bennée

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [Qemu-devel] [RFC v3 15/19] tcg: drop global lock during TCG code execution
  2016-06-03 20:40 [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG Alex Bennée
                   ` (13 preceding siblings ...)
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 14/19] tcg: remove global exit_request Alex Bennée
@ 2016-06-03 20:40 ` Alex Bennée
  2016-06-28 16:54   ` Sergey Fedorov
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 16/19] tcg: move locking for tb_invalidate_phys_page_range up Alex Bennée
                   ` (4 subsequent siblings)
  19 siblings, 1 reply; 62+ messages in thread
From: Alex Bennée @ 2016-06-03 20:40 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Alex Bennée, Peter Crosthwaite,
	Eduardo Habkost, Michael S. Tsirkin

This finally allows TCG to benefit from the iothread introduction: Drop
the global mutex while running pure TCG CPU code. Reacquire the lock
when entering MMIO or PIO emulation, or when leaving the TCG loop.

We have to revert a few optimization for the current TCG threading
model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
kicking it in qemu_cpu_kick. We also need to disable RAM block
reordering until we have a more efficient locking mechanism at hand.

Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
These numbers demonstrate where we gain something:

20338 jan       20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
20337 jan       20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm

The guest CPU was fully loaded, but the iothread could still run mostly
independent on a second core. Without the patch we don't get beyond

32206 jan       20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
32204 jan       20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm

We don't benefit significantly, though, when the guest is not fully
loading a host CPU.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Message-Id: <1439220437-23957-10-git-send-email-fred.konrad@greensocs.com>
[FK: Rebase, fix qemu_devices_reset deadlock, rm address_space_* mutex]
Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
[EGC: fixed iothread lock for cpu-exec IRQ handling]
Signed-off-by: Emilio G. Cota <cota@braap.org>
[AJB: -smp single-threaded fix, rm old info from commit msg]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v3 (ajb, base-patches):
  - stale iothread_unlocks removed (cpu_exit/resume_from_signal deals
  with it in the longjmp).
  - fix re-base conflicts
v2 (ajb):
  - merge with tcg: grab iothread lock in cpu-exec interrupt handling
  - use existing fns for tracking lock state
  - lock iothread for mem_region
    - add assert on mem region modification
    - ensure smm_helper holds iothread
  - Add JK s-o-b
  - Fix-up FK s-o-b annotation
v1 (ajb, base-patches):
  - SMP failure now fixed by previous commit

Changes from Fred Konrad (mttcg-v7 via paolo):
  * Rebase on the current HEAD.
  * Fixes a deadlock in qemu_devices_reset().
  * Remove the mutex in address_space_*
---
 cpu-exec.c               | 11 +++++++++++
 cpus.c                   | 26 +++-----------------------
 cputlb.c                 |  1 +
 exec.c                   | 12 +++++++++---
 hw/i386/kvmvapic.c       |  4 ++--
 memory.c                 |  2 ++
 softmmu_template.h       | 17 +++++++++++++++++
 target-i386/smm_helper.c |  7 +++++++
 translate-all.c          |  9 +++++++--
 9 files changed, 59 insertions(+), 30 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 1613c63..e1fb9ca 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -29,6 +29,7 @@
 #include "qemu/rcu.h"
 #include "exec/tb-hash.h"
 #include "exec/log.h"
+#include "qemu/main-loop.h"
 #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
 #include "hw/i386/apic.h"
 #endif
@@ -460,6 +461,8 @@ static inline void cpu_handle_interrupt(CPUState *cpu,
     int interrupt_request = cpu->interrupt_request;
 
     if (unlikely(interrupt_request)) {
+        qemu_mutex_lock_iothread();
+
         if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
             /* Mask out external interrupts for this step. */
             interrupt_request &= ~CPU_INTERRUPT_SSTEP_MASK;
@@ -514,6 +517,10 @@ static inline void cpu_handle_interrupt(CPUState *cpu,
                the program flow was changed */
             *last_tb = NULL;
         }
+
+        /* If we exit via cpu_loop_exit/longjmp it is reset in cpu_exec */
+        qemu_mutex_unlock_iothread();
+
     }
     if (unlikely(cpu->exit_request || replay_has_interrupt())) {
         cpu->exit_request = 0;
@@ -642,8 +649,12 @@ int cpu_exec(CPUState *cpu)
             g_assert(cpu == current_cpu);
             g_assert(cc == CPU_GET_CLASS(cpu));
 #endif /* buggy compiler */
+
             cpu->can_do_io = 1;
             tb_lock_reset();
+            if (qemu_mutex_iothread_locked()) {
+                qemu_mutex_unlock_iothread();
+            }
         }
     } /* for(;;) */
 
diff --git a/cpus.c b/cpus.c
index a84f02c..35374fd 100644
--- a/cpus.c
+++ b/cpus.c
@@ -915,8 +915,6 @@ static void qemu_kvm_init_cpu_signals(CPUState *cpu)
 #endif /* _WIN32 */
 
 static QemuMutex qemu_global_mutex;
-static QemuCond qemu_io_proceeded_cond;
-static unsigned iothread_requesting_mutex;
 
 static QemuThread io_thread;
 
@@ -932,7 +930,6 @@ void qemu_init_cpu_loop(void)
     qemu_cond_init(&qemu_cpu_cond);
     qemu_cond_init(&qemu_pause_cond);
     qemu_cond_init(&qemu_work_cond);
-    qemu_cond_init(&qemu_io_proceeded_cond);
     qemu_mutex_init(&qemu_global_mutex);
 
     qemu_thread_get_self(&io_thread);
@@ -1045,10 +1042,6 @@ static void qemu_tcg_wait_io_event(CPUState *cpu)
         qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
     }
 
-    while (iothread_requesting_mutex) {
-        qemu_cond_wait(&qemu_io_proceeded_cond, &qemu_global_mutex);
-    }
-
     CPU_FOREACH(cpu) {
         qemu_wait_io_event_common(cpu);
     }
@@ -1205,7 +1198,9 @@ static int tcg_cpu_exec(CPUState *cpu)
         cpu->icount_decr.u16.low = decr;
         cpu->icount_extra = count;
     }
+    qemu_mutex_unlock_iothread();
     ret = cpu_exec(cpu);
+    qemu_mutex_lock_iothread();
 #ifdef CONFIG_PROFILER
     tcg_time += profile_getclock() - ti;
 #endif
@@ -1392,22 +1387,7 @@ bool qemu_mutex_iothread_locked(void)
 
 void qemu_mutex_lock_iothread(void)
 {
-    atomic_inc(&iothread_requesting_mutex);
-    /* In the simple case there is no need to bump the VCPU thread out of
-     * TCG code execution.
-     */
-    if (!tcg_enabled() || qemu_in_vcpu_thread() ||
-        !first_cpu || !first_cpu->created) {
-        qemu_mutex_lock(&qemu_global_mutex);
-        atomic_dec(&iothread_requesting_mutex);
-    } else {
-        if (qemu_mutex_trylock(&qemu_global_mutex)) {
-            qemu_cpu_kick_rr_cpu();
-            qemu_mutex_lock(&qemu_global_mutex);
-        }
-        atomic_dec(&iothread_requesting_mutex);
-        qemu_cond_broadcast(&qemu_io_proceeded_cond);
-    }
+    qemu_mutex_lock(&qemu_global_mutex);
     iothread_locked = true;
 }
 
diff --git a/cputlb.c b/cputlb.c
index 1ff6354..78e9879 100644
--- a/cputlb.c
+++ b/cputlb.c
@@ -18,6 +18,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
 #include "cpu.h"
 #include "exec/exec-all.h"
 #include "exec/memory.h"
diff --git a/exec.c b/exec.c
index e23039c..b7744b9 100644
--- a/exec.c
+++ b/exec.c
@@ -2149,9 +2149,9 @@ static void check_watchpoint(int offset, int len, MemTxAttrs attrs, int flags)
                 }
                 cpu->watchpoint_hit = wp;
 
-                /* The tb_lock will be reset when cpu_loop_exit or
-                 * cpu_resume_from_signal longjmp back into the cpu_exec
-                 * main loop.
+                /* Both tb_lock and iothread_mutex will be reset when
+                 * cpu_loop_exit or cpu_resume_from_signal longjmp
+                 * back into the cpu_exec main loop.
                  */
                 tb_lock();
                 tb_check_watchpoint(cpu);
@@ -2387,8 +2387,14 @@ static void io_mem_init(void)
     memory_region_init_io(&io_mem_rom, NULL, &unassigned_mem_ops, NULL, NULL, UINT64_MAX);
     memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_ops, NULL,
                           NULL, UINT64_MAX);
+
+    /* io_mem_notdirty calls tb_invalidate_phys_page_fast,
+     * which must be called without the iothread mutex.
+     */
     memory_region_init_io(&io_mem_notdirty, NULL, &notdirty_mem_ops, NULL,
                           NULL, UINT64_MAX);
+    memory_region_clear_global_locking(&io_mem_notdirty);
+
     memory_region_init_io(&io_mem_watch, NULL, &watch_mem_ops, NULL,
                           NULL, UINT64_MAX);
 }
diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c
index d98fe2a..12ed58e 100644
--- a/hw/i386/kvmvapic.c
+++ b/hw/i386/kvmvapic.c
@@ -450,8 +450,8 @@ static void patch_instruction(VAPICROMState *s, X86CPU *cpu, target_ulong ip)
     resume_all_vcpus();
 
     if (!kvm_enabled()) {
-        /* tb_lock will be reset when cpu_resume_from_signal longjmps
-         * back into the cpu_exec loop. */
+        /* Both tb_lock and iothread_mutex will be reset when
+         * cpu_resume_from_signal longjmps back into the cpu_exec loop. */
         tb_lock();
         tb_gen_code(cs, current_pc, current_cs_base, current_flags, 1);
         cpu_resume_from_signal(cs, NULL);
diff --git a/memory.c b/memory.c
index 4e3cda8..2a318d7 100644
--- a/memory.c
+++ b/memory.c
@@ -914,6 +914,8 @@ void memory_region_transaction_commit(void)
     AddressSpace *as;
 
     assert(memory_region_transaction_depth);
+    assert(qemu_mutex_iothread_locked());
+
     --memory_region_transaction_depth;
     if (!memory_region_transaction_depth) {
         if (memory_region_update_pending) {
diff --git a/softmmu_template.h b/softmmu_template.h
index 208f808..0b6c609 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -151,6 +151,7 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
     CPUState *cpu = ENV_GET_CPU(env);
     hwaddr physaddr = iotlbentry->addr;
     MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
+    bool locked = false;
 
     physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
     cpu->mem_io_pc = retaddr;
@@ -159,8 +160,15 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
     }
 
     cpu->mem_io_vaddr = addr;
+    if (mr->global_locking) {
+        qemu_mutex_lock_iothread();
+        locked = true;
+    }
     memory_region_dispatch_read(mr, physaddr, &val, 1 << SHIFT,
                                 iotlbentry->attrs);
+    if (locked) {
+        qemu_mutex_unlock_iothread();
+    }
     return val;
 }
 #endif
@@ -358,6 +366,7 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
     CPUState *cpu = ENV_GET_CPU(env);
     hwaddr physaddr = iotlbentry->addr;
     MemoryRegion *mr = iotlb_to_region(cpu, physaddr, iotlbentry->attrs);
+    bool locked = false;
 
     physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
     if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
@@ -366,8 +375,16 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
 
     cpu->mem_io_vaddr = addr;
     cpu->mem_io_pc = retaddr;
+
+    if (mr->global_locking) {
+        qemu_mutex_lock_iothread();
+        locked = true;
+    }
     memory_region_dispatch_write(mr, physaddr, val, 1 << SHIFT,
                                  iotlbentry->attrs);
+    if (locked) {
+        qemu_mutex_unlock_iothread();
+    }
 }
 
 void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
diff --git a/target-i386/smm_helper.c b/target-i386/smm_helper.c
index 4dd6a2c..6a5489b 100644
--- a/target-i386/smm_helper.c
+++ b/target-i386/smm_helper.c
@@ -18,6 +18,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
 #include "cpu.h"
 #include "exec/helper-proto.h"
 #include "exec/log.h"
@@ -42,11 +43,14 @@ void helper_rsm(CPUX86State *env)
 #define SMM_REVISION_ID 0x00020000
 #endif
 
+/* Called we iothread lock taken */
 void cpu_smm_update(X86CPU *cpu)
 {
     CPUX86State *env = &cpu->env;
     bool smm_enabled = (env->hflags & HF_SMM_MASK);
 
+    g_assert(qemu_mutex_iothread_locked());
+
     if (cpu->smram) {
         memory_region_set_enabled(cpu->smram, smm_enabled);
     }
@@ -333,7 +337,10 @@ void helper_rsm(CPUX86State *env)
     }
     env->hflags2 &= ~HF2_SMM_INSIDE_NMI_MASK;
     env->hflags &= ~HF_SMM_MASK;
+
+    qemu_mutex_lock_iothread();
     cpu_smm_update(cpu);
+    qemu_mutex_unlock_iothread();
 
     qemu_log_mask(CPU_LOG_INT, "SMM: after RSM\n");
     log_cpu_state_mask(CPU_LOG_INT, CPU(cpu), CPU_DUMP_CCOP);
diff --git a/translate-all.c b/translate-all.c
index f54ab3e..818520e 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1476,7 +1476,9 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
 }
 
 #ifdef CONFIG_SOFTMMU
-/* len must be <= 8 and start must be a multiple of len */
+/* len must be <= 8 and start must be a multiple of len.
+ * Called via softmmu_template.h, with iothread mutex not held.
+ */
 void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
 {
     PageDesc *p;
@@ -1670,7 +1672,10 @@ void tb_check_watchpoint(CPUState *cpu)
 
 #ifndef CONFIG_USER_ONLY
 /* in deterministic execution mode, instructions doing device I/Os
-   must be at the end of the TB */
+ * must be at the end of the TB.
+ *
+ * Called by softmmu_template.h, with iothread mutex not held.
+ */
 void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
 {
 #if defined(TARGET_MIPS) || defined(TARGET_SH4)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 15/19] tcg: drop global lock during TCG code execution
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 15/19] tcg: drop global lock during TCG code execution Alex Bennée
@ 2016-06-28 16:54   ` Sergey Fedorov
  2016-08-10 13:51     ` Alex Bennée
  0 siblings, 1 reply; 62+ messages in thread
From: Sergey Fedorov @ 2016-06-28 16:54 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite, Eduardo Habkost,
	Michael S. Tsirkin

On 03/06/16 23:40, Alex Bennée wrote:
>

From: Jan Kiszka <jan.kiszka@siemens.com>

(See http://thread.gmane.org/gmane.comp.emulators.qemu/402092/focus=403090)

> This finally allows TCG to benefit from the iothread introduction: Drop
> the global mutex while running pure TCG CPU code. Reacquire the lock
> when entering MMIO or PIO emulation, or when leaving the TCG loop.
>
> We have to revert a few optimization for the current TCG threading
> model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
> kicking it in qemu_cpu_kick. We also need to disable RAM block
> reordering until we have a more efficient locking mechanism at hand.
>
> Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
> These numbers demonstrate where we gain something:
>
> 20338 jan       20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
> 20337 jan       20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm
>
> The guest CPU was fully loaded, but the iothread could still run mostly
> independent on a second core. Without the patch we don't get beyond
>
> 32206 jan       20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
> 32204 jan       20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm
>
> We don't benefit significantly, though, when the guest is not fully
> loading a host CPU.
>
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> Message-Id: <1439220437-23957-10-git-send-email-fred.konrad@greensocs.com>
> [FK: Rebase, fix qemu_devices_reset deadlock, rm address_space_* mutex]
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> [EGC: fixed iothread lock for cpu-exec IRQ handling]
> Signed-off-by: Emilio G. Cota <cota@braap.org>
> [AJB: -smp single-threaded fix, rm old info from commit msg]
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>
(snip)
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 1613c63..e1fb9ca 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -29,6 +29,7 @@
>  #include "qemu/rcu.h"
>  #include "exec/tb-hash.h"
>  #include "exec/log.h"
> +#include "qemu/main-loop.h"
>  #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
>  #include "hw/i386/apic.h"
>  #endif
> @@ -460,6 +461,8 @@ static inline void cpu_handle_interrupt(CPUState *cpu,
>      int interrupt_request = cpu->interrupt_request;
>  
>      if (unlikely(interrupt_request)) {
> +        qemu_mutex_lock_iothread();
> +

cpu_handle_halt() for target-i386 also needs to protect
'cpu->interrupt_request' with the global lock.

>          if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
>              /* Mask out external interrupts for this step. */
>              interrupt_request &= ~CPU_INTERRUPT_SSTEP_MASK;
> @@ -514,6 +517,10 @@ static inline void cpu_handle_interrupt(CPUState *cpu,
>                 the program flow was changed */
>              *last_tb = NULL;
>          }
> +
> +        /* If we exit via cpu_loop_exit/longjmp it is reset in cpu_exec */
> +        qemu_mutex_unlock_iothread();
> +
>      }
>      if (unlikely(cpu->exit_request || replay_has_interrupt())) {
>          cpu->exit_request = 0;
(snip)
> diff --git a/exec.c b/exec.c
> index e23039c..b7744b9 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2149,9 +2149,9 @@ static void check_watchpoint(int offset, int len, MemTxAttrs attrs, int flags)
>                  }
>                  cpu->watchpoint_hit = wp;
>  
> -                /* The tb_lock will be reset when cpu_loop_exit or
> -                 * cpu_resume_from_signal longjmp back into the cpu_exec
> -                 * main loop.
> +                /* Both tb_lock and iothread_mutex will be reset when
> +                 * cpu_loop_exit or cpu_resume_from_signal longjmp
> +                 * back into the cpu_exec main loop.
>                   */
>                  tb_lock();
>                  tb_check_watchpoint(cpu);
> @@ -2387,8 +2387,14 @@ static void io_mem_init(void)
>      memory_region_init_io(&io_mem_rom, NULL, &unassigned_mem_ops, NULL, NULL, UINT64_MAX);
>      memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_ops, NULL,
>                            NULL, UINT64_MAX);
> +
> +    /* io_mem_notdirty calls tb_invalidate_phys_page_fast,
> +     * which must be called without the iothread mutex.

"must" or "can"?

> +     */
>      memory_region_init_io(&io_mem_notdirty, NULL, &notdirty_mem_ops, NULL,
>                            NULL, UINT64_MAX);
> +    memory_region_clear_global_locking(&io_mem_notdirty);
> +
>      memory_region_init_io(&io_mem_watch, NULL, &watch_mem_ops, NULL,
>                            NULL, UINT64_MAX);
>  }
(snip)
> diff --git a/target-i386/smm_helper.c b/target-i386/smm_helper.c
> index 4dd6a2c..6a5489b 100644
> --- a/target-i386/smm_helper.c
> +++ b/target-i386/smm_helper.c
> @@ -18,6 +18,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/main-loop.h"
>  #include "cpu.h"
>  #include "exec/helper-proto.h"
>  #include "exec/log.h"
> @@ -42,11 +43,14 @@ void helper_rsm(CPUX86State *env)
>  #define SMM_REVISION_ID 0x00020000
>  #endif
>  
> +/* Called we iothread lock taken */

s/we/with/

>  void cpu_smm_update(X86CPU *cpu)
>  {
>      CPUX86State *env = &cpu->env;
>      bool smm_enabled = (env->hflags & HF_SMM_MASK);
>  
> +    g_assert(qemu_mutex_iothread_locked());
> +
>      if (cpu->smram) {
>          memory_region_set_enabled(cpu->smram, smm_enabled);
>      }
> @@ -333,7 +337,10 @@ void helper_rsm(CPUX86State *env)
>      }
>      env->hflags2 &= ~HF2_SMM_INSIDE_NMI_MASK;
>      env->hflags &= ~HF_SMM_MASK;
> +
> +    qemu_mutex_lock_iothread();
>      cpu_smm_update(cpu);
> +    qemu_mutex_unlock_iothread();

I'm wondering if there are some other similar places to take the global
lock.

>  
>      qemu_log_mask(CPU_LOG_INT, "SMM: after RSM\n");
>      log_cpu_state_mask(CPU_LOG_INT, CPU(cpu), CPU_DUMP_CCOP);
(snip)

Kind regards
Sergey

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 15/19] tcg: drop global lock during TCG code execution
  2016-06-28 16:54   ` Sergey Fedorov
@ 2016-08-10 13:51     ` Alex Bennée
  0 siblings, 0 replies; 62+ messages in thread
From: Alex Bennée @ 2016-08-10 13:51 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani,
	mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite, Eduardo Habkost,
	Michael S. Tsirkin


Sergey Fedorov <serge.fdrv@gmail.com> writes:

> On 03/06/16 23:40, Alex Bennée wrote:
>>
>
> From: Jan Kiszka <jan.kiszka@siemens.com>
>
> (See http://thread.gmane.org/gmane.comp.emulators.qemu/402092/focus=403090)
>
>> This finally allows TCG to benefit from the iothread introduction: Drop
>> the global mutex while running pure TCG CPU code. Reacquire the lock
>> when entering MMIO or PIO emulation, or when leaving the TCG loop.
>>
>> We have to revert a few optimization for the current TCG threading
>> model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
>> kicking it in qemu_cpu_kick. We also need to disable RAM block
>> reordering until we have a more efficient locking mechanism at hand.
>>
>> Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
>> These numbers demonstrate where we gain something:
>>
>> 20338 jan       20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
>> 20337 jan       20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm
>>
>> The guest CPU was fully loaded, but the iothread could still run mostly
>> independent on a second core. Without the patch we don't get beyond
>>
>> 32206 jan       20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
>> 32204 jan       20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm
>>
>> We don't benefit significantly, though, when the guest is not fully
>> loading a host CPU.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> Message-Id: <1439220437-23957-10-git-send-email-fred.konrad@greensocs.com>
>> [FK: Rebase, fix qemu_devices_reset deadlock, rm address_space_* mutex]
>> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
>> [EGC: fixed iothread lock for cpu-exec IRQ handling]
>> Signed-off-by: Emilio G. Cota <cota@braap.org>
>> [AJB: -smp single-threaded fix, rm old info from commit msg]
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>
> (snip)
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index 1613c63..e1fb9ca 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -29,6 +29,7 @@
>>  #include "qemu/rcu.h"
>>  #include "exec/tb-hash.h"
>>  #include "exec/log.h"
>> +#include "qemu/main-loop.h"
>>  #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
>>  #include "hw/i386/apic.h"
>>  #endif
>> @@ -460,6 +461,8 @@ static inline void cpu_handle_interrupt(CPUState *cpu,
>>      int interrupt_request = cpu->interrupt_request;
>>
>>      if (unlikely(interrupt_request)) {
>> +        qemu_mutex_lock_iothread();
>> +
>
> cpu_handle_halt() for target-i386 also needs to protect
> 'cpu->interrupt_request' with the global lock.
>
>>          if (unlikely(cpu->singlestep_enabled & SSTEP_NOIRQ)) {
>>              /* Mask out external interrupts for this step. */
>>              interrupt_request &= ~CPU_INTERRUPT_SSTEP_MASK;
>> @@ -514,6 +517,10 @@ static inline void cpu_handle_interrupt(CPUState *cpu,
>>                 the program flow was changed */
>>              *last_tb = NULL;
>>          }
>> +
>> +        /* If we exit via cpu_loop_exit/longjmp it is reset in cpu_exec */
>> +        qemu_mutex_unlock_iothread();
>> +
>>      }
>>      if (unlikely(cpu->exit_request || replay_has_interrupt())) {
>>          cpu->exit_request = 0;
> (snip)
>> diff --git a/exec.c b/exec.c
>> index e23039c..b7744b9 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -2149,9 +2149,9 @@ static void check_watchpoint(int offset, int len, MemTxAttrs attrs, int flags)
>>                  }
>>                  cpu->watchpoint_hit = wp;
>>
>> -                /* The tb_lock will be reset when cpu_loop_exit or
>> -                 * cpu_resume_from_signal longjmp back into the cpu_exec
>> -                 * main loop.
>> +                /* Both tb_lock and iothread_mutex will be reset when
>> +                 * cpu_loop_exit or cpu_resume_from_signal longjmp
>> +                 * back into the cpu_exec main loop.
>>                   */
>>                  tb_lock();
>>                  tb_check_watchpoint(cpu);
>> @@ -2387,8 +2387,14 @@ static void io_mem_init(void)
>>      memory_region_init_io(&io_mem_rom, NULL, &unassigned_mem_ops, NULL, NULL, UINT64_MAX);
>>      memory_region_init_io(&io_mem_unassigned, NULL, &unassigned_mem_ops, NULL,
>>                            NULL, UINT64_MAX);
>> +
>> +    /* io_mem_notdirty calls tb_invalidate_phys_page_fast,
>> +     * which must be called without the iothread mutex.
>
> "must" or "can"?
>
>> +     */
>>      memory_region_init_io(&io_mem_notdirty, NULL, &notdirty_mem_ops, NULL,
>>                            NULL, UINT64_MAX);
>> +    memory_region_clear_global_locking(&io_mem_notdirty);
>> +
>>      memory_region_init_io(&io_mem_watch, NULL, &watch_mem_ops, NULL,
>>                            NULL, UINT64_MAX);
>>  }
> (snip)
>> diff --git a/target-i386/smm_helper.c b/target-i386/smm_helper.c
>> index 4dd6a2c..6a5489b 100644
>> --- a/target-i386/smm_helper.c
>> +++ b/target-i386/smm_helper.c
>> @@ -18,6 +18,7 @@
>>   */
>>
>>  #include "qemu/osdep.h"
>> +#include "qemu/main-loop.h"
>>  #include "cpu.h"
>>  #include "exec/helper-proto.h"
>>  #include "exec/log.h"
>> @@ -42,11 +43,14 @@ void helper_rsm(CPUX86State *env)
>>  #define SMM_REVISION_ID 0x00020000
>>  #endif
>>
>> +/* Called we iothread lock taken */
>
> s/we/with/
>
>>  void cpu_smm_update(X86CPU *cpu)
>>  {
>>      CPUX86State *env = &cpu->env;
>>      bool smm_enabled = (env->hflags & HF_SMM_MASK);
>>
>> +    g_assert(qemu_mutex_iothread_locked());
>> +
>>      if (cpu->smram) {
>>          memory_region_set_enabled(cpu->smram, smm_enabled);
>>      }
>> @@ -333,7 +337,10 @@ void helper_rsm(CPUX86State *env)
>>      }
>>      env->hflags2 &= ~HF2_SMM_INSIDE_NMI_MASK;
>>      env->hflags &= ~HF_SMM_MASK;
>> +
>> +    qemu_mutex_lock_iothread();
>>      cpu_smm_update(cpu);
>> +    qemu_mutex_unlock_iothread();
>
> I'm wondering if there are some other similar places to take the global
> lock.

The two main routes to messing with the IRQ as through helpers (no lock
held by default) and MMIO (generally the global lock is held unless the
region is explicitly unlocked).

For simplicity for cpu_reset_interrupt I take the lock if it is not
already held. cpu_interrupt assumes the lock is held and asserts it now.

>
>>
>>      qemu_log_mask(CPU_LOG_INT, "SMM: after RSM\n");
>>      log_cpu_state_mask(CPU_LOG_INT, CPU(cpu), CPU_DUMP_CCOP);
> (snip)
>
> Kind regards
> Sergey


--
Alex Bennée

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [Qemu-devel] [RFC v3 16/19] tcg: move locking for tb_invalidate_phys_page_range up
  2016-06-03 20:40 [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG Alex Bennée
                   ` (14 preceding siblings ...)
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 15/19] tcg: drop global lock during TCG code execution Alex Bennée
@ 2016-06-03 20:40 ` Alex Bennée
  2016-06-28 19:43   ` Sergey Fedorov
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 17/19] tcg: enable thread-per-vCPU Alex Bennée
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 62+ messages in thread
From: Alex Bennée @ 2016-06-03 20:40 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Alex Bennée, Peter Crosthwaite

While we previously assumed an existing memory lock protected the page
look up in the MTTCG SoftMMU case the memory lock is provided by the
tb_lock. As a result we push the taking of this lock up the call tree.
This requires a slightly different entry for the SoftMMU and user-mode
cases from tb_invalidate_phys_range.

This also means user-mode breakpoint insertion needs to take two locks
but it hadn't taken any previously so this is an improvement.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 exec.c          | 16 ++++++++++++++++
 translate-all.c | 37 +++++++++++++++++++++++++++++--------
 2 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/exec.c b/exec.c
index b7744b9..8bb7481 100644
--- a/exec.c
+++ b/exec.c
@@ -734,7 +734,11 @@ void cpu_exec_init(CPUState *cpu, Error **errp)
 #if defined(CONFIG_USER_ONLY)
 static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
 {
+    mmap_lock();
+    tb_lock();
     tb_invalidate_phys_page_range(pc, pc + 1, 0);
+    tb_unlock();
+    mmap_unlock();
 }
 #else
 static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
@@ -743,6 +747,7 @@ static void breakpoint_invalidate(CPUState *cpu, target_ulong pc)
     hwaddr phys = cpu_get_phys_page_attrs_debug(cpu, pc, &attrs);
     int asidx = cpu_asidx_from_attrs(cpu, attrs);
     if (phys != -1) {
+        /* Locks grabbed by tb_invalidate_phys_addr */
         tb_invalidate_phys_addr(cpu->cpu_ases[asidx].as,
                                 phys | (pc & ~TARGET_PAGE_MASK));
     }
@@ -2072,7 +2077,11 @@ MemoryRegion *qemu_ram_addr_from_host(void *ptr, ram_addr_t *ram_addr)
 static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
                                uint64_t val, unsigned size)
 {
+    bool locked = false;
+
     if (!cpu_physical_memory_get_dirty_flag(ram_addr, DIRTY_MEMORY_CODE)) {
+        locked = true;
+        tb_lock();
         tb_invalidate_phys_page_fast(ram_addr, size);
     }
     switch (size) {
@@ -2088,6 +2097,11 @@ static void notdirty_mem_write(void *opaque, hwaddr ram_addr,
     default:
         abort();
     }
+
+    if (locked) {
+        tb_unlock();
+    }
+
     /* Set both VGA and migration bits for simplicity and to remove
      * the notdirty callback faster.
      */
@@ -2566,7 +2580,9 @@ static void invalidate_and_set_dirty(MemoryRegion *mr, hwaddr addr,
             cpu_physical_memory_range_includes_clean(addr, length, dirty_log_mask);
     }
     if (dirty_log_mask & (1 << DIRTY_MEMORY_CODE)) {
+        tb_lock();
         tb_invalidate_phys_range(addr, addr + length);
+        tb_unlock();
         dirty_log_mask &= ~(1 << DIRTY_MEMORY_CODE);
     }
     cpu_physical_memory_set_dirty_range(addr, length, dirty_log_mask);
diff --git a/translate-all.c b/translate-all.c
index 818520e..4bc5718 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1355,12 +1355,11 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
  * access: the virtual CPU will exit the current TB if code is modified inside
  * this TB.
  *
- * Called with mmap_lock held for user-mode emulation
+ * Called with mmap_lock held for user-mode emulation, grabs tb_lock
+ * Called with tb_lock held for system-mode emulation
  */
-void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
+static void tb_invalidate_phys_range_1(tb_page_addr_t start, tb_page_addr_t end)
 {
-    assert_memory_lock();
-
     while (start < end) {
         tb_invalidate_phys_page_range(start, end, 0);
         start &= TARGET_PAGE_MASK;
@@ -1368,6 +1367,21 @@ void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
     }
 }
 
+#ifdef CONFIG_SOFTMMU
+void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
+{
+    assert_tb_lock();
+    tb_invalidate_phys_range_1(start, end);
+}
+#else
+void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
+{
+    assert_memory_lock();
+    tb_lock();
+    tb_invalidate_phys_range_1(start, end);
+    tb_unlock();
+}
+#endif
 /*
  * Invalidate all TBs which intersect with the target physical address range
  * [start;end[. NOTE: start and end must refer to the *same* physical page.
@@ -1375,7 +1389,8 @@ void tb_invalidate_phys_range(tb_page_addr_t start, tb_page_addr_t end)
  * access: the virtual CPU will exit the current TB if code is modified inside
  * this TB.
  *
- * Called with mmap_lock held for user-mode emulation
+ * Called with tb_lock/mmap_lock held for user-mode emulation
+ * Called with tb_lock held for system-mode emulation
  */
 void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
                                    int is_cpu_write_access)
@@ -1398,6 +1413,7 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
 #endif /* TARGET_HAS_PRECISE_SMC */
 
     assert_memory_lock();
+    assert_tb_lock();
 
     p = page_find(start >> TARGET_PAGE_BITS);
     if (!p) {
@@ -1412,7 +1428,6 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
     /* we remove all the TBs in the range [start, end[ */
     /* XXX: see if in some cases it could be faster to invalidate all
        the code */
-    tb_lock();
     tb = p->first_tb;
     while (tb != NULL) {
         n = (uintptr_t)tb & 3;
@@ -1472,12 +1487,12 @@ void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
         cpu_resume_from_signal(cpu, NULL);
     }
 #endif
-    tb_unlock();
 }
 
 #ifdef CONFIG_SOFTMMU
 /* len must be <= 8 and start must be a multiple of len.
- * Called via softmmu_template.h, with iothread mutex not held.
+ * Called via softmmu_template.h when code areas are written to with
+ * tb_lock held.
  */
 void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
 {
@@ -1492,6 +1507,8 @@ void tb_invalidate_phys_page_fast(tb_page_addr_t start, int len)
                   (intptr_t)cpu_single_env->segs[R_CS].base);
     }
 #endif
+    assert_memory_lock();
+
     p = page_find(start >> TARGET_PAGE_BITS);
     if (!p) {
         return;
@@ -1536,6 +1553,8 @@ static void tb_invalidate_phys_page(tb_page_addr_t addr,
     uint32_t current_flags = 0;
 #endif
 
+    assert_memory_lock();
+
     addr &= TARGET_PAGE_MASK;
     p = page_find(addr >> TARGET_PAGE_BITS);
     if (!p) {
@@ -1641,7 +1660,9 @@ void tb_invalidate_phys_addr(AddressSpace *as, hwaddr addr)
         return;
     }
     ram_addr = memory_region_get_ram_addr(mr) + addr;
+    tb_lock();
     tb_invalidate_phys_page_range(ram_addr, ram_addr + 1, 0);
+    tb_unlock();
     rcu_read_unlock();
 }
 #endif /* !defined(CONFIG_USER_ONLY) */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 16/19] tcg: move locking for tb_invalidate_phys_page_range up
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 16/19] tcg: move locking for tb_invalidate_phys_page_range up Alex Bennée
@ 2016-06-28 19:43   ` Sergey Fedorov
  2016-06-28 19:51     ` Sergey Fedorov
  0 siblings, 1 reply; 62+ messages in thread
From: Sergey Fedorov @ 2016-06-28 19:43 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite

On 03/06/16 23:40, Alex Bennée wrote:
> While we previously assumed an existing memory lock protected the page
> look up in the MTTCG SoftMMU case the memory lock is provided by the
> tb_lock. As a result we push the taking of this lock up the call tree.
> This requires a slightly different entry for the SoftMMU and user-mode
> cases from tb_invalidate_phys_range.

Sorry, I can't understand the description for the patch :( Some
rewording might be helpful, if you don't mind.

Thanks,
Sergey

> This also means user-mode breakpoint insertion needs to take two locks
> but it hadn't taken any previously so this is an improvement.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  exec.c          | 16 ++++++++++++++++
>  translate-all.c | 37 +++++++++++++++++++++++++++++--------
>  2 files changed, 45 insertions(+), 8 deletions(-)
(snip)

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 16/19] tcg: move locking for tb_invalidate_phys_page_range up
  2016-06-28 19:43   ` Sergey Fedorov
@ 2016-06-28 19:51     ` Sergey Fedorov
  0 siblings, 0 replies; 62+ messages in thread
From: Sergey Fedorov @ 2016-06-28 19:51 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite

On 28/06/16 22:43, Sergey Fedorov wrote:
> On 03/06/16 23:40, Alex Bennée wrote:
>> While we previously assumed an existing memory lock protected the page
>> look up in the MTTCG SoftMMU case the memory lock is provided by the
>> tb_lock. As a result we push the taking of this lock up the call tree.
>> This requires a slightly different entry for the SoftMMU and user-mode
>> cases from tb_invalidate_phys_range.
> Sorry, I can't understand the description for the patch :( Some
> rewording might be helpful, if you don't mind.

Well, do I understand it right that we're gonna use tb_lock to protect
'l1_map' and PageDesc structures in softmmu mode?

Regards,
Sergey

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [Qemu-devel] [RFC v3 17/19] tcg: enable thread-per-vCPU
  2016-06-03 20:40 [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG Alex Bennée
                   ` (15 preceding siblings ...)
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 16/19] tcg: move locking for tb_invalidate_phys_page_range up Alex Bennée
@ 2016-06-03 20:40 ` Alex Bennée
  2016-06-29 14:09   ` Sergey Fedorov
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 18/19] tcg: Ensure safe TB lookup out of 'tb_lock' Alex Bennée
                   ` (2 subsequent siblings)
  19 siblings, 1 reply; 62+ messages in thread
From: Alex Bennée @ 2016-06-03 20:40 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Alex Bennée, Peter Crosthwaite, Riku Voipio

There are a number of changes that occur at the same time here:

  - tb_lock is no longer a NOP for SoftMMU

  The tb_lock protects both translation and memory map structures. The
  debug assert is updated to reflect this.

  - introduce a single vCPU qemu_tcg_cpu_thread_fn

  One of these is spawned per vCPU with its own Thread and Condition
  variables. qemu_tcg_single_cpu_thread_fn is the new name for the old
  single threaded function.

  - the TLS current_cpu variable is now live for the lifetime of MTTCG
    vCPU threads. This is for future work where async jobs need to know
    the vCPU context they are operating in.

The user to switch on multi-thread behaviour and spawn a thread
per-vCPU. For a simple test like:

  ./arm/run ./arm/locking-test.flat -smp 4 -accel tcg,thread=multi

Will now use 4 vCPU threads and have an expected FAIL (instead of the
unexpected PASS) as the default mode of the test has no protection when
incrementing a shared variable.

However we still default to a single thread for all vCPUs as individual
front-end and back-ends need additional fixes to safely support:
  - atomic behaviour
  - tb invalidation
  - memory ordering

The function default_mttcg_enabled can be tweaked as support is added.

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[AJB: Some fixes, conditionally, commit rewording]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v1 (ajb):
  - fix merge conflicts
  - maintain single-thread approach
v2
  - re-base fixes (no longer has tb_find_fast lock tweak ahead)
  - remove bogus break condition on cpu->stop/stopped
  - only process exiting cpus exit_request
  - handle all cpus idle case (fixes shutdown issues)
  - sleep on EXCP_HALTED in mttcg mode (prevent crash on start-up)
  - move icount timer into helper
v3
  - update the commit message
  - rm kick_timer tweaks (move to earlier tcg_current_cpu tweaks)
  - ensure linux-user clears cpu->exit_request in loop
  - purging of global exit_request and tcg_current_cpu in earlier patches
  - fix checkpatch warnings
---
 cpu-exec.c        |   8 ----
 cpus.c            | 122 ++++++++++++++++++++++++++++++++++++++++--------------
 linux-user/main.c |   1 +
 translate-all.c   |  18 +++-----
 4 files changed, 98 insertions(+), 51 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index e1fb9ca..5ad3865 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -297,7 +297,6 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
         goto found;
     }
 
-#ifdef CONFIG_USER_ONLY
     /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
      * taken outside tb_lock.  Since we're momentarily dropping
      * tb_lock, there's a chance that our desired tb has been
@@ -311,14 +310,11 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
         mmap_unlock();
         goto found;
     }
-#endif
 
     /* if no translated code available, then translate it now */
     tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
 
-#ifdef CONFIG_USER_ONLY
     mmap_unlock();
-#endif
 
 found:
     /* we add the TB in the virtual pc hash table */
@@ -523,7 +519,6 @@ static inline void cpu_handle_interrupt(CPUState *cpu,
 
     }
     if (unlikely(cpu->exit_request || replay_has_interrupt())) {
-        cpu->exit_request = 0;
         cpu->exception_index = EXCP_INTERRUPT;
         cpu_loop_exit(cpu);
     }
@@ -661,8 +656,5 @@ int cpu_exec(CPUState *cpu)
     cc->cpu_exec_exit(cpu);
     rcu_read_unlock();
 
-    /* fail safe : never use current_cpu outside cpu_exec() */
-    current_cpu = NULL;
-
     return ret;
 }
diff --git a/cpus.c b/cpus.c
index 35374fd..419caa2 100644
--- a/cpus.c
+++ b/cpus.c
@@ -962,10 +962,7 @@ void run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
 
     qemu_cpu_kick(cpu);
     while (!atomic_mb_read(&wi.done)) {
-        CPUState *self_cpu = current_cpu;
-
         qemu_cond_wait(&qemu_work_cond, &qemu_global_mutex);
-        current_cpu = self_cpu;
     }
 }
 
@@ -1027,13 +1024,13 @@ static void flush_queued_work(CPUState *cpu)
 
 static void qemu_wait_io_event_common(CPUState *cpu)
 {
+    atomic_mb_set(&cpu->thread_kicked, false);
     if (cpu->stop) {
         cpu->stop = false;
         cpu->stopped = true;
         qemu_cond_broadcast(&qemu_pause_cond);
     }
     flush_queued_work(cpu);
-    cpu->thread_kicked = false;
 }
 
 static void qemu_tcg_wait_io_event(CPUState *cpu)
@@ -1042,9 +1039,7 @@ static void qemu_tcg_wait_io_event(CPUState *cpu)
         qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
     }
 
-    CPU_FOREACH(cpu) {
-        qemu_wait_io_event_common(cpu);
-    }
+    qemu_wait_io_event_common(cpu);
 }
 
 static void qemu_kvm_wait_io_event(CPUState *cpu)
@@ -1111,6 +1106,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
     qemu_thread_get_self(cpu->thread);
     cpu->thread_id = qemu_get_thread_id();
     cpu->can_do_io = 1;
+    current_cpu = cpu;
 
     sigemptyset(&waitset);
     sigaddset(&waitset, SIG_IPI);
@@ -1119,9 +1115,7 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
     cpu->created = true;
     qemu_cond_signal(&qemu_cpu_cond);
 
-    current_cpu = cpu;
     while (1) {
-        current_cpu = NULL;
         qemu_mutex_unlock_iothread();
         do {
             int sig;
@@ -1132,7 +1126,6 @@ static void *qemu_dummy_cpu_thread_fn(void *arg)
             exit(1);
         }
         qemu_mutex_lock_iothread();
-        current_cpu = cpu;
         qemu_wait_io_event_common(cpu);
     }
 
@@ -1249,7 +1242,7 @@ static void kick_tcg_thread(void *opaque)
     qemu_cpu_kick_rr_cpu();
 }
 
-static void *qemu_tcg_cpu_thread_fn(void *arg)
+static void *qemu_tcg_single_cpu_thread_fn(void *arg)
 {
     CPUState *cpu = arg;
     QEMUTimer *kick_timer;
@@ -1331,6 +1324,69 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
     return NULL;
 }
 
+/* Multi-threaded TCG
+ *
+ * In the multi-threaded case each vCPU has its own thread. The TLS
+ * variable current_cpu can be used deep in the code to find the
+ * current CPUState for a given thread.
+ */
+
+static void *qemu_tcg_cpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+
+    rcu_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+
+    cpu->thread_id = qemu_get_thread_id();
+    cpu->created = true;
+    cpu->can_do_io = 1;
+    current_cpu = cpu;
+    qemu_cond_signal(&qemu_cpu_cond);
+
+    /* process any pending work */
+    atomic_mb_set(&cpu->exit_request, 1);
+
+    while (1) {
+        bool sleep = false;
+
+        if (cpu_can_run(cpu)) {
+            int r = tcg_cpu_exec(cpu);
+            switch (r) {
+            case EXCP_DEBUG:
+                cpu_handle_guest_debug(cpu);
+                break;
+            case EXCP_HALTED:
+                /* during start-up the vCPU is reset and the thread is
+                 * kicked several times. If we don't ensure we go back
+                 * to sleep in the halted state we won't cleanly
+                 * start-up when the vCPU is enabled.
+                 */
+                sleep = true;
+                break;
+            default:
+                /* Ignore everything else? */
+                break;
+            }
+        } else {
+            sleep = true;
+        }
+
+        handle_icount_deadline();
+
+        if (sleep) {
+            qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
+        }
+
+        atomic_mb_set(&cpu->exit_request, 0);
+        qemu_tcg_wait_io_event(cpu);
+    }
+
+    return NULL;
+}
+
 static void qemu_cpu_kick_thread(CPUState *cpu)
 {
 #ifndef _WIN32
@@ -1355,7 +1411,7 @@ void qemu_cpu_kick(CPUState *cpu)
     qemu_cond_broadcast(cpu->halt_cond);
     if (tcg_enabled()) {
         cpu_exit(cpu);
-        /* Also ensure current RR cpu is kicked */
+        /* NOP unless doing single-thread RR */
         qemu_cpu_kick_rr_cpu();
     } else {
         qemu_cpu_kick_thread(cpu);
@@ -1422,13 +1478,6 @@ void pause_all_vcpus(void)
 
     if (qemu_in_vcpu_thread()) {
         cpu_stop_current();
-        if (!kvm_enabled()) {
-            CPU_FOREACH(cpu) {
-                cpu->stop = false;
-                cpu->stopped = true;
-            }
-            return;
-        }
     }
 
     while (!all_vcpus_paused()) {
@@ -1462,29 +1511,42 @@ void resume_all_vcpus(void)
 static void qemu_tcg_init_vcpu(CPUState *cpu)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
-    static QemuCond *tcg_halt_cond;
-    static QemuThread *tcg_cpu_thread;
+    static QemuCond *single_tcg_halt_cond;
+    static QemuThread *single_tcg_cpu_thread;
 
-    /* share a single thread for all cpus with TCG */
-    if (!tcg_cpu_thread) {
+    if (qemu_tcg_mttcg_enabled() || !single_tcg_cpu_thread) {
         cpu->thread = g_malloc0(sizeof(QemuThread));
         cpu->halt_cond = g_malloc0(sizeof(QemuCond));
         qemu_cond_init(cpu->halt_cond);
-        tcg_halt_cond = cpu->halt_cond;
-        snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
+
+        if (qemu_tcg_mttcg_enabled()) {
+            /* create a thread per vCPU with TCG (MTTCG) */
+            snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
                  cpu->cpu_index);
-        qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
-                           cpu, QEMU_THREAD_JOINABLE);
+
+            qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
+                               cpu, QEMU_THREAD_JOINABLE);
+
+        } else {
+            /* share a single thread for all cpus with TCG */
+            snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "ALL CPUs/TCG");
+            qemu_thread_create(cpu->thread, thread_name,
+                               qemu_tcg_single_cpu_thread_fn,
+                               cpu, QEMU_THREAD_JOINABLE);
+
+            single_tcg_halt_cond = cpu->halt_cond;
+            single_tcg_cpu_thread = cpu->thread;
+        }
 #ifdef _WIN32
         cpu->hThread = qemu_thread_get_handle(cpu->thread);
 #endif
         while (!cpu->created) {
             qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
         }
-        tcg_cpu_thread = cpu->thread;
     } else {
-        cpu->thread = tcg_cpu_thread;
-        cpu->halt_cond = tcg_halt_cond;
+        /* For non-MTTCG cases we share the thread */
+        cpu->thread = single_tcg_cpu_thread;
+        cpu->halt_cond = single_tcg_halt_cond;
     }
 }
 
diff --git a/linux-user/main.c b/linux-user/main.c
index b2bc6ab..522a1d7 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -207,6 +207,7 @@ static inline void cpu_exec_end(CPUState *cpu)
     }
     exclusive_idle();
     pthread_mutex_unlock(&exclusive_lock);
+    cpu->exit_request = false;
 }
 
 void cpu_list_lock(void)
diff --git a/translate-all.c b/translate-all.c
index 4bc5718..95e5284 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -83,7 +83,11 @@
 #endif
 
 #ifdef CONFIG_SOFTMMU
-#define assert_memory_lock() do { /* nothing */ } while (0)
+#define assert_memory_lock() do {           \
+        if (DEBUG_MEM_LOCKS) {              \
+            g_assert(have_tb_lock);         \
+        }                                   \
+    } while (0)
 #else
 #define assert_memory_lock() do {               \
         if (DEBUG_MEM_LOCKS) {                  \
@@ -147,36 +151,28 @@ static void *l1_map[V_L1_SIZE];
 TCGContext tcg_ctx;
 
 /* translation block context */
-#ifdef CONFIG_USER_ONLY
 __thread int have_tb_lock;
-#endif
 
 void tb_lock(void)
 {
-#ifdef CONFIG_USER_ONLY
     assert(!have_tb_lock);
     qemu_mutex_lock(&tcg_ctx.tb_ctx.tb_lock);
     have_tb_lock++;
-#endif
 }
 
 void tb_unlock(void)
 {
-#ifdef CONFIG_USER_ONLY
     assert(have_tb_lock);
     have_tb_lock--;
     qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
-#endif
 }
 
 void tb_lock_reset(void)
 {
-#ifdef CONFIG_USER_ONLY
     if (have_tb_lock) {
         qemu_mutex_unlock(&tcg_ctx.tb_ctx.tb_lock);
         have_tb_lock = 0;
     }
-#endif
 }
 
 #ifdef DEBUG_LOCKING
@@ -185,15 +181,11 @@ void tb_lock_reset(void)
 #define DEBUG_TB_LOCKS 0
 #endif
 
-#ifdef CONFIG_SOFTMMU
-#define assert_tb_lock() do { /* nothing */ } while (0)
-#else
 #define assert_tb_lock() do {               \
         if (DEBUG_TB_LOCKS) {               \
             g_assert(have_tb_lock);         \
         }                                   \
     } while (0)
-#endif
 
 
 static TranslationBlock *tb_find_pc(uintptr_t tc_ptr);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 17/19] tcg: enable thread-per-vCPU
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 17/19] tcg: enable thread-per-vCPU Alex Bennée
@ 2016-06-29 14:09   ` Sergey Fedorov
  2016-08-10 14:44     ` Alex Bennée
  0 siblings, 1 reply; 62+ messages in thread
From: Sergey Fedorov @ 2016-06-29 14:09 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite, Riku Voipio

On 03/06/16 23:40, Alex Bennée wrote:
> There are a number of changes that occur at the same time here:
>
>   - tb_lock is no longer a NOP for SoftMMU
>
>   The tb_lock protects both translation and memory map structures. The
>   debug assert is updated to reflect this.

This could be a separate patch.

If we use tb_lock in system-mode to protect the structures protected by
mmap_lock in user-mode then maybe we can merge those two locks because,
as I remember, tb_lock in user-mode emulation is only held outside of
mmap_lock for patching TB for direct jumps.

>
>   - introduce a single vCPU qemu_tcg_cpu_thread_fn
>
>   One of these is spawned per vCPU with its own Thread and Condition
>   variables. qemu_tcg_single_cpu_thread_fn is the new name for the old
>   single threaded function.

So we have 'tcg_current_rr_cpu' and 'qemu_cpu_kick_rr_cpu() at this
moment, maybe name this function like qemu_tcg_rr_cpu_thread_fn()? ;)

>
>   - the TLS current_cpu variable is now live for the lifetime of MTTCG
>     vCPU threads. This is for future work where async jobs need to know
>     the vCPU context they are operating in.

This is important change because we set 'current_cpu' to NULL outside of
cpu_exec() before, I wonder why.

>
> The user to switch on multi-thread behaviour and spawn a thread
> per-vCPU. For a simple test like:
>
>   ./arm/run ./arm/locking-test.flat -smp 4 -accel tcg,thread=multi

It would be nice to mention that the simple test is from kvm_unit_tests.

>
> Will now use 4 vCPU threads and have an expected FAIL (instead of the
> unexpected PASS) as the default mode of the test has no protection when
> incrementing a shared variable.
>
> However we still default to a single thread for all vCPUs as individual
> front-end and back-ends need additional fixes to safely support:
>   - atomic behaviour
>   - tb invalidation
>   - memory ordering
>
> The function default_mttcg_enabled can be tweaked as support is added.
>
> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> [AJB: Some fixes, conditionally, commit rewording]
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>
(snip)
> diff --git a/cpus.c b/cpus.c
> index 35374fd..419caa2 100644
> --- a/cpus.c
> +++ b/cpus.c
(snip)
> @@ -1042,9 +1039,7 @@ static void qemu_tcg_wait_io_event(CPUState *cpu)
>          qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
>      }
>  
> -    CPU_FOREACH(cpu) {
> -        qemu_wait_io_event_common(cpu);
> -    }
> +    qemu_wait_io_event_common(cpu);

Is it okay for single-threaded CPU loop?

>  }
>  
>  static void qemu_kvm_wait_io_event(CPUState *cpu)
(snip)
> @@ -1331,6 +1324,69 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>      return NULL;
>  }
>  
> +/* Multi-threaded TCG
> + *
> + * In the multi-threaded case each vCPU has its own thread. The TLS
> + * variable current_cpu can be used deep in the code to find the
> + * current CPUState for a given thread.
> + */
> +
> +static void *qemu_tcg_cpu_thread_fn(void *arg)
> +{
> +    CPUState *cpu = arg;
> +
> +    rcu_register_thread();
> +
> +    qemu_mutex_lock_iothread();
> +    qemu_thread_get_self(cpu->thread);
> +
> +    cpu->thread_id = qemu_get_thread_id();
> +    cpu->created = true;
> +    cpu->can_do_io = 1;
> +    current_cpu = cpu;
> +    qemu_cond_signal(&qemu_cpu_cond);
> +
> +    /* process any pending work */
> +    atomic_mb_set(&cpu->exit_request, 1);
> +
> +    while (1) {
> +        bool sleep = false;
> +
> +        if (cpu_can_run(cpu)) {
> +            int r = tcg_cpu_exec(cpu);
> +            switch (r) {
> +            case EXCP_DEBUG:
> +                cpu_handle_guest_debug(cpu);
> +                break;
> +            case EXCP_HALTED:
> +                /* during start-up the vCPU is reset and the thread is
> +                 * kicked several times. If we don't ensure we go back
> +                 * to sleep in the halted state we won't cleanly
> +                 * start-up when the vCPU is enabled.
> +                 */
> +                sleep = true;
> +                break;
> +            default:
> +                /* Ignore everything else? */
> +                break;
> +            }
> +        } else {
> +            sleep = true;
> +        }
> +
> +        handle_icount_deadline();
> +
> +        if (sleep) {
> +            qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
> +        }
> +
> +        atomic_mb_set(&cpu->exit_request, 0);
> +        qemu_tcg_wait_io_event(cpu);

Do we really want to wait in qemu_tcg_wait_io_event() while
"all_cpu_threads_idle()"?

> +    }
> +
> +    return NULL;
> +}
> +
>  static void qemu_cpu_kick_thread(CPUState *cpu)
>  {
>  #ifndef _WIN32
> @@ -1355,7 +1411,7 @@ void qemu_cpu_kick(CPUState *cpu)
>      qemu_cond_broadcast(cpu->halt_cond);
>      if (tcg_enabled()) {
>          cpu_exit(cpu);
> -        /* Also ensure current RR cpu is kicked */
> +        /* NOP unless doing single-thread RR */
>          qemu_cpu_kick_rr_cpu();
>      } else {
>          qemu_cpu_kick_thread(cpu);
> @@ -1422,13 +1478,6 @@ void pause_all_vcpus(void)
>  
>      if (qemu_in_vcpu_thread()) {
>          cpu_stop_current();
> -        if (!kvm_enabled()) {
> -            CPU_FOREACH(cpu) {
> -                cpu->stop = false;
> -                cpu->stopped = true;
> -            }
> -            return;
> -        }

I think this change is incompatible with single-threaded CPU loop as well.

>      }
>  
>      while (!all_vcpus_paused()) {
>
(snip)

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 17/19] tcg: enable thread-per-vCPU
  2016-06-29 14:09   ` Sergey Fedorov
@ 2016-08-10 14:44     ` Alex Bennée
  0 siblings, 0 replies; 62+ messages in thread
From: Alex Bennée @ 2016-08-10 14:44 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani,
	mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite, Riku Voipio


Sergey Fedorov <serge.fdrv@gmail.com> writes:

> On 03/06/16 23:40, Alex Bennée wrote:
>> There are a number of changes that occur at the same time here:
>>
>>   - tb_lock is no longer a NOP for SoftMMU
>>
>>   The tb_lock protects both translation and memory map structures. The
>>   debug assert is updated to reflect this.
>
> This could be a separate patch.
>
> If we use tb_lock in system-mode to protect the structures protected by
> mmap_lock in user-mode then maybe we can merge those two locks because,
> as I remember, tb_lock in user-mode emulation is only held outside of
> mmap_lock for patching TB for direct jumps.

OK

>
>>
>>   - introduce a single vCPU qemu_tcg_cpu_thread_fn
>>
>>   One of these is spawned per vCPU with its own Thread and Condition
>>   variables. qemu_tcg_single_cpu_thread_fn is the new name for the old
>>   single threaded function.
>
> So we have 'tcg_current_rr_cpu' and 'qemu_cpu_kick_rr_cpu() at this
> moment, maybe name this function like qemu_tcg_rr_cpu_thread_fn()? ;)

OK

>
>>
>>   - the TLS current_cpu variable is now live for the lifetime of MTTCG
>>     vCPU threads. This is for future work where async jobs need to know
>>     the vCPU context they are operating in.
>
> This is important change because we set 'current_cpu' to NULL outside of
> cpu_exec() before, I wonder why.

It's hard to tell, it is not heavily defended. The number of places that
check current_cpu != NULL is fairly limited.

>
>>
>> The user to switch on multi-thread behaviour and spawn a thread
>> per-vCPU. For a simple test like:
>>
>>   ./arm/run ./arm/locking-test.flat -smp 4 -accel tcg,thread=multi
>
> It would be nice to mention that the simple test is from kvm_unit_tests.
>
>>
>> Will now use 4 vCPU threads and have an expected FAIL (instead of the
>> unexpected PASS) as the default mode of the test has no protection when
>> incrementing a shared variable.
>>
>> However we still default to a single thread for all vCPUs as individual
>> front-end and back-ends need additional fixes to safely support:
>>   - atomic behaviour
>>   - tb invalidation
>>   - memory ordering
>>
>> The function default_mttcg_enabled can be tweaked as support is added.
>>
>> Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> [AJB: Some fixes, conditionally, commit rewording]
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>
> (snip)
>> diff --git a/cpus.c b/cpus.c
>> index 35374fd..419caa2 100644
>> --- a/cpus.c
>> +++ b/cpus.c
> (snip)
>> @@ -1042,9 +1039,7 @@ static void qemu_tcg_wait_io_event(CPUState *cpu)
>>          qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
>>      }
>>
>> -    CPU_FOREACH(cpu) {
>> -        qemu_wait_io_event_common(cpu);
>> -    }
>> +    qemu_wait_io_event_common(cpu);
>
> Is it okay for single-threaded CPU loop?
>
>>  }
>>
>>  static void qemu_kvm_wait_io_event(CPUState *cpu)
> (snip)
>> @@ -1331,6 +1324,69 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>>      return NULL;
>>  }
>>
>> +/* Multi-threaded TCG
>> + *
>> + * In the multi-threaded case each vCPU has its own thread. The TLS
>> + * variable current_cpu can be used deep in the code to find the
>> + * current CPUState for a given thread.
>> + */
>> +
>> +static void *qemu_tcg_cpu_thread_fn(void *arg)
>> +{
>> +    CPUState *cpu = arg;
>> +
>> +    rcu_register_thread();
>> +
>> +    qemu_mutex_lock_iothread();
>> +    qemu_thread_get_self(cpu->thread);
>> +
>> +    cpu->thread_id = qemu_get_thread_id();
>> +    cpu->created = true;
>> +    cpu->can_do_io = 1;
>> +    current_cpu = cpu;
>> +    qemu_cond_signal(&qemu_cpu_cond);
>> +
>> +    /* process any pending work */
>> +    atomic_mb_set(&cpu->exit_request, 1);
>> +
>> +    while (1) {
>> +        bool sleep = false;
>> +
>> +        if (cpu_can_run(cpu)) {
>> +            int r = tcg_cpu_exec(cpu);
>> +            switch (r) {
>> +            case EXCP_DEBUG:
>> +                cpu_handle_guest_debug(cpu);
>> +                break;
>> +            case EXCP_HALTED:
>> +                /* during start-up the vCPU is reset and the thread is
>> +                 * kicked several times. If we don't ensure we go back
>> +                 * to sleep in the halted state we won't cleanly
>> +                 * start-up when the vCPU is enabled.
>> +                 */
>> +                sleep = true;
>> +                break;
>> +            default:
>> +                /* Ignore everything else? */
>> +                break;
>> +            }
>> +        } else {
>> +            sleep = true;
>> +        }
>> +
>> +        handle_icount_deadline();
>> +
>> +        if (sleep) {
>> +            qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
>> +        }
>> +
>> +        atomic_mb_set(&cpu->exit_request, 0);
>> +        qemu_tcg_wait_io_event(cpu);
>
> Do we really want to wait in qemu_tcg_wait_io_event() while
> "all_cpu_threads_idle()"?

I've cleaned up this logic.

>
>> +    }
>> +
>> +    return NULL;
>> +}
>> +
>>  static void qemu_cpu_kick_thread(CPUState *cpu)
>>  {
>>  #ifndef _WIN32
>> @@ -1355,7 +1411,7 @@ void qemu_cpu_kick(CPUState *cpu)
>>      qemu_cond_broadcast(cpu->halt_cond);
>>      if (tcg_enabled()) {
>>          cpu_exit(cpu);
>> -        /* Also ensure current RR cpu is kicked */
>> +        /* NOP unless doing single-thread RR */
>>          qemu_cpu_kick_rr_cpu();
>>      } else {
>>          qemu_cpu_kick_thread(cpu);
>> @@ -1422,13 +1478,6 @@ void pause_all_vcpus(void)
>>
>>      if (qemu_in_vcpu_thread()) {
>>          cpu_stop_current();
>> -        if (!kvm_enabled()) {
>> -            CPU_FOREACH(cpu) {
>> -                cpu->stop = false;
>> -                cpu->stopped = true;
>> -            }
>> -            return;
>> -        }
>
> I think this change is incompatible with single-threaded CPU loop as
> well.

Why, we already stop and kick the vCPU above so we will exit.

>
>>      }
>>
>>      while (!all_vcpus_paused()) {
>>
> (snip)
>
> Kind regards,
> Sergey


--
Alex Bennée

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [Qemu-devel] [RFC v3 18/19] tcg: Ensure safe TB lookup out of 'tb_lock'
  2016-06-03 20:40 [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG Alex Bennée
                   ` (16 preceding siblings ...)
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 17/19] tcg: enable thread-per-vCPU Alex Bennée
@ 2016-06-03 20:40 ` Alex Bennée
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 19/19] cpu-exec: remove tb_lock from the hot-path Alex Bennée
  2016-06-04 14:40 ` [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG Pranith Kumar
  19 siblings, 0 replies; 62+ messages in thread
From: Alex Bennée @ 2016-06-03 20:40 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Sergey Fedorov, Peter Crosthwaite

From: Sergey Fedorov <serge.fdrv@gmail.com>

First, ensure atomicity of CPU's 'tb_jmp_cache' access by:
 * using atomic_read() to look up a TB when not holding 'tb_lock';
 * using atomic_write() to remove a TB from each CPU's local cache on
   TB invalidation.

Second, add some memory barriers to ensure we don't put the TB being
invalidated back to CPU's 'tb_jmp_cache'. If we fail to look up a TB in
CPU's local cache because it is being invalidated by some other thread
then it must not be found in the shared TB hash table. Otherwise we'd
put it back to CPU's local cache.

Note that this patch does *not* make CPU's TLB invalidation safe if it
is done from some other thread while the CPU is in its execution loop.

Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
---
 cpu-exec.c      | 7 ++++++-
 translate-all.c | 7 ++++++-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 5ad3865..b017643 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -292,6 +292,11 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
 {
     TranslationBlock *tb;
 
+    /* Ensure that we won't find a TB in the shared hash table
+     * if it is being invalidated by some other thread.
+     * Otherwise we'd put it back to CPU's local cache.
+     * Pairs with smp_wmb() in tb_phys_invalidate(). */
+    smp_rmb();
     tb = tb_find_physical(cpu, pc, cs_base, flags);
     if (tb) {
         goto found;
@@ -336,7 +341,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
        is executed. */
     cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
     tb_lock();
-    tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
+    tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
     if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
                  tb->flags != flags)) {
         tb = tb_find_slow(cpu, pc, cs_base, flags);
diff --git a/translate-all.c b/translate-all.c
index 95e5284..29a7946 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1071,11 +1071,16 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
         invalidate_page_bitmap(p);
     }
 
+    /* Ensure that we won't find the TB in the shared hash table
+     * if we con't see it in CPU's local cache.
+     * Pairs with smp_rmb() in tb_find_slow(). */
+    smp_wmb();
+
     /* remove the TB from the hash list */
     h = tb_jmp_cache_hash_func(tb->pc);
     CPU_FOREACH(cpu) {
         if (cpu->tb_jmp_cache[h] == tb) {
-            cpu->tb_jmp_cache[h] = NULL;
+            atomic_set(&cpu->tb_jmp_cache[h], NULL);
         }
     }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* [Qemu-devel] [RFC v3 19/19] cpu-exec: remove tb_lock from the hot-path
  2016-06-03 20:40 [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG Alex Bennée
                   ` (17 preceding siblings ...)
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 18/19] tcg: Ensure safe TB lookup out of 'tb_lock' Alex Bennée
@ 2016-06-03 20:40 ` Alex Bennée
  2016-06-29 14:35   ` Sergey Fedorov
  2016-06-04 14:40 ` [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG Pranith Kumar
  19 siblings, 1 reply; 62+ messages in thread
From: Alex Bennée @ 2016-06-03 20:40 UTC (permalink / raw)
  To: mttcg, qemu-devel, fred.konrad, a.rigo, serge.fdrv, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Alex Bennée, Peter Crosthwaite

Lock contention in the hot path of moving between existing patched
TranslationBlocks is the main drag on MTTCG performance. This patch
pushes the tb_lock() usage down to the two places that really need it:

  - code generation (tb_gen_code)
  - jump patching (tb_add_jump)

The rest of the code doesn't really need to hold a lock as it is either
using per-CPU structures or designed to be used in concurrent read
situations (qht_lookup).

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
v3
  - fix merge conflicts with Sergey's patch
---
 cpu-exec.c | 59 ++++++++++++++++++++++++++++++-----------------------------
 1 file changed, 30 insertions(+), 29 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index b017643..4af0b52 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -298,41 +298,38 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
      * Pairs with smp_wmb() in tb_phys_invalidate(). */
     smp_rmb();
     tb = tb_find_physical(cpu, pc, cs_base, flags);
-    if (tb) {
-        goto found;
-    }
+    if (!tb) {
 
-    /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
-     * taken outside tb_lock.  Since we're momentarily dropping
-     * tb_lock, there's a chance that our desired tb has been
-     * translated.
-     */
-    tb_unlock();
-    mmap_lock();
-    tb_lock();
-    tb = tb_find_physical(cpu, pc, cs_base, flags);
-    if (tb) {
-        mmap_unlock();
-        goto found;
-    }
+        /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
+         * taken outside tb_lock.
+         */
+        mmap_lock();
+        tb_lock();
 
-    /* if no translated code available, then translate it now */
-    tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
+        /* There's a chance that our desired tb has been translated while
+         * taking the locks so we check again inside the lock.
+         */
+        tb = tb_find_physical(cpu, pc, cs_base, flags);
+        if (!tb) {
+            /* if no translated code available, then translate it now */
+            tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
+        }
 
-    mmap_unlock();
+        tb_unlock();
+        mmap_unlock();
+    }
 
-found:
-    /* we add the TB in the virtual pc hash table */
+    /* We add the TB in the virtual pc hash table for the fast lookup */
     cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;
     return tb;
 }
 
 static inline TranslationBlock *tb_find_fast(CPUState *cpu,
-                                             TranslationBlock **last_tb,
+                                             TranslationBlock **ltbp,
                                              int tb_exit)
 {
     CPUArchState *env = (CPUArchState *)cpu->env_ptr;
-    TranslationBlock *tb;
+    TranslationBlock *tb, *last_tb;
     target_ulong cs_base, pc;
     uint32_t flags;
 
@@ -340,7 +337,6 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
        always be the same before a given translated block
        is executed. */
     cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
-    tb_lock();
     tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
     if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
                  tb->flags != flags)) {
@@ -350,7 +346,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
         /* Ensure that no TB jump will be modified as the
          * translation buffer has been flushed.
          */
-        *last_tb = NULL;
+        *ltbp = NULL;
         cpu->tb_flushed = false;
     }
 #ifndef CONFIG_USER_ONLY
@@ -359,14 +355,19 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
      * spanning two pages because the mapping for the second page can change.
      */
     if (tb->page_addr[1] != -1) {
-        *last_tb = NULL;
+        *ltbp = NULL;
     }
 #endif
+
     /* See if we can patch the calling TB. */
-    if (*last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
-        tb_add_jump(*last_tb, tb_exit, tb);
+    last_tb = *ltbp;
+    if (!qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN) &&
+        last_tb &&
+        !last_tb->jmp_list_next[tb_exit]) {
+        tb_lock();
+        tb_add_jump(last_tb, tb_exit, tb);
+        tb_unlock();
     }
-    tb_unlock();
     return tb;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 19/19] cpu-exec: remove tb_lock from the hot-path
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 19/19] cpu-exec: remove tb_lock from the hot-path Alex Bennée
@ 2016-06-29 14:35   ` Sergey Fedorov
  2016-06-29 14:47     ` Alex Bennée
  0 siblings, 1 reply; 62+ messages in thread
From: Sergey Fedorov @ 2016-06-29 14:35 UTC (permalink / raw)
  To: Alex Bennée, mttcg, qemu-devel, fred.konrad, a.rigo, cota,
	bobby.prani
  Cc: mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite

On 03/06/16 23:40, Alex Bennée wrote:
> Lock contention in the hot path of moving between existing patched
> TranslationBlocks is the main drag on MTTCG performance. This patch
> pushes the tb_lock() usage down to the two places that really need it:
>
>   - code generation (tb_gen_code)
>   - jump patching (tb_add_jump)
>
> The rest of the code doesn't really need to hold a lock as it is either
> using per-CPU structures or designed to be used in concurrent read
> situations (qht_lookup).
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>
> ---
> v3
>   - fix merge conflicts with Sergey's patch
> ---
>  cpu-exec.c | 59 ++++++++++++++++++++++++++++++-----------------------------
>  1 file changed, 30 insertions(+), 29 deletions(-)
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index b017643..4af0b52 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -298,41 +298,38 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
>       * Pairs with smp_wmb() in tb_phys_invalidate(). */
>      smp_rmb();
>      tb = tb_find_physical(cpu, pc, cs_base, flags);
> -    if (tb) {
> -        goto found;
> -    }
> +    if (!tb) {
>  
> -    /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
> -     * taken outside tb_lock.  Since we're momentarily dropping
> -     * tb_lock, there's a chance that our desired tb has been
> -     * translated.
> -     */
> -    tb_unlock();
> -    mmap_lock();
> -    tb_lock();
> -    tb = tb_find_physical(cpu, pc, cs_base, flags);
> -    if (tb) {
> -        mmap_unlock();
> -        goto found;
> -    }
> +        /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
> +         * taken outside tb_lock.
> +         */
> +        mmap_lock();
> +        tb_lock();
>  
> -    /* if no translated code available, then translate it now */
> -    tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
> +        /* There's a chance that our desired tb has been translated while
> +         * taking the locks so we check again inside the lock.
> +         */
> +        tb = tb_find_physical(cpu, pc, cs_base, flags);
> +        if (!tb) {
> +            /* if no translated code available, then translate it now */
> +            tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
> +        }
>  
> -    mmap_unlock();
> +        tb_unlock();
> +        mmap_unlock();
> +    }
>  
> -found:
> -    /* we add the TB in the virtual pc hash table */
> +    /* We add the TB in the virtual pc hash table for the fast lookup */
>      cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;

Hmm, seems like I forgot to convert this into atomic_set() in the
previous patch...

>      return tb;
>  }
>  
>  static inline TranslationBlock *tb_find_fast(CPUState *cpu,
> -                                             TranslationBlock **last_tb,
> +                                             TranslationBlock **ltbp,

I'm not sure if it is more readable...

>                                               int tb_exit)
>  {
>      CPUArchState *env = (CPUArchState *)cpu->env_ptr;
> -    TranslationBlock *tb;
> +    TranslationBlock *tb, *last_tb;
>      target_ulong cs_base, pc;
>      uint32_t flags;
>  
> @@ -340,7 +337,6 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>         always be the same before a given translated block
>         is executed. */
>      cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
> -    tb_lock();
>      tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
>      if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
>                   tb->flags != flags)) {
> @@ -350,7 +346,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>          /* Ensure that no TB jump will be modified as the
>           * translation buffer has been flushed.
>           */
> -        *last_tb = NULL;
> +        *ltbp = NULL;
>          cpu->tb_flushed = false;
>      }
>  #ifndef CONFIG_USER_ONLY
> @@ -359,14 +355,19 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>       * spanning two pages because the mapping for the second page can change.
>       */
>      if (tb->page_addr[1] != -1) {
> -        *last_tb = NULL;
> +        *ltbp = NULL;
>      }
>  #endif
> +
>      /* See if we can patch the calling TB. */
> -    if (*last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
> -        tb_add_jump(*last_tb, tb_exit, tb);
> +    last_tb = *ltbp;
> +    if (!qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN) &&
> +        last_tb &&
> +        !last_tb->jmp_list_next[tb_exit]) {

If we're going to check this outside of tb_lock we have to do this with
atomic_{read,set}(). However, I think it is rare case to race on
tb_add_jump() so probably it is okay to do the check under tb_lock.

> +        tb_lock();
> +        tb_add_jump(last_tb, tb_exit, tb);
> +        tb_unlock();
>      }
> -    tb_unlock();
>      return tb;
>  }
>  

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 19/19] cpu-exec: remove tb_lock from the hot-path
  2016-06-29 14:35   ` Sergey Fedorov
@ 2016-06-29 14:47     ` Alex Bennée
  2016-06-29 14:52       ` Sergey Fedorov
  0 siblings, 1 reply; 62+ messages in thread
From: Alex Bennée @ 2016-06-29 14:47 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani,
	mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite


Sergey Fedorov <serge.fdrv@gmail.com> writes:

> On 03/06/16 23:40, Alex Bennée wrote:
>> Lock contention in the hot path of moving between existing patched
>> TranslationBlocks is the main drag on MTTCG performance. This patch
>> pushes the tb_lock() usage down to the two places that really need it:
>>
>>   - code generation (tb_gen_code)
>>   - jump patching (tb_add_jump)
>>
>> The rest of the code doesn't really need to hold a lock as it is either
>> using per-CPU structures or designed to be used in concurrent read
>> situations (qht_lookup).
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>
>> ---
>> v3
>>   - fix merge conflicts with Sergey's patch
>> ---
>>  cpu-exec.c | 59 ++++++++++++++++++++++++++++++-----------------------------
>>  1 file changed, 30 insertions(+), 29 deletions(-)
>>
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index b017643..4af0b52 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -298,41 +298,38 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
>>       * Pairs with smp_wmb() in tb_phys_invalidate(). */
>>      smp_rmb();
>>      tb = tb_find_physical(cpu, pc, cs_base, flags);
>> -    if (tb) {
>> -        goto found;
>> -    }
>> +    if (!tb) {
>>
>> -    /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
>> -     * taken outside tb_lock.  Since we're momentarily dropping
>> -     * tb_lock, there's a chance that our desired tb has been
>> -     * translated.
>> -     */
>> -    tb_unlock();
>> -    mmap_lock();
>> -    tb_lock();
>> -    tb = tb_find_physical(cpu, pc, cs_base, flags);
>> -    if (tb) {
>> -        mmap_unlock();
>> -        goto found;
>> -    }
>> +        /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
>> +         * taken outside tb_lock.
>> +         */
>> +        mmap_lock();
>> +        tb_lock();
>>
>> -    /* if no translated code available, then translate it now */
>> -    tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
>> +        /* There's a chance that our desired tb has been translated while
>> +         * taking the locks so we check again inside the lock.
>> +         */
>> +        tb = tb_find_physical(cpu, pc, cs_base, flags);
>> +        if (!tb) {
>> +            /* if no translated code available, then translate it now */
>> +            tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
>> +        }
>>
>> -    mmap_unlock();
>> +        tb_unlock();
>> +        mmap_unlock();
>> +    }
>>
>> -found:
>> -    /* we add the TB in the virtual pc hash table */
>> +    /* We add the TB in the virtual pc hash table for the fast lookup */
>>      cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;
>
> Hmm, seems like I forgot to convert this into atomic_set() in the
> previous patch...

OK, can you fix that in your quick fixes series?

>
>>      return tb;
>>  }
>>
>>  static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>> -                                             TranslationBlock **last_tb,
>> +                                             TranslationBlock **ltbp,
>
> I'm not sure if it is more readable...

I'll revert. I was trying to keep line lengths short :-/

>
>>                                               int tb_exit)
>>  {
>>      CPUArchState *env = (CPUArchState *)cpu->env_ptr;
>> -    TranslationBlock *tb;
>> +    TranslationBlock *tb, *last_tb;
>>      target_ulong cs_base, pc;
>>      uint32_t flags;
>>
>> @@ -340,7 +337,6 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>         always be the same before a given translated block
>>         is executed. */
>>      cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>> -    tb_lock();
>>      tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
>>      if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
>>                   tb->flags != flags)) {
>> @@ -350,7 +346,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>          /* Ensure that no TB jump will be modified as the
>>           * translation buffer has been flushed.
>>           */
>> -        *last_tb = NULL;
>> +        *ltbp = NULL;
>>          cpu->tb_flushed = false;
>>      }
>>  #ifndef CONFIG_USER_ONLY
>> @@ -359,14 +355,19 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>       * spanning two pages because the mapping for the second page can change.
>>       */
>>      if (tb->page_addr[1] != -1) {
>> -        *last_tb = NULL;
>> +        *ltbp = NULL;
>>      }
>>  #endif
>> +
>>      /* See if we can patch the calling TB. */
>> -    if (*last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
>> -        tb_add_jump(*last_tb, tb_exit, tb);
>> +    last_tb = *ltbp;
>> +    if (!qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN) &&
>> +        last_tb &&
>> +        !last_tb->jmp_list_next[tb_exit]) {
>
> If we're going to check this outside of tb_lock we have to do this with
> atomic_{read,set}(). However, I think it is rare case to race on
> tb_add_jump() so probably it is okay to do the check under tb_lock.

It's checking for NULL, it gets re-checked in tb_add_jump while under
lock so I the setting race should be OK I think.

>
>> +        tb_lock();
>> +        tb_add_jump(last_tb, tb_exit, tb);
>> +        tb_unlock();
>>      }
>> -    tb_unlock();
>>      return tb;
>>  }
>>
>
> Kind regards,
> Sergey


--
Alex Bennée

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 19/19] cpu-exec: remove tb_lock from the hot-path
  2016-06-29 14:47     ` Alex Bennée
@ 2016-06-29 14:52       ` Sergey Fedorov
  2016-06-29 16:08         ` Alex Bennée
  0 siblings, 1 reply; 62+ messages in thread
From: Sergey Fedorov @ 2016-06-29 14:52 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani,
	mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite

On 29/06/16 17:47, Alex Bennée wrote:
> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>
>> On 03/06/16 23:40, Alex Bennée wrote:
>>> Lock contention in the hot path of moving between existing patched
>>> TranslationBlocks is the main drag on MTTCG performance. This patch
>>> pushes the tb_lock() usage down to the two places that really need it:
>>>
>>>   - code generation (tb_gen_code)
>>>   - jump patching (tb_add_jump)
>>>
>>> The rest of the code doesn't really need to hold a lock as it is either
>>> using per-CPU structures or designed to be used in concurrent read
>>> situations (qht_lookup).
>>>
>>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>>
>>> ---
>>> v3
>>>   - fix merge conflicts with Sergey's patch
>>> ---
>>>  cpu-exec.c | 59 ++++++++++++++++++++++++++++++-----------------------------
>>>  1 file changed, 30 insertions(+), 29 deletions(-)
>>>
>>> diff --git a/cpu-exec.c b/cpu-exec.c
>>> index b017643..4af0b52 100644
>>> --- a/cpu-exec.c
>>> +++ b/cpu-exec.c
>>> @@ -298,41 +298,38 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
>>>       * Pairs with smp_wmb() in tb_phys_invalidate(). */
>>>      smp_rmb();
>>>      tb = tb_find_physical(cpu, pc, cs_base, flags);
>>> -    if (tb) {
>>> -        goto found;
>>> -    }
>>> +    if (!tb) {
>>>
>>> -    /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
>>> -     * taken outside tb_lock.  Since we're momentarily dropping
>>> -     * tb_lock, there's a chance that our desired tb has been
>>> -     * translated.
>>> -     */
>>> -    tb_unlock();
>>> -    mmap_lock();
>>> -    tb_lock();
>>> -    tb = tb_find_physical(cpu, pc, cs_base, flags);
>>> -    if (tb) {
>>> -        mmap_unlock();
>>> -        goto found;
>>> -    }
>>> +        /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
>>> +         * taken outside tb_lock.
>>> +         */
>>> +        mmap_lock();
>>> +        tb_lock();
>>>
>>> -    /* if no translated code available, then translate it now */
>>> -    tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
>>> +        /* There's a chance that our desired tb has been translated while
>>> +         * taking the locks so we check again inside the lock.
>>> +         */
>>> +        tb = tb_find_physical(cpu, pc, cs_base, flags);
>>> +        if (!tb) {
>>> +            /* if no translated code available, then translate it now */
>>> +            tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
>>> +        }
>>>
>>> -    mmap_unlock();
>>> +        tb_unlock();
>>> +        mmap_unlock();
>>> +    }
>>>
>>> -found:
>>> -    /* we add the TB in the virtual pc hash table */
>>> +    /* We add the TB in the virtual pc hash table for the fast lookup */
>>>      cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;
>> Hmm, seems like I forgot to convert this into atomic_set() in the
>> previous patch...
> OK, can you fix that in your quick fixes series?

Sure. I think that patch and this are both ready-to-go into mainline.

>
>>>      return tb;
>>>  }
>>>
>>>  static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>> -                                             TranslationBlock **last_tb,
>>> +                                             TranslationBlock **ltbp,
>> I'm not sure if it is more readable...
> I'll revert. I was trying to keep line lengths short :-/
>
>>>                                               int tb_exit)
>>>  {
>>>      CPUArchState *env = (CPUArchState *)cpu->env_ptr;
>>> -    TranslationBlock *tb;
>>> +    TranslationBlock *tb, *last_tb;
>>>      target_ulong cs_base, pc;
>>>      uint32_t flags;
>>>
>>> @@ -340,7 +337,6 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>>         always be the same before a given translated block
>>>         is executed. */
>>>      cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>>> -    tb_lock();
>>>      tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
>>>      if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
>>>                   tb->flags != flags)) {
>>> @@ -350,7 +346,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>>          /* Ensure that no TB jump will be modified as the
>>>           * translation buffer has been flushed.
>>>           */
>>> -        *last_tb = NULL;
>>> +        *ltbp = NULL;
>>>          cpu->tb_flushed = false;
>>>      }
>>>  #ifndef CONFIG_USER_ONLY
>>> @@ -359,14 +355,19 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>>       * spanning two pages because the mapping for the second page can change.
>>>       */
>>>      if (tb->page_addr[1] != -1) {
>>> -        *last_tb = NULL;
>>> +        *ltbp = NULL;
>>>      }
>>>  #endif
>>> +
>>>      /* See if we can patch the calling TB. */
>>> -    if (*last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
>>> -        tb_add_jump(*last_tb, tb_exit, tb);
>>> +    last_tb = *ltbp;
>>> +    if (!qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN) &&
>>> +        last_tb &&
>>> +        !last_tb->jmp_list_next[tb_exit]) {
>> If we're going to check this outside of tb_lock we have to do this with
>> atomic_{read,set}(). However, I think it is rare case to race on
>> tb_add_jump() so probably it is okay to do the check under tb_lock.
> It's checking for NULL, it gets re-checked in tb_add_jump while under
> lock so I the setting race should be OK I think.

I think we could just skip this check and be fine. What do you think
regarding this?

Thanks,
Sergey

>
>>> +        tb_lock();
>>> +        tb_add_jump(last_tb, tb_exit, tb);
>>> +        tb_unlock();
>>>      }
>>> -    tb_unlock();
>>>      return tb;
>>>  }
>>>
>> Kind regards,
>> Sergey
>
> --
> Alex Bennée

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 19/19] cpu-exec: remove tb_lock from the hot-path
  2016-06-29 14:52       ` Sergey Fedorov
@ 2016-06-29 16:08         ` Alex Bennée
  2016-06-30  9:24           ` Sergey Fedorov
  0 siblings, 1 reply; 62+ messages in thread
From: Alex Bennée @ 2016-06-29 16:08 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani,
	mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite


Sergey Fedorov <serge.fdrv@gmail.com> writes:

> On 29/06/16 17:47, Alex Bennée wrote:
>> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>>
>>> On 03/06/16 23:40, Alex Bennée wrote:
>>>> Lock contention in the hot path of moving between existing patched
>>>> TranslationBlocks is the main drag on MTTCG performance. This patch
>>>> pushes the tb_lock() usage down to the two places that really need it:
>>>>
>>>>   - code generation (tb_gen_code)
>>>>   - jump patching (tb_add_jump)
>>>>
>>>> The rest of the code doesn't really need to hold a lock as it is either
>>>> using per-CPU structures or designed to be used in concurrent read
>>>> situations (qht_lookup).
>>>>
>>>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>>>
>>>> ---
>>>> v3
>>>>   - fix merge conflicts with Sergey's patch
>>>> ---
>>>>  cpu-exec.c | 59 ++++++++++++++++++++++++++++++-----------------------------
>>>>  1 file changed, 30 insertions(+), 29 deletions(-)
>>>>
>>>> diff --git a/cpu-exec.c b/cpu-exec.c
>>>> index b017643..4af0b52 100644
>>>> --- a/cpu-exec.c
>>>> +++ b/cpu-exec.c
>>>> @@ -298,41 +298,38 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
>>>>       * Pairs with smp_wmb() in tb_phys_invalidate(). */
>>>>      smp_rmb();
>>>>      tb = tb_find_physical(cpu, pc, cs_base, flags);
>>>> -    if (tb) {
>>>> -        goto found;
>>>> -    }
>>>> +    if (!tb) {
>>>>
>>>> -    /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
>>>> -     * taken outside tb_lock.  Since we're momentarily dropping
>>>> -     * tb_lock, there's a chance that our desired tb has been
>>>> -     * translated.
>>>> -     */
>>>> -    tb_unlock();
>>>> -    mmap_lock();
>>>> -    tb_lock();
>>>> -    tb = tb_find_physical(cpu, pc, cs_base, flags);
>>>> -    if (tb) {
>>>> -        mmap_unlock();
>>>> -        goto found;
>>>> -    }
>>>> +        /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
>>>> +         * taken outside tb_lock.
>>>> +         */
>>>> +        mmap_lock();
>>>> +        tb_lock();
>>>>
>>>> -    /* if no translated code available, then translate it now */
>>>> -    tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
>>>> +        /* There's a chance that our desired tb has been translated while
>>>> +         * taking the locks so we check again inside the lock.
>>>> +         */
>>>> +        tb = tb_find_physical(cpu, pc, cs_base, flags);
>>>> +        if (!tb) {
>>>> +            /* if no translated code available, then translate it now */
>>>> +            tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
>>>> +        }
>>>>
>>>> -    mmap_unlock();
>>>> +        tb_unlock();
>>>> +        mmap_unlock();
>>>> +    }
>>>>
>>>> -found:
>>>> -    /* we add the TB in the virtual pc hash table */
>>>> +    /* We add the TB in the virtual pc hash table for the fast lookup */
>>>>      cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;
>>> Hmm, seems like I forgot to convert this into atomic_set() in the
>>> previous patch...
>> OK, can you fix that in your quick fixes series?
>
> Sure. I think that patch and this are both ready-to-go into mainline.
>
>>
>>>>      return tb;
>>>>  }
>>>>
>>>>  static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>>> -                                             TranslationBlock **last_tb,
>>>> +                                             TranslationBlock **ltbp,
>>> I'm not sure if it is more readable...
>> I'll revert. I was trying to keep line lengths short :-/
>>
>>>>                                               int tb_exit)
>>>>  {
>>>>      CPUArchState *env = (CPUArchState *)cpu->env_ptr;
>>>> -    TranslationBlock *tb;
>>>> +    TranslationBlock *tb, *last_tb;
>>>>      target_ulong cs_base, pc;
>>>>      uint32_t flags;
>>>>
>>>> @@ -340,7 +337,6 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>>>         always be the same before a given translated block
>>>>         is executed. */
>>>>      cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>>>> -    tb_lock();
>>>>      tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
>>>>      if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
>>>>                   tb->flags != flags)) {
>>>> @@ -350,7 +346,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>>>          /* Ensure that no TB jump will be modified as the
>>>>           * translation buffer has been flushed.
>>>>           */
>>>> -        *last_tb = NULL;
>>>> +        *ltbp = NULL;
>>>>          cpu->tb_flushed = false;
>>>>      }
>>>>  #ifndef CONFIG_USER_ONLY
>>>> @@ -359,14 +355,19 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>>>       * spanning two pages because the mapping for the second page can change.
>>>>       */
>>>>      if (tb->page_addr[1] != -1) {
>>>> -        *last_tb = NULL;
>>>> +        *ltbp = NULL;
>>>>      }
>>>>  #endif
>>>> +
>>>>      /* See if we can patch the calling TB. */
>>>> -    if (*last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
>>>> -        tb_add_jump(*last_tb, tb_exit, tb);
>>>> +    last_tb = *ltbp;
>>>> +    if (!qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN) &&
>>>> +        last_tb &&
>>>> +        !last_tb->jmp_list_next[tb_exit]) {
>>> If we're going to check this outside of tb_lock we have to do this with
>>> atomic_{read,set}(). However, I think it is rare case to race on
>>> tb_add_jump() so probably it is okay to do the check under tb_lock.
>> It's checking for NULL, it gets re-checked in tb_add_jump while under
>> lock so I the setting race should be OK I think.
>
> I think we could just skip this check and be fine. What do you think
> regarding this?

I think that means we'll take the lock more frequently than we want to.
I'll have to check.

>
> Thanks,
> Sergey
>
>>
>>>> +        tb_lock();
>>>> +        tb_add_jump(last_tb, tb_exit, tb);
>>>> +        tb_unlock();
>>>>      }
>>>> -    tb_unlock();
>>>>      return tb;
>>>>  }
>>>>
>>> Kind regards,
>>> Sergey
>>
>> --
>> Alex Bennée


--
Alex Bennée

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 19/19] cpu-exec: remove tb_lock from the hot-path
  2016-06-29 16:08         ` Alex Bennée
@ 2016-06-30  9:24           ` Sergey Fedorov
  0 siblings, 0 replies; 62+ messages in thread
From: Sergey Fedorov @ 2016-06-30  9:24 UTC (permalink / raw)
  To: Alex Bennée
  Cc: mttcg, qemu-devel, fred.konrad, a.rigo, cota, bobby.prani,
	mark.burton, pbonzini, jan.kiszka, rth, peter.maydell,
	claudio.fontana, Peter Crosthwaite

On 29/06/16 19:08, Alex Bennée wrote:
> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>
>> On 29/06/16 17:47, Alex Bennée wrote:
>>> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>>>
>>>> On 03/06/16 23:40, Alex Bennée wrote:
(snip)
>>>>>
>>>>>  static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>>>> -                                             TranslationBlock **last_tb,
>>>>> +                                             TranslationBlock **ltbp,
>>>> I'm not sure if it is more readable...
>>> I'll revert. I was trying to keep line lengths short :-/
>>>
>>>>>                                               int tb_exit)
>>>>>  {
>>>>>      CPUArchState *env = (CPUArchState *)cpu->env_ptr;
>>>>> -    TranslationBlock *tb;
>>>>> +    TranslationBlock *tb, *last_tb;
>>>>>      target_ulong cs_base, pc;
>>>>>      uint32_t flags;
>>>>>
>>>>> @@ -340,7 +337,6 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>>>>         always be the same before a given translated block
>>>>>         is executed. */
>>>>>      cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
>>>>> -    tb_lock();
>>>>>      tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
>>>>>      if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
>>>>>                   tb->flags != flags)) {
>>>>> @@ -350,7 +346,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>>>>          /* Ensure that no TB jump will be modified as the
>>>>>           * translation buffer has been flushed.
>>>>>           */
>>>>> -        *last_tb = NULL;
>>>>> +        *ltbp = NULL;
>>>>>          cpu->tb_flushed = false;
>>>>>      }
>>>>>  #ifndef CONFIG_USER_ONLY
>>>>> @@ -359,14 +355,19 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
>>>>>       * spanning two pages because the mapping for the second page can change.
>>>>>       */
>>>>>      if (tb->page_addr[1] != -1) {
>>>>> -        *last_tb = NULL;
>>>>> +        *ltbp = NULL;
>>>>>      }
>>>>>  #endif
>>>>> +
>>>>>      /* See if we can patch the calling TB. */
>>>>> -    if (*last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
>>>>> -        tb_add_jump(*last_tb, tb_exit, tb);
>>>>> +    last_tb = *ltbp;
>>>>> +    if (!qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN) &&
>>>>> +        last_tb &&
>>>>> +        !last_tb->jmp_list_next[tb_exit]) {
>>>> If we're going to check this outside of tb_lock we have to do this with
>>>> atomic_{read,set}(). However, I think it is rare case to race on
>>>> tb_add_jump() so probably it is okay to do the check under tb_lock.
>>> It's checking for NULL, it gets re-checked in tb_add_jump while under
>>> lock so I the setting race should be OK I think.
>> I think we could just skip this check and be fine. What do you think
>> regarding this?
> I think that means we'll take the lock more frequently than we want to.
> I'll have to check.

If two blocks have already been chained then we don't go here, otherwise
the chances that some other thread has just done the chaining of the
blocks are rare, I think.

Regards,
Sergey

>>>>> +        tb_lock();
>>>>> +        tb_add_jump(last_tb, tb_exit, tb);
>>>>> +        tb_unlock();
>>>>>      }
>>>>> -    tb_unlock();
>>>>>      return tb;
>>>>>  }
>>>>>
>>>>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG
  2016-06-03 20:40 [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG Alex Bennée
                   ` (18 preceding siblings ...)
  2016-06-03 20:40 ` [Qemu-devel] [RFC v3 19/19] cpu-exec: remove tb_lock from the hot-path Alex Bennée
@ 2016-06-04 14:40 ` Pranith Kumar
  19 siblings, 0 replies; 62+ messages in thread
From: Pranith Kumar @ 2016-06-04 14:40 UTC (permalink / raw)
  To: Alex Bennée
  Cc: MTTCG Devel, qemu-devel, KONRAD Frédéric, alvise rigo,
	Sergey Fedorov, Emilio G. Cota, Mark Burton, Paolo Bonzini,
	J. Kiszka, Richard Henderson, Peter Maydell, Claudio Fontana

Hi Alex,

On Fri, Jun 3, 2016 at 4:40 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
> This is the third iteration of the RFC patch set which aims to provide
> the basic framework for MTTCG. There have been some considerable
> changes since the last extensive review (thanks to all the reviewers).
>
>    - many updates to docs/multi-thread-tcg.txt design document
>    - added assertions for all the locking requirements
>    - split apart the big enable thread-per-vCPU patch
>    - removed locking from the hot-path
>
> In general the main thread functions are a lot less messy (especially
> the single thread variant). The splitting apart of the big enabling
> patch was helped by removing tcg_current_cpu and the global
> exit_request as separate patches. Finally the big performance boost of
> a lockless hot-path is made possible by Emilio's QHT work which this
> is based on.
>
> The branch can be found at:
>
>   https://github.com/stsquad/qemu/tree/mttcg/base-patches-v3

FYI, I tried booting a debian armv7 image with this branch and it
doesn't boot. I'll try to see why it is failing.

Thanks,
--
Pranith.

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2016-08-10 14:44 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-06-03 20:40 [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG Alex Bennée
2016-06-03 20:40 ` [Qemu-devel] [RFC v3 01/19] cpus: make all_vcpus_paused() return bool Alex Bennée
2016-06-07 15:05   ` Richard Henderson
2016-06-03 20:40 ` [Qemu-devel] [RFC v3 02/19] translate_all: DEBUG_FLUSH -> DEBUG_TB_FLUSH Alex Bennée
2016-06-07 14:54   ` Richard Henderson
2016-06-03 20:40 ` [Qemu-devel] [RFC v3 03/19] translate-all: add DEBUG_LOCKING asserts Alex Bennée
2016-06-23 14:34   ` Sergey Fedorov
2016-06-23 17:14     ` Alex Bennée
2016-06-23 18:43       ` Sergey Fedorov
2016-07-01 23:21   ` Richard Henderson
2016-06-03 20:40 ` [Qemu-devel] [RFC v3 04/19] docs: new design document multi-thread-tcg.txt (DRAFTING) Alex Bennée
2016-06-23 21:33   ` Sergey Fedorov
2016-06-03 20:40 ` [Qemu-devel] [RFC v3 05/19] exec: add assert_debug_safe and notes on debug structures Alex Bennée
2016-06-24 15:28   ` Sergey Fedorov
2016-06-03 20:40 ` [Qemu-devel] [RFC v3 06/19] tcg: comment on which functions have to be called with tb_lock held Alex Bennée
2016-06-24 15:38   ` Sergey Fedorov
2016-06-03 20:40 ` [Qemu-devel] [RFC v3 07/19] translate-all: Add assert_memory_lock annotations Alex Bennée
2016-06-24 15:48   ` Sergey Fedorov
2016-06-03 20:40 ` [Qemu-devel] [RFC v3 08/19] tcg: protect TBContext with tb_lock Alex Bennée
2016-07-01 23:40   ` Richard Henderson
2016-06-03 20:40 ` [Qemu-devel] [RFC v3 09/19] target-arm/arm-powerctl: wake up sleeping CPUs Alex Bennée
2016-06-03 20:40 ` [Qemu-devel] [RFC v3 10/19] tcg: cpus rm tcg_exec_all() Alex Bennée
2016-06-24 17:09   ` Sergey Fedorov
2016-07-01 23:50   ` Richard Henderson
2016-06-03 20:40 ` [Qemu-devel] [RFC v3 11/19] tcg: add options for enabling MTTCG Alex Bennée
2016-06-27 21:07   ` Sergey Fedorov
2016-07-22 16:17     ` Alex Bennée
2016-07-01 23:53   ` Richard Henderson
2016-07-02  7:11     ` Alex Bennée
2016-07-02  7:38       ` Sergey Fedorov
2016-07-04 10:10         ` Paolo Bonzini
2016-06-03 20:40 ` [Qemu-devel] [RFC v3 12/19] tcg: add kick timer for single-threaded vCPU emulation Alex Bennée
2016-06-27 21:20   ` Sergey Fedorov
2016-07-02  0:17     ` Richard Henderson
2016-07-02  7:36       ` Sergey Fedorov
2016-06-03 20:40 ` [Qemu-devel] [RFC v3 13/19] tcg: rename tcg_current_cpu to tcg_current_rr_cpu Alex Bennée
2016-06-06 15:30   ` Paolo Bonzini
2016-06-06 16:05     ` Alex Bennée
2016-06-06 17:05       ` Paolo Bonzini
2016-06-06 17:26         ` Alex Bennée
2016-06-06 18:25           ` Paolo Bonzini
2016-06-07 12:59   ` Alex Bennée
2016-06-03 20:40 ` [Qemu-devel] [RFC v3 14/19] tcg: remove global exit_request Alex Bennée
2016-06-28 16:20   ` Sergey Fedorov
2016-08-03 11:42     ` Alex Bennée
2016-06-03 20:40 ` [Qemu-devel] [RFC v3 15/19] tcg: drop global lock during TCG code execution Alex Bennée
2016-06-28 16:54   ` Sergey Fedorov
2016-08-10 13:51     ` Alex Bennée
2016-06-03 20:40 ` [Qemu-devel] [RFC v3 16/19] tcg: move locking for tb_invalidate_phys_page_range up Alex Bennée
2016-06-28 19:43   ` Sergey Fedorov
2016-06-28 19:51     ` Sergey Fedorov
2016-06-03 20:40 ` [Qemu-devel] [RFC v3 17/19] tcg: enable thread-per-vCPU Alex Bennée
2016-06-29 14:09   ` Sergey Fedorov
2016-08-10 14:44     ` Alex Bennée
2016-06-03 20:40 ` [Qemu-devel] [RFC v3 18/19] tcg: Ensure safe TB lookup out of 'tb_lock' Alex Bennée
2016-06-03 20:40 ` [Qemu-devel] [RFC v3 19/19] cpu-exec: remove tb_lock from the hot-path Alex Bennée
2016-06-29 14:35   ` Sergey Fedorov
2016-06-29 14:47     ` Alex Bennée
2016-06-29 14:52       ` Sergey Fedorov
2016-06-29 16:08         ` Alex Bennée
2016-06-30  9:24           ` Sergey Fedorov
2016-06-04 14:40 ` [Qemu-devel] [RFC v3 00/19] Base enabling patches for MTTCG Pranith Kumar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).