[PULL 0/8] tcg patch queue


* [PULL 0/8] tcg patch queue
From: Richard Henderson @ 2020-02-29  2:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

The following changes since commit e0175b71638cf4398903c0d25f93fe62e0606389:

  Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20200228' into staging (2020-02-28 16:39:27 +0000)

are available in the Git repository at:

  https://github.com/rth7680/qemu.git tags/pull-tcg-20200228

for you to fetch changes up to 600e17b261555c56a048781b8dd5ba3985650013:

  accel/tcg: increase default code gen buffer size for 64 bit (2020-02-28 17:43:31 -0800)

----------------------------------------------------------------
Fix race in cpu_exec_step_atomic.
Work around compile failure with -fno-inline.
Expand tcg/arm epilogue inline.
Adjustments to the default code gen buffer size.

----------------------------------------------------------------
Alex Bennée (5):
      accel/tcg: fix race in cpu_exec_step_atomic (bug 1863025)
      accel/tcg: use units.h for defining code gen buffer sizes
      accel/tcg: remove link between guest ram and TCG cache size
      accel/tcg: only USE_STATIC_CODE_GEN_BUFFER on 32 bit hosts
      accel/tcg: increase default code gen buffer size for 64 bit

Richard Henderson (2):
      tcg/arm: Split out tcg_out_epilogue
      tcg/arm: Expand epilogue inline

Zenghui Yu (1):
      compiler.h: Don't use compile-time assert when __NO_INLINE__ is defined

 include/qemu/compiler.h   |  2 +-
 accel/tcg/cpu-exec.c      | 21 ++++++++--------
 accel/tcg/translate-all.c | 61 ++++++++++++++++++++++++++++-------------------
 tcg/arm/tcg-target.inc.c  | 29 ++++++++++------------
 4 files changed, 60 insertions(+), 53 deletions(-)



* [PULL 1/8] accel/tcg: fix race in cpu_exec_step_atomic (bug 1863025)
From: Richard Henderson @ 2020-02-29  2:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Alex Bennée, Yifan, Paolo Bonzini

From: Alex Bennée <alex.bennee@linaro.org>

The bug describes a race whereby cpu_exec_step_atomic can acquire a TB
which is invalidated by a tb_flush before we execute it. This doesn't
affect the other cpu_exec modes as a tb_flush by its nature can only
occur on a quiescent system. The race was described as:

  B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code
  B3. tcg_tb_alloc obtains a new TB

      C3. TB obtained with tb_lookup__cpu_state or tb_gen_code
          (same TB as B2)

          A3. start_exclusive critical section entered
          A4. do_tb_flush is called, TB memory freed/re-allocated
          A5. end_exclusive exits critical section

  B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code
  B3. tcg_tb_alloc reallocates TB from B2

      C4. start_exclusive critical section entered
      C5. cpu_tb_exec executes the TB code that was freed in A4

The simplest fix is to widen the exclusive period to include the TB
lookup. As a result we can drop the complication of checking we are in
the exclusive region before we end it.
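The reordering can be illustrated with a deliberately tiny, single-threaded model. Everything in it is hypothetical and not QEMU code (TB, flush_gen, in_exclusive and the model_* functions are invented for illustration): a flag stands in for start_exclusive()/end_exclusive(), a generation counter stands in for tb_flush() freeing the TB, and other_vcpu_flush() is called at exactly the point where steps A3-A5 of the race can interleave.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a TB remembers the flush generation it belongs to. */
typedef struct {
    unsigned gen;
} TB;

static TB cached_tb;
static unsigned flush_gen;  /* bumped by every (modelled) tb_flush */
static bool in_exclusive;   /* stands in for start/end_exclusive() */

/* Modelled lookup/codegen: the returned TB is valid as of "now". */
static TB *model_lookup(void)
{
    cached_tb.gen = flush_gen;
    return &cached_tb;
}

/*
 * Another vCPU asking for a flush. It cannot flush while this vCPU is
 * exclusive; real QEMU would wait, whereas the model simply skips it.
 */
static void other_vcpu_flush(void)
{
    if (!in_exclusive) {
        flush_gen++;
    }
}

/* Returns true if the TB was still valid at the point it would execute. */
static bool model_step_atomic(bool exclusive_before_lookup)
{
    if (exclusive_before_lookup) {
        in_exclusive = true;        /* patched order: lock, then look up */
    }
    TB *tb = model_lookup();
    other_vcpu_flush();             /* the window described in the bug */
    if (!exclusive_before_lookup) {
        in_exclusive = true;        /* original order: look up, then lock */
    }
    bool still_valid = (tb->gen == flush_gen);
    in_exclusive = false;           /* end_exclusive() */
    return still_valid;
}
```

With the original ordering the model ends up "executing" a TB from a generation that has already been flushed; with the lookup inside the exclusive region the flush cannot slip in between.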

Cc: Yifan <me@yifanlu.com>
Buglink: https://bugs.launchpad.net/qemu/+bug/1863025
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20200214144952.15502-1-alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/cpu-exec.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 2560c90eec..d95c4848a4 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -240,6 +240,8 @@ void cpu_exec_step_atomic(CPUState *cpu)
     uint32_t cf_mask = cflags & CF_HASH_MASK;
 
     if (sigsetjmp(cpu->jmp_env, 0) == 0) {
+        start_exclusive();
+
         tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, cf_mask);
         if (tb == NULL) {
             mmap_lock();
@@ -247,8 +249,6 @@ void cpu_exec_step_atomic(CPUState *cpu)
             mmap_unlock();
         }
 
-        start_exclusive();
-
         /* Since we got here, we know that parallel_cpus must be true.  */
         parallel_cpus = false;
         cc->cpu_exec_enter(cpu);
@@ -271,14 +271,15 @@ void cpu_exec_step_atomic(CPUState *cpu)
         qemu_plugin_disable_mem_helpers(cpu);
     }
 
-    if (cpu_in_exclusive_context(cpu)) {
-        /* We might longjump out of either the codegen or the
-         * execution, so must make sure we only end the exclusive
-         * region if we started it.
-         */
-        parallel_cpus = true;
-        end_exclusive();
-    }
+
+    /*
+     * As we start the exclusive region before codegen we must still
+     * be in the region if we longjump out of either the codegen or
+     * the execution.
+     */
+    g_assert(cpu_in_exclusive_context(cpu));
+    parallel_cpus = true;
+    end_exclusive();
 }
 
 struct tb_desc {
-- 
2.20.1




* [PULL 2/8] compiler.h: Don't use compile-time assert when __NO_INLINE__ is defined
From: Richard Henderson @ 2020-02-29  2:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: Zenghui Yu, peter.maydell, Euler Robot

From: Zenghui Yu <yuzenghui@huawei.com>

Our robot reported the following compile-time error while compiling
QEMU with -fno-inline in CFLAGS:

In function 'load_memop',
    inlined from 'load_helper' at /qemu/accel/tcg/cputlb.c:1578:20,
    inlined from 'full_ldub_mmu' at /qemu/accel/tcg/cputlb.c:1624:12:
/qemu/accel/tcg/cputlb.c:1502:9: error: call to 'qemu_build_not_reached' declared with attribute error: code path is reachable
         qemu_build_not_reached();
         ^~~~~~~~~~~~~~~~~~~~~~~~
    [...]

It looks like a false positive: only (MO_UB ^ MO_BSWAP) would hit the
default case in load_memop(), and need_swap (size > 1) has already
ensured that MO_UB is not involved.

With -fno-inline, the compiler cannot evaluate memop_size(op) at
compile time, so qemu_build_not_reached() is wrongly triggered for
(MO_UB ^ MO_BSWAP).  Avoid the compile-time assert when no functions
will be inlined into their callers, i.e. when __NO_INLINE__ is defined.
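A minimal sketch of the pattern being guarded, using hypothetical names (sketch_build_not_reached, size_shift, shift_for_u32). It assumes GCC-style noreturn and error function attributes; a compiler that ignores the error attribute would instead fail at link time on the undefined symbol, which matches the behaviour the compiler.h comment describes. The helper is marked always_inline so that, in optimized builds, its constant argument is guaranteed to reach the switch; when __NO_INLINE__ is defined (-O0 or -fno-inline) the check falls back to a run-time abort(), standing in for QEMU's own fallback.

```c
#include <assert.h>
#include <stdlib.h>

/* Gated the same way as the patched include/qemu/compiler.h. */
#if defined(__OPTIMIZE__) && !defined(__NO_INLINE__)
/* Any call that survives optimization is reported as a build error. */
extern void __attribute__((noreturn, error("code path is reachable")))
    sketch_build_not_reached(void);
#else
/* Without inlining the path cannot be proven dead: check at run time. */
#define sketch_build_not_reached() abort()
#endif

/* Hypothetical helper: only access sizes 1, 2, 4 and 8 are legal. */
static inline __attribute__((always_inline)) unsigned size_shift(unsigned size)
{
    switch (size) {
    case 1: return 0;
    case 2: return 1;
    case 4: return 2;
    case 8: return 3;
    default:
        sketch_build_not_reached();
    }
}

unsigned shift_for_u32(void)
{
    return size_shift(4);   /* constant argument: the default case is dead */
}
```

Calling size_shift() with a value the optimizer cannot see would trip the compile-time error in optimized builds, which is exactly why the check has to be disabled when inlining is turned off.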

Reported-by: Euler Robot <euler.robot@huawei.com>
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Message-Id: <20200205141545.180-1-yuzenghui@huawei.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/qemu/compiler.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/qemu/compiler.h b/include/qemu/compiler.h
index 85c02c16d3..c76281f354 100644
--- a/include/qemu/compiler.h
+++ b/include/qemu/compiler.h
@@ -236,7 +236,7 @@
  * supports QEMU_ERROR, this will be reported at compile time; otherwise
  * this will be reported at link time due to the missing symbol.
  */
-#ifdef __OPTIMIZE__
+#if defined(__OPTIMIZE__) && !defined(__NO_INLINE__)
 extern void QEMU_NORETURN QEMU_ERROR("code path is reachable")
     qemu_build_not_reached(void);
 #else
-- 
2.20.1




* [PULL 3/8] tcg/arm: Split out tcg_out_epilogue
From: Richard Henderson @ 2020-02-29  2:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Philippe Mathieu-Daudé, Richard Henderson

From: Richard Henderson <rth@twiddle.net>

We will shortly use this function from tcg_out_op as well.

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.inc.c | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index fffb6611e2..e1aa740ba4 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -1746,6 +1746,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 }
 
 static tcg_insn_unit *tb_ret_addr;
+static void tcg_out_epilogue(TCGContext *s);
 
 static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                 const TCGArg *args, const int *const_args)
@@ -2284,19 +2285,17 @@ static void tcg_out_nop_fill(tcg_insn_unit *p, int count)
       + TCG_TARGET_STACK_ALIGN - 1) \
      & -TCG_TARGET_STACK_ALIGN)
 
+#define STACK_ADDEND  (FRAME_SIZE - PUSH_SIZE)
+
 static void tcg_target_qemu_prologue(TCGContext *s)
 {
-    int stack_addend;
-
     /* Calling convention requires us to save r4-r11 and lr.  */
     /* stmdb sp!, { r4 - r11, lr } */
     tcg_out32(s, (COND_AL << 28) | 0x092d4ff0);
 
     /* Reserve callee argument and tcg temp space.  */
-    stack_addend = FRAME_SIZE - PUSH_SIZE;
-
     tcg_out_dat_rI(s, COND_AL, ARITH_SUB, TCG_REG_CALL_STACK,
-                   TCG_REG_CALL_STACK, stack_addend, 1);
+                   TCG_REG_CALL_STACK, STACK_ADDEND, 1);
     tcg_set_frame(s, TCG_REG_CALL_STACK, TCG_STATIC_CALL_ARGS_SIZE,
                   CPU_TEMP_BUF_NLONGS * sizeof(long));
 
@@ -2310,11 +2309,15 @@ static void tcg_target_qemu_prologue(TCGContext *s)
      */
     s->code_gen_epilogue = s->code_ptr;
     tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R0, 0);
-
-    /* TB epilogue */
     tb_ret_addr = s->code_ptr;
+    tcg_out_epilogue(s);
+}
+
+static void tcg_out_epilogue(TCGContext *s)
+{
+    /* Release local stack frame.  */
     tcg_out_dat_rI(s, COND_AL, ARITH_ADD, TCG_REG_CALL_STACK,
-                   TCG_REG_CALL_STACK, stack_addend, 1);
+                   TCG_REG_CALL_STACK, STACK_ADDEND, 1);
 
     /* ldmia sp!, { r4 - r11, pc } */
     tcg_out32(s, (COND_AL << 28) | 0x08bd8ff0);
-- 
2.20.1




* [PULL 4/8] tcg/arm: Expand epilogue inline
From: Richard Henderson @ 2020-02-29  2:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Richard Henderson

From: Richard Henderson <rth@twiddle.net>

It is, after all, just two instructions.

Profiling on a Cortex-A15, using -d nochain to increase the number
of exit_tb ops that are executed, shows a minor improvement of 0.5%.
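To make "just two instructions" concrete, here is a hypothetical emitter (encode_arm_imm and emit_epilogue are illustrative names, not the tcg/arm backend API) that builds the same pair as tcg_out_epilogue: an ADD that releases the stack frame, followed by the ldmia word (COND_AL << 28) | 0x08bd8ff0 from the patch. It assumes the frame adjustment fits an ARM rotated 8-bit immediate and simply gives up otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define COND_AL 0xeu   /* "always" condition code */

/*
 * ARM data-processing immediates are an 8-bit value rotated right by an
 * even amount. Returns true and stores the 12-bit field if encodable.
 */
static bool encode_arm_imm(uint32_t value, uint32_t *imm12)
{
    for (uint32_t rot = 0; rot < 16; rot++) {
        uint32_t shift = 2 * rot;
        /* Rotating left by 2*rot undoes a right-rotation by 2*rot. */
        uint32_t v = shift ? (value << shift) | (value >> (32 - shift)) : value;
        if (v <= 0xff) {
            *imm12 = (rot << 8) | v;
            return true;
        }
    }
    return false;
}

/* Emit the epilogue into buf; returns the number of words written. */
static size_t emit_epilogue(uint32_t *buf, uint32_t stack_addend)
{
    uint32_t imm12;

    if (!encode_arm_imm(stack_addend, &imm12)) {
        return 0;   /* not handled by this sketch */
    }
    /* add sp, sp, #stack_addend  (I=1, opcode ADD, Rn = Rd = sp) */
    buf[0] = (COND_AL << 28) | 0x028dd000u | imm12;
    /* ldmia sp!, { r4 - r11, pc }: restores callee-saved regs and returns */
    buf[1] = (COND_AL << 28) | 0x08bd8ff0u;
    return 2;
}
```

For example, emit_epilogue(buf, 16) produces 0xe28dd010 (add sp, sp, #16) followed by 0xe8bd8ff0. Because the pair is this short, emitting it at every exit_tb costs little code space and saves a branch to a shared epilogue.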

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.inc.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index e1aa740ba4..6aa7757aac 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -1745,7 +1745,6 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 #endif
 }
 
-static tcg_insn_unit *tb_ret_addr;
 static void tcg_out_epilogue(TCGContext *s);
 
 static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
@@ -1756,14 +1755,8 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     switch (opc) {
     case INDEX_op_exit_tb:
-        /* Reuse the zeroing that exists for goto_ptr.  */
-        a0 = args[0];
-        if (a0 == 0) {
-            tcg_out_goto(s, COND_AL, s->code_gen_epilogue);
-        } else {
-            tcg_out_movi32(s, COND_AL, TCG_REG_R0, args[0]);
-            tcg_out_goto(s, COND_AL, tb_ret_addr);
-        }
+        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R0, args[0]);
+        tcg_out_epilogue(s);
         break;
     case INDEX_op_goto_tb:
         {
@@ -2309,7 +2302,6 @@ static void tcg_target_qemu_prologue(TCGContext *s)
      */
     s->code_gen_epilogue = s->code_ptr;
     tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R0, 0);
-    tb_ret_addr = s->code_ptr;
     tcg_out_epilogue(s);
 }
 
-- 
2.20.1




* [PULL 5/8] accel/tcg: use units.h for defining code gen buffer sizes
From: Richard Henderson @ 2020-02-29  2:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, Niek Linnenbank, Alex Bennée,
	Philippe Mathieu-Daudé

From: Alex Bennée <alex.bennee@linaro.org>

Spelling the sizes with the MiB and GiB macros from qemu/units.h makes
them easier to read.
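A small sketch of the equivalence, assuming qemu/units.h defines its units as 64-bit shifts along the lines below; the SK_ prefixed names are hypothetical copies so the example stands alone. Both spellings of the s390x limit give the same value; one visible difference is that, under this assumption, MiB and GiB are 64-bit signed constants regardless of the width of the host's unsigned long.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for KiB/MiB/GiB from qemu/units.h. */
#define SK_KiB (INT64_C(1) << 10)
#define SK_MiB (INT64_C(1) << 20)
#define SK_GiB (INT64_C(1) << 30)

/* The s390x limit before and after the patch. */
static const unsigned long old_style = 3ul * 1024 * 1024 * 1024;
static const int64_t new_style = 3 * SK_GiB;
```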

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20200228192415.19867-2-alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/translate-all.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index a08ab11f65..238b0e575b 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -18,6 +18,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/units.h"
 #include "qemu-common.h"
 
 #define NO_CPU_IO_DEFS
@@ -901,33 +902,33 @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
 
 /* Minimum size of the code gen buffer.  This number is randomly chosen,
    but not so small that we can't have a fair number of TB's live.  */
-#define MIN_CODE_GEN_BUFFER_SIZE     (1024u * 1024)
+#define MIN_CODE_GEN_BUFFER_SIZE     (1 * MiB)
 
 /* Maximum size of the code gen buffer we'd like to use.  Unless otherwise
    indicated, this is constrained by the range of direct branches on the
    host cpu, as used by the TCG implementation of goto_tb.  */
 #if defined(__x86_64__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
+# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
 #elif defined(__sparc__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
+# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
 #elif defined(__powerpc64__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
+# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
 #elif defined(__powerpc__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (32u * 1024 * 1024)
+# define MAX_CODE_GEN_BUFFER_SIZE  (32 * MiB)
 #elif defined(__aarch64__)
-# define MAX_CODE_GEN_BUFFER_SIZE  (2ul * 1024 * 1024 * 1024)
+# define MAX_CODE_GEN_BUFFER_SIZE  (2 * GiB)
 #elif defined(__s390x__)
   /* We have a +- 4GB range on the branches; leave some slop.  */
-# define MAX_CODE_GEN_BUFFER_SIZE  (3ul * 1024 * 1024 * 1024)
+# define MAX_CODE_GEN_BUFFER_SIZE  (3 * GiB)
 #elif defined(__mips__)
   /* We have a 256MB branch region, but leave room to make sure the
      main executable is also within that region.  */
-# define MAX_CODE_GEN_BUFFER_SIZE  (128ul * 1024 * 1024)
+# define MAX_CODE_GEN_BUFFER_SIZE  (128 * MiB)
 #else
 # define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
 #endif
 
-#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32u * 1024 * 1024)
+#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
 
 #define DEFAULT_CODE_GEN_BUFFER_SIZE \
   (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
-- 
2.20.1




* [PULL 6/8] accel/tcg: remove link between guest ram and TCG cache size
From: Richard Henderson @ 2020-02-29  2:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, Niek Linnenbank, Philippe Mathieu-Daudé,
	Alex Bennée, Igor Mammedov

From: Alex Bennée <alex.bennee@linaro.org>

Basing the TB cache size on ram_size was always a rough heuristic, and
it was broken by a1b18df9a4, which caused ram_size not to be fully
realised at the time we initialise the TCG translation cache.

The current DEFAULT_CODE_GEN_BUFFER_SIZE may still be a little small,
but follow-up patches will address that.
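After this patch the buffer size depends only on the requested tb-size and the compile-time limits. The function below is a hypothetical sketch of that logic, not the QEMU source: default_size and max_size stand in for the host-specific DEFAULT_CODE_GEN_BUFFER_SIZE_1 and MAX_CODE_GEN_BUFFER_SIZE macros, and it assumes the real function goes on to clamp against the maximum (that part is outside the hunk above).

```c
#include <assert.h>
#include <stddef.h>

#define SK_MiB ((size_t)1 << 20)
#define SK_MIN_BUFFER (1 * SK_MiB)   /* mirrors MIN_CODE_GEN_BUFFER_SIZE */

/* tb_size == 0 means "use the default"; guest ram_size plays no part. */
static size_t sketch_size_code_gen_buffer(size_t tb_size,
                                          size_t default_size,
                                          size_t max_size)
{
    if (tb_size == 0) {
        /* The default itself never exceeds the host maximum. */
        tb_size = default_size < max_size ? default_size : max_size;
    }
    if (tb_size < SK_MIN_BUFFER) {
        tb_size = SK_MIN_BUFFER;
    }
    if (tb_size > max_size) {
        tb_size = max_size;
    }
    return tb_size;
}
```

For instance, with a 128 MiB host limit (the MIPS value above), sketch_size_code_gen_buffer(0, 32 * SK_MiB, 128 * SK_MiB) yields 32 MiB, while a tiny explicit request such as 4096 bytes is raised to the 1 MiB minimum.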

Fixes: a1b18df9a4
Cc: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Message-Id: <20200228192415.19867-3-alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/translate-all.c | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 238b0e575b..5b66af783b 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -938,15 +938,7 @@ static inline size_t size_code_gen_buffer(size_t tb_size)
 {
     /* Size the buffer.  */
     if (tb_size == 0) {
-#ifdef USE_STATIC_CODE_GEN_BUFFER
         tb_size = DEFAULT_CODE_GEN_BUFFER_SIZE;
-#else
-        /* ??? Needs adjustments.  */
-        /* ??? If we relax the requirement that CONFIG_USER_ONLY use the
-           static buffer, we could size this on RESERVED_VA, on the text
-           segment size of the executable, or continue to use the default.  */
-        tb_size = (unsigned long)(ram_size / 4);
-#endif
     }
     if (tb_size < MIN_CODE_GEN_BUFFER_SIZE) {
         tb_size = MIN_CODE_GEN_BUFFER_SIZE;
-- 
2.20.1




* [PULL 7/8] accel/tcg: only USE_STATIC_CODE_GEN_BUFFER on 32 bit hosts
From: Richard Henderson @ 2020-02-29  2:43 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, Niek Linnenbank, Alex Bennée,
	Philippe Mathieu-Daudé

From: Alex Bennée <alex.bennee@linaro.org>

There is no particular reason to use a static code generation buffer
on 64-bit hosts, as we have address space to burn. Allow the common
CONFIG_USER case to use mmap'ed buffers, as SoftMMU already does.
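The two allocation strategies can be sketched as follows. All names here are hypothetical: SK_USER_ONLY stands in for CONFIG_USER_ONLY, and UINTPTR_MAX == UINT32_MAX stands in for TCG_TARGET_REG_BITS == 32. A real translation buffer must also be executable (QEMU requests PROT_EXEC as well); the sketch maps it read/write only, so that it also runs under W^X security policies.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define SK_STATIC_SIZE ((size_t)32 << 20)

#if defined(SK_USER_ONLY) && UINTPTR_MAX == UINT32_MAX
/* 32-bit user mode: the buffer lives in the binary's own .bss. */
#define SK_USE_STATIC_BUFFER
static uint8_t sk_static_buf[SK_STATIC_SIZE];
#endif

/* Returns a buffer of at least 'size' bytes, or NULL on failure. */
static void *sk_alloc_code_buffer(size_t size)
{
#ifdef SK_USE_STATIC_BUFFER
    return size <= SK_STATIC_SIZE ? sk_static_buf : NULL;
#else
    /* Everywhere else: reserve anonymous memory at run time. */
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
#endif
}
```

On a 64-bit host the static array is never compiled in, so the buffer size is no longer fixed at build time and can follow the runtime sizing logic.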

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Message-Id: <20200228192415.19867-4-alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/translate-all.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 5b66af783b..4ce5d1b393 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -892,11 +892,12 @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
     }
 }
 
-#if defined(CONFIG_USER_ONLY)
-/* Currently it is not recommended to allocate big chunks of data in
-   user mode. It will change when a dedicated libc will be used.  */
-/* ??? 64-bit hosts ought to have no problem mmaping data outside the
-   region in which the guest needs to run.  Revisit this.  */
+#if defined(CONFIG_USER_ONLY) && TCG_TARGET_REG_BITS == 32
+/*
+ * For user mode on smaller 32 bit systems we may run into trouble
+ * allocating big chunks of data in the right place. On these systems
+ * we utilise a static code generation buffer directly in the binary.
+ */
 #define USE_STATIC_CODE_GEN_BUFFER
 #endif
 
-- 
2.20.1




* [PULL 8/8] accel/tcg: increase default code gen buffer size for 64 bit
From: Richard Henderson @ 2020-02-29  2:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Niek Linnenbank, Alex Bennée

From: Alex Bennée <alex.bennee@linaro.org>

While 32MB is certainly usable, a full system boot ends up flushing
the code gen buffer nearly 100 times. Increase the default on 64-bit
hosts to take advantage of all that spare memory. After this change I
can boot my test system without any TB flushes.

As typical usage runs several CONFIG_USER binaries at once, we are
less profligate with the user-mode default. We also move the static
code gen buffer defines to the same place, to keep all the reasoning
in the comments together.
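The #if chain this patch introduces amounts to a small decision table. The function below is a hypothetical runtime restatement of it, not QEMU code: host_bits stands in for TCG_TARGET_REG_BITS and user_only for CONFIG_USER_ONLY. In the real code the result is DEFAULT_CODE_GEN_BUFFER_SIZE_1, which, judging by the DEFAULT_CODE_GEN_BUFFER_SIZE definition, is then limited to MAX_CODE_GEN_BUFFER_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_MiB ((uint64_t)1 << 20)
#define SK_GiB ((uint64_t)1 << 30)

/* Default code gen buffer size chosen by host width and emulation mode. */
static uint64_t sketch_default_buffer_size(int host_bits, bool user_only)
{
    if (host_bits == 32) {
        /* Unchanged; 32-bit user mode also keeps the static buffer. */
        return 32 * SK_MiB;
    }
    /* 64-bit: several user-mode instances may share the host. */
    return user_only ? 128 * SK_MiB : 1 * SK_GiB;
}
```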

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Reviewed-by: Niek Linnenbank <nieklinnenbank@gmail.com>
Message-Id: <20200228192415.19867-5-alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/translate-all.c | 35 ++++++++++++++++++++++++++---------
 1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 4ce5d1b393..78914154bf 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -892,15 +892,6 @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
     }
 }
 
-#if defined(CONFIG_USER_ONLY) && TCG_TARGET_REG_BITS == 32
-/*
- * For user mode on smaller 32 bit systems we may run into trouble
- * allocating big chunks of data in the right place. On these systems
- * we utilise a static code generation buffer directly in the binary.
- */
-#define USE_STATIC_CODE_GEN_BUFFER
-#endif
-
 /* Minimum size of the code gen buffer.  This number is randomly chosen,
    but not so small that we can't have a fair number of TB's live.  */
 #define MIN_CODE_GEN_BUFFER_SIZE     (1 * MiB)
@@ -929,7 +920,33 @@ static void page_lock_pair(PageDesc **ret_p1, tb_page_addr_t phys1,
 # define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
 #endif
 
+#if TCG_TARGET_REG_BITS == 32
 #define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (32 * MiB)
+#ifdef CONFIG_USER_ONLY
+/*
+ * For user mode on smaller 32 bit systems we may run into trouble
+ * allocating big chunks of data in the right place. On these systems
+ * we utilise a static code generation buffer directly in the binary.
+ */
+#define USE_STATIC_CODE_GEN_BUFFER
+#endif
+#else /* TCG_TARGET_REG_BITS == 64 */
+#ifdef CONFIG_USER_ONLY
+/*
+ * As user-mode emulation typically means running multiple instances
+ * of the translator don't go too nuts with our default code gen
+ * buffer lest we make things too hard for the OS.
+ */
+#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (128 * MiB)
+#else
+/*
+ * We expect most system emulation to run one or two guests per host.
+ * Users running large scale system emulation may want to tweak their
+ * runtime setup via the tb-size control on the command line.
+ */
+#define DEFAULT_CODE_GEN_BUFFER_SIZE_1 (1 * GiB)
+#endif
+#endif
 
 #define DEFAULT_CODE_GEN_BUFFER_SIZE \
   (DEFAULT_CODE_GEN_BUFFER_SIZE_1 < MAX_CODE_GEN_BUFFER_SIZE \
-- 
2.20.1




* Re: [PULL 0/8] tcg patch queue
From: Peter Maydell @ 2020-03-02 13:00 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers

On Sat, 29 Feb 2020 at 02:43, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> The following changes since commit e0175b71638cf4398903c0d25f93fe62e0606389:
>
>   Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20200228' into staging (2020-02-28 16:39:27 +0000)
>
> are available in the Git repository at:
>
>   https://github.com/rth7680/qemu.git tags/pull-tcg-20200228
>
> for you to fetch changes up to 600e17b261555c56a048781b8dd5ba3985650013:
>
>   accel/tcg: increase default code gen buffer size for 64 bit (2020-02-28 17:43:31 -0800)
>
> ----------------------------------------------------------------
> Fix race in cpu_exec_step_atomic.
> Work around compile failure with -fno-inline.
> Expand tcg/arm epilogue inline.
> Adjustments to the default code gen buffer size.
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.0
for any user-visible changes.

-- PMM


