qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH for-next 00/15] Collection of improvements for tcg/ppc64
@ 2013-08-05 18:28 Richard Henderson
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 01/15] tcg-ppc64: Avoid code for nop move Richard Henderson
                   ` (15 more replies)
  0 siblings, 16 replies; 17+ messages in thread
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
  To: qemu-devel

About half of these patches are focused on reducing the number of
full 64-bit constants that need to be generated for addresses:

E.g. patch 5, which looks through the function descriptor: if the
program is built with --disable-pie, the elements of the function
descriptor are all 32-bit constants.

E.g. the end result of indirect jump threading + TCG_REG_TB.
Before, we reserve 6 insn slots to generate the full 64-bit address.
After, we use 2 insns -- addis + ld -- to load the full 64-bit
address from the indirection slot.
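To make the cost being removed concrete, here is a standalone sketch (not QEMU's actual tcg_out_movi logic; the per-case instruction counts are illustrative assumptions based on the 6-slot/2-insn figures above):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch: roughly how many insns a naive movi-style
   sequence needs to materialize a 64-bit constant, versus the fixed
   two insns (addis + ld) when the address is instead loaded from an
   indirection slot.  The exact counts are assumptions. */
static int movi64_insn_count(int64_t v)
{
    if (v == (int16_t)v) {
        return 1;               /* single addi from zero */
    }
    if (v == (int32_t)v) {
        return 2;               /* addis + ori */
    }
    return 5;                   /* build both 32-bit halves and combine */
}

enum { INDIRECTION_SLOT_INSNS = 2 };  /* addis + ld, as described above */
```

For a typical 64-bit code address, the indirection-slot scheme wins 2 insns to 5.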

The second patch could probably be reverted.  I'd planned to be
able to use the same conditional call + tail call scheme as ARM,
but I'd forgotten the need for a conditional store to go along
with that.  OTOH, it might still turn out to be useful somewhere.


r~


Richard Henderson (15):
  tcg-ppc64: Avoid code for nop move
  tcg-ppc64: Add an LK argument to tcg_out_call
  tcg-ppc64: Use the branch absolute instruction when possible
  tcg-ppc64: Don't load the static chain from TCG
  tcg-ppc64: Look through the function descriptor when profitable
  tcg-ppc64: Move AREG0 to r31
  tcg-ppc64: Tidy register allocation order
  tcg-ppc64: Create PowerOpcode
  tcg-ppc64: Handle long offsets better
  tcg-ppc64: Use indirect jump threading
  tcg-ppc64: Setup TCG_REG_TB
  tcg-ppc64: Use TCG_REG_TB in tcg_out_movi and tcg_out_mem_long
  tcg-ppc64: Tidy tcg_target_qemu_prologue
  tcg-ppc64: Streamline tcg_out_tlb_read
  tcg-ppc64: Implement CONFIG_QEMU_LDST_OPTIMIZATION

 configure               |    2 +-
 include/exec/exec-all.h |    7 +-
 tcg/ppc64/tcg-target.c  | 1079 ++++++++++++++++++++++++++---------------------
 tcg/ppc64/tcg-target.h  |    2 +-
 4 files changed, 598 insertions(+), 492 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH for-next 01/15] tcg-ppc64: Avoid code for nop move
  2013-08-05 18:28 [Qemu-devel] [PATCH for-next 00/15] Collection of improvements for tcg/ppc64 Richard Henderson
@ 2013-08-05 18:28 ` Richard Henderson
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 02/15] tcg-ppc64: Add an LK argument to tcg_out_call Richard Henderson
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson

While nop moves are rare in code that has been through the optimizer,
they are not uncommon in code generated directly by the tcg backend.
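A minimal standalone sketch of the change (the emission buffer is a stand-in for TCGContext; the OR encoding follows the Power ISA, where "mr ret, arg" is OR ret, arg, arg):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the TCG output buffer. */
static uint32_t code[16];
static int ncode;

static void emit32(uint32_t insn) { code[ncode++] = insn; }

#define OPCD(o)  ((uint32_t)(o) << 26)
#define XO31(o)  (OPCD(31) | ((uint32_t)(o) << 1))
#define OR_OPC   XO31(444)           /* "or"; acts as "mr" when rS == rB */
#define RS(r)    ((uint32_t)(r) << 21)
#define RA(r)    ((uint32_t)(r) << 16)
#define RB(r)    ((uint32_t)(r) << 11)

/* The patched behavior: a register move to itself is a pure nop,
   so emit nothing in that case. */
static void out_mov(int ret, int arg)
{
    if (ret != arg) {
        emit32(OR_OPC | RS(arg) | RA(ret) | RB(arg));
    }
}
```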

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc64/tcg-target.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index 0678de2..0e3147b 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -508,7 +508,9 @@ static const uint32_t tcg_to_isel[] = {
 static inline void tcg_out_mov(TCGContext *s, TCGType type,
                                TCGReg ret, TCGReg arg)
 {
-    tcg_out32 (s, OR | SAB (arg, ret, arg));
+    if (ret != arg) {
+        tcg_out32 (s, OR | SAB (arg, ret, arg));
+    }
 }
 
 static inline void tcg_out_rld(TCGContext *s, int op, TCGReg ra, TCGReg rs,
-- 
1.8.3.1


* [Qemu-devel] [PATCH for-next 02/15] tcg-ppc64: Add an LK argument to tcg_out_call
  2013-08-05 18:28 [Qemu-devel] [PATCH for-next 00/15] Collection of improvements for tcg/ppc64 Richard Henderson
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 01/15] tcg-ppc64: Avoid code for nop move Richard Henderson
@ 2013-08-05 18:28 ` Richard Henderson
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 03/15] tcg-ppc64: Use the branch absolute instruction when possible Richard Henderson
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson

This will enable the generation of tail-calls in a future patch.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc64/tcg-target.c | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index 0e3147b..94960a3 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -702,30 +702,30 @@ static void tcg_out_b (TCGContext *s, int mask, tcg_target_long target)
     }
 }
 
-static void tcg_out_call (TCGContext *s, tcg_target_long arg, int const_arg)
+/* Make a call to a function.  LK = LK for a normal call, or 0 to avoid
+   setting the link register, making a tail call.  */
+static void tcg_out_call(TCGContext *s, tcg_target_long arg,
+                         int const_arg, int lk)
 {
 #ifdef __APPLE__
     if (const_arg) {
-        tcg_out_b (s, LK, arg);
-    }
-    else {
-        tcg_out32 (s, MTSPR | RS (arg) | LR);
-        tcg_out32 (s, BCLR | BO_ALWAYS | LK);
+        tcg_out_b(s, lk, arg);
+    } else {
+        tcg_out32(s, MTSPR | RS(arg) | CTR);
+        tcg_out32(s, BCCTR | BO_ALWAYS | lk);
     }
 #else
-    int reg;
-
+    TCGReg reg = arg;
     if (const_arg) {
-        reg = 2;
-        tcg_out_movi (s, TCG_TYPE_I64, reg, arg);
+        reg = TCG_REG_R2;
+        tcg_out_movi(s, TCG_TYPE_I64, reg, arg);
     }
-    else reg = arg;
 
-    tcg_out32 (s, LD | RT (0) | RA (reg));
-    tcg_out32 (s, MTSPR | RA (0) | CTR);
-    tcg_out32 (s, LD | RT (11) | RA (reg) | 16);
-    tcg_out32 (s, LD | RT (2) | RA (reg) | 8);
-    tcg_out32 (s, BCCTR | BO_ALWAYS | LK);
+    tcg_out32(s, LD | TAI(TCG_REG_R0, reg, 0));
+    tcg_out32(s, MTSPR | RA(TCG_REG_R0) | CTR);
+    tcg_out32(s, LD | TAI(TCG_REG_R11, reg, 16));
+    tcg_out32(s, LD | TAI(TCG_REG_R2, reg, 8));
+    tcg_out32(s, BCCTR | BO_ALWAYS | lk);
 #endif
 }
 
@@ -869,7 +869,7 @@ static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
     tcg_out_mov (s, TCG_TYPE_I64, ir++, addr_reg);
     tcg_out_movi (s, TCG_TYPE_I64, ir++, mem_index);
 
-    tcg_out_call (s, (tcg_target_long) qemu_ld_helpers[s_bits], 1);
+    tcg_out_call(s, (tcg_target_long)qemu_ld_helpers[s_bits], 1, LK);
 
     if (opc & 4) {
         insn = qemu_exts_opc[s_bits];
@@ -960,7 +960,7 @@ static void tcg_out_qemu_st (TCGContext *s, const TCGArg *args, int opc)
     tcg_out_rld (s, RLDICL, ir++, data_reg, 0, 64 - (1 << (3 + opc)));
     tcg_out_movi (s, TCG_TYPE_I64, ir++, mem_index);
 
-    tcg_out_call (s, (tcg_target_long) qemu_st_helpers[opc], 1);
+    tcg_out_call(s, (tcg_target_long)qemu_st_helpers[opc], 1, LK);
 
     label2_ptr = s->code_ptr;
     tcg_out32 (s, B);
@@ -1440,7 +1440,7 @@ static void tcg_out_op (TCGContext *s, TCGOpcode opc, const TCGArg *args,
         }
         break;
     case INDEX_op_call:
-        tcg_out_call (s, args[0], const_args[0]);
+        tcg_out_call(s, args[0], const_args[0], LK);
         break;
     case INDEX_op_movi_i32:
         tcg_out_movi (s, TCG_TYPE_I32, args[0], args[1]);
-- 
1.8.3.1


* [Qemu-devel] [PATCH for-next 03/15] tcg-ppc64: Use the branch absolute instruction when possible
  2013-08-05 18:28 [Qemu-devel] [PATCH for-next 00/15] Collection of improvements for tcg/ppc64 Richard Henderson
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 01/15] tcg-ppc64: Avoid code for nop move Richard Henderson
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 02/15] tcg-ppc64: Add an LK argument to tcg_out_call Richard Henderson
@ 2013-08-05 18:28 ` Richard Henderson
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 04/15] tcg-ppc64: Don't load the static chain from TCG Richard Henderson
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson

... before falling back to an indirect branch.
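A sketch of the selection logic this patch introduces (the in_range_b test matches the patch; the enum and helper names here are invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* The I-form branch has a signed 26-bit, word-aligned displacement
   field, i.e. +/-32 MiB of reach.  This mirrors the patch's
   in_range_b helper. */
static int in_range_b(int64_t disp)
{
    return disp >= -0x4000000 && disp < 0x4000000;
}

typedef enum { BR_RELATIVE, BR_ABSOLUTE, BR_INDIRECT } BranchForm;

/* Illustrative: prefer a relative branch; if the displacement is out
   of range but the absolute target itself fits the field, use the AA
   bit; otherwise fall back to movi + mtctr + bcctr. */
static BranchForm choose_branch(int64_t pc, int64_t target)
{
    int64_t disp = target - pc;
    if (in_range_b(disp)) {
        return BR_RELATIVE;
    }
    if (in_range_b(target)) {
        return BR_ABSOLUTE;
    }
    return BR_INDIRECT;
}
```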

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc64/tcg-target.c | 31 ++++++++++++++++++-------------
 1 file changed, 18 insertions(+), 13 deletions(-)

diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index 94960a3..fce3e5d 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -173,13 +173,17 @@ static const int tcg_target_callee_save_regs[] = {
     TCG_REG_R31
 };
 
+static inline bool in_range_b(intptr_t disp)
+{
+    return disp >= -0x4000000 && disp < 0x4000000;
+}
+
 static uint32_t reloc_pc24_val (void *pc, tcg_target_long target)
 {
     tcg_target_long disp;
 
     disp = target - (tcg_target_long) pc;
-    if ((disp << 38) >> 38 != disp)
-        tcg_abort ();
+    assert(in_range_b(disp));
 
     return disp & 0x3fffffc;
 }
@@ -195,8 +199,7 @@ static uint16_t reloc_pc14_val (void *pc, tcg_target_long target)
     tcg_target_long disp;
 
     disp = target - (tcg_target_long) pc;
-    if (disp != (int16_t) disp)
-        tcg_abort ();
+    assert(disp == (int16_t)disp);
 
     return disp & 0xfffc;
 }
@@ -454,6 +457,7 @@ static int tcg_target_const_match (tcg_target_long val,
 #define FXM(b) (1 << (19 - (b)))
 
 #define LK    1
+#define AA    2
 
 #define TAB(t, a, b) (RT(t) | RA(a) | RB(b))
 #define SAB(s, a, b) (RS(s) | RA(a) | RB(b))
@@ -688,17 +692,18 @@ static void tcg_out_xori32(TCGContext *s, TCGReg dst, TCGReg src, uint32_t c)
     tcg_out_zori32(s, dst, src, c, XORI, XORIS);
 }
 
-static void tcg_out_b (TCGContext *s, int mask, tcg_target_long target)
+static void tcg_out_b(TCGContext *s, int lk, tcg_target_long target)
 {
-    tcg_target_long disp;
+    tcg_target_long disp = target - (tcg_target_long) s->code_ptr;
 
-    disp = target - (tcg_target_long) s->code_ptr;
-    if ((disp << 38) >> 38 == disp)
-        tcg_out32 (s, B | (disp & 0x3fffffc) | mask);
-    else {
-        tcg_out_movi (s, TCG_TYPE_I64, 0, (tcg_target_long) target);
-        tcg_out32 (s, MTSPR | RS (0) | CTR);
-        tcg_out32 (s, BCCTR | BO_ALWAYS | mask);
+    if (in_range_b(disp)) {
+        tcg_out32(s, B | (disp & 0x3fffffc) | lk);
+    } else if (in_range_b(target)) {
+        tcg_out32(s, B | AA | target | lk);
+    } else {
+        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R0, target);
+        tcg_out32 (s, MTSPR | RS(TCG_REG_R0) | CTR);
+        tcg_out32 (s, BCCTR | BO_ALWAYS | lk);
     }
 }
 
-- 
1.8.3.1


* [Qemu-devel] [PATCH for-next 04/15] tcg-ppc64: Don't load the static chain from TCG
  2013-08-05 18:28 [Qemu-devel] [PATCH for-next 00/15] Collection of improvements for tcg/ppc64 Richard Henderson
                   ` (2 preceding siblings ...)
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 03/15] tcg-ppc64: Use the branch absolute instruction when possible Richard Henderson
@ 2013-08-05 18:28 ` Richard Henderson
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 05/15] tcg-ppc64: Look through the function descriptor when profitable Richard Henderson
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson

There are no helpers that require the static chain.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc64/tcg-target.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index fce3e5d..ddc9581 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -728,7 +728,6 @@ static void tcg_out_call(TCGContext *s, tcg_target_long arg,
 
     tcg_out32(s, LD | TAI(TCG_REG_R0, reg, 0));
     tcg_out32(s, MTSPR | RA(TCG_REG_R0) | CTR);
-    tcg_out32(s, LD | TAI(TCG_REG_R11, reg, 16));
     tcg_out32(s, LD | TAI(TCG_REG_R2, reg, 8));
     tcg_out32(s, BCCTR | BO_ALWAYS | lk);
 #endif
-- 
1.8.3.1


* [Qemu-devel] [PATCH for-next 05/15] tcg-ppc64: Look through the function descriptor when profitable
  2013-08-05 18:28 [Qemu-devel] [PATCH for-next 00/15] Collection of improvements for tcg/ppc64 Richard Henderson
                   ` (3 preceding siblings ...)
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 04/15] tcg-ppc64: Don't load the static chain from TCG Richard Henderson
@ 2013-08-05 18:28 ` Richard Henderson
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 06/15] tcg-ppc64: Move AREG0 to r31 Richard Henderson
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson

Loading 32-bit immediates instead of loading from memory will be
faster.  Don't attempt this with full 64-bit immediates, though.
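A sketch of the profitability test from the patch (the descriptor layout is the ELFv1 ppc64 convention; the struct and field names are illustrative, not QEMU's):

```c
#include <assert.h>
#include <stdint.h>

/* ELFv1 ppc64 function descriptor: a function pointer points at this
   triple rather than at the code itself. */
typedef struct {
    uint64_t entry;   /* address of the first instruction */
    uint64_t toc;     /* TOC (r2) value for the callee */
    uint64_t chain;   /* static chain; unused by QEMU helpers */
} FuncDesc;

/* The patch's test: both values must be representable as sign-extended
   32-bit immediates, which holds for a --disable-pie link. */
static int fits_sext32(uint64_t v)
{
    return v == (uint64_t)(int64_t)(int32_t)v;
}

static int can_look_through(const FuncDesc *fd)
{
    return fits_sext32(fd->entry) && fits_sext32(fd->toc);
}
```

When the test fails, generating two full 64-bit constants would cost more than the two memory loads it replaces, so the call goes through the descriptor as before.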

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc64/tcg-target.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index ddc9581..2563253 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -722,6 +722,17 @@ static void tcg_out_call(TCGContext *s, tcg_target_long arg,
 #else
     TCGReg reg = arg;
     if (const_arg) {
+        uintptr_t tgt = ((uintptr_t *)arg)[0];
+        uintptr_t toc = ((uintptr_t *)arg)[1];
+
+        /* Look through the function descriptor, if profitable.  */
+        if (tgt == (int32_t)tgt && toc == (int32_t)toc) {
+            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R2, toc);
+            tcg_out_b(s, lk, tgt);
+            return;
+        }
+
+        /* Avoid generating two full 64-bit constants.  */
         reg = TCG_REG_R2;
         tcg_out_movi(s, TCG_TYPE_I64, reg, arg);
     }
-- 
1.8.3.1


* [Qemu-devel] [PATCH for-next 06/15] tcg-ppc64: Move AREG0 to r31
  2013-08-05 18:28 [Qemu-devel] [PATCH for-next 00/15] Collection of improvements for tcg/ppc64 Richard Henderson
                   ` (4 preceding siblings ...)
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 05/15] tcg-ppc64: Look through the function descriptor when profitable Richard Henderson
@ 2013-08-05 18:28 ` Richard Henderson
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 07/15] tcg-ppc64: Tidy register allocation order Richard Henderson
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson

Now that AREG0 no longer needs to be a global register that must avoid
conflicting with the normal frame pointer, move it out of the middle
of the callee-saved set.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc64/tcg-target.c | 4 ++--
 tcg/ppc64/tcg-target.h | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index 2563253..2b3d1bb 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -166,11 +166,11 @@ static const int tcg_target_callee_save_regs[] = {
     TCG_REG_R24,
     TCG_REG_R25,
     TCG_REG_R26,
-    TCG_REG_R27, /* currently used for the global env */
+    TCG_REG_R27,
     TCG_REG_R28,
     TCG_REG_R29,
     TCG_REG_R30,
-    TCG_REG_R31
+    TCG_REG_R31, /* currently used for the global env */
 };
 
 static inline bool in_range_b(intptr_t disp)
diff --git a/tcg/ppc64/tcg-target.h b/tcg/ppc64/tcg-target.h
index 48fc6e2..66d0515 100644
--- a/tcg/ppc64/tcg-target.h
+++ b/tcg/ppc64/tcg-target.h
@@ -119,7 +119,7 @@ typedef enum {
 #define TCG_TARGET_HAS_mulu2_i64        1
 #define TCG_TARGET_HAS_muls2_i64        1
 
-#define TCG_AREG0 TCG_REG_R27
+#define TCG_AREG0 TCG_REG_R31
 
 #define TCG_TARGET_EXTEND_ARGS 1
 
-- 
1.8.3.1


* [Qemu-devel] [PATCH for-next 07/15] tcg-ppc64: Tidy register allocation order
  2013-08-05 18:28 [Qemu-devel] [PATCH for-next 00/15] Collection of improvements for tcg/ppc64 Richard Henderson
                   ` (5 preceding siblings ...)
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 06/15] tcg-ppc64: Move AREG0 to r31 Richard Henderson
@ 2013-08-05 18:28 ` Richard Henderson
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 08/15] tcg-ppc64: Create PowerOpcode Richard Henderson
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson

Remove conditionalization from tcg_target_reg_alloc_order, relying on
reserved_regs to prevent register allocation that shouldn't happen.
So R11 is now present in reg_alloc_order for __APPLE__, but also now
reserved.

Sort reg_alloc_order into call-saved, call-clobbered, and parameter
registers.  This reduces the amount of spilling and reloading of
values around function calls.

Whether or not it is reserved, R2 (TOC) is always call-clobbered.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc64/tcg-target.c | 46 +++++++++++++++++++++-------------------------
 1 file changed, 21 insertions(+), 25 deletions(-)

diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index 2b3d1bb..862e84c 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -99,7 +99,7 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 #endif
 
 static const int tcg_target_reg_alloc_order[] = {
-    TCG_REG_R14,
+    TCG_REG_R14,  /* call saved registers */
     TCG_REG_R15,
     TCG_REG_R16,
     TCG_REG_R17,
@@ -109,29 +109,25 @@ static const int tcg_target_reg_alloc_order[] = {
     TCG_REG_R21,
     TCG_REG_R22,
     TCG_REG_R23,
+    TCG_REG_R24,
+    TCG_REG_R25,
+    TCG_REG_R26,
+    TCG_REG_R27,
     TCG_REG_R28,
     TCG_REG_R29,
     TCG_REG_R30,
     TCG_REG_R31,
-#ifdef __APPLE__
+    TCG_REG_R12,  /* call clobbered, non-arguments */
+    TCG_REG_R11,
     TCG_REG_R2,
-#endif
-    TCG_REG_R3,
-    TCG_REG_R4,
-    TCG_REG_R5,
-    TCG_REG_R6,
-    TCG_REG_R7,
-    TCG_REG_R8,
+    TCG_REG_R10,  /* call clobbered, arguments */
     TCG_REG_R9,
-    TCG_REG_R10,
-#ifndef __APPLE__
-    TCG_REG_R11,
-#endif
-    TCG_REG_R12,
-    TCG_REG_R24,
-    TCG_REG_R25,
-    TCG_REG_R26,
-    TCG_REG_R27
+    TCG_REG_R8,
+    TCG_REG_R7,
+    TCG_REG_R6,
+    TCG_REG_R5,
+    TCG_REG_R4,
+    TCG_REG_R3,
 };
 
 static const int tcg_target_call_iarg_regs[] = {
@@ -2160,9 +2156,7 @@ static void tcg_target_init (TCGContext *s)
     tcg_regset_set32 (tcg_target_available_regs[TCG_TYPE_I64], 0, 0xffffffff);
     tcg_regset_set32 (tcg_target_call_clobber_regs, 0,
                      (1 << TCG_REG_R0) |
-#ifdef __APPLE__
                      (1 << TCG_REG_R2) |
-#endif
                      (1 << TCG_REG_R3) |
                      (1 << TCG_REG_R4) |
                      (1 << TCG_REG_R5) |
@@ -2176,12 +2170,14 @@ static void tcg_target_init (TCGContext *s)
         );
 
     tcg_regset_clear (s->reserved_regs);
-    tcg_regset_set_reg (s->reserved_regs, TCG_REG_R0);
-    tcg_regset_set_reg (s->reserved_regs, TCG_REG_R1);
-#ifndef __APPLE__
-    tcg_regset_set_reg (s->reserved_regs, TCG_REG_R2);
+    tcg_regset_set_reg (s->reserved_regs, TCG_REG_R0); /* tcg temp */
+    tcg_regset_set_reg (s->reserved_regs, TCG_REG_R1); /* stack pointer */
+#ifdef __APPLE__
+    tcg_regset_set_reg (s->reserved_regs, TCG_REG_R11); /* ??? */
+#else
+    tcg_regset_set_reg (s->reserved_regs, TCG_REG_R2); /* toc */
 #endif
-    tcg_regset_set_reg (s->reserved_regs, TCG_REG_R13);
+    tcg_regset_set_reg (s->reserved_regs, TCG_REG_R13); /* thread pointer */
 
     tcg_add_target_add_op_defs (ppc_op_defs);
 }
-- 
1.8.3.1


* [Qemu-devel] [PATCH for-next 08/15] tcg-ppc64: Create PowerOpcode
  2013-08-05 18:28 [Qemu-devel] [PATCH for-next 00/15] Collection of improvements for tcg/ppc64 Richard Henderson
                   ` (6 preceding siblings ...)
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 07/15] tcg-ppc64: Tidy register allocation order Richard Henderson
@ 2013-08-05 18:28 ` Richard Henderson
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 09/15] tcg-ppc64: Handle long offsets better Richard Henderson
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson

This makes some things easier to debug, since gdb presents a symbolic
name instead of a raw number.
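The encoding scheme behind the enum, in a self-contained sketch (the primary opcode occupies the top 6 bits of the instruction word; the sample values below follow the Power ISA):

```c
#include <assert.h>
#include <stdint.h>

/* PowerPC primary opcodes live in the top 6 bits of the 32-bit
   instruction word.  An enum carries the same values as the old
   #defines, but gdb can display "B" or "ADDI" instead of raw hex. */
#define OPCD(opc)  ((uint32_t)(opc) << 26)

typedef enum PowerOpcode {
    B    = OPCD(18),   /* branch */
    BC   = OPCD(16),   /* branch conditional */
    ADDI = OPCD(14),   /* add immediate */
    ORI  = OPCD(24),   /* or immediate */
} PowerOpcode;
```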

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc64/tcg-target.c | 273 +++++++++++++++++++++++++------------------------
 1 file changed, 138 insertions(+), 135 deletions(-)

diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index 862e84c..a79b876 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -313,133 +313,10 @@ static int tcg_target_const_match (tcg_target_long val,
 #define XO58(opc) (OPCD(58)|(opc))
 #define XO62(opc) (OPCD(62)|(opc))
 
-#define B      OPCD( 18)
-#define BC     OPCD( 16)
-#define LBZ    OPCD( 34)
-#define LHZ    OPCD( 40)
-#define LHA    OPCD( 42)
-#define LWZ    OPCD( 32)
-#define STB    OPCD( 38)
-#define STH    OPCD( 44)
-#define STW    OPCD( 36)
-
-#define STD    XO62(  0)
-#define STDU   XO62(  1)
-#define STDX   XO31(149)
-
-#define LD     XO58(  0)
-#define LDX    XO31( 21)
-#define LDU    XO58(  1)
-#define LWA    XO58(  2)
-#define LWAX   XO31(341)
-
-#define ADDIC  OPCD( 12)
-#define ADDI   OPCD( 14)
-#define ADDIS  OPCD( 15)
-#define ORI    OPCD( 24)
-#define ORIS   OPCD( 25)
-#define XORI   OPCD( 26)
-#define XORIS  OPCD( 27)
-#define ANDI   OPCD( 28)
-#define ANDIS  OPCD( 29)
-#define MULLI  OPCD(  7)
-#define CMPLI  OPCD( 10)
-#define CMPI   OPCD( 11)
-#define SUBFIC OPCD( 8)
-
-#define LWZU   OPCD( 33)
-#define STWU   OPCD( 37)
-
-#define RLWIMI OPCD( 20)
-#define RLWINM OPCD( 21)
-#define RLWNM  OPCD( 23)
-
-#define RLDICL MD30(  0)
-#define RLDICR MD30(  1)
-#define RLDIMI MD30(  3)
-#define RLDCL  MDS30( 8)
-
-#define BCLR   XO19( 16)
-#define BCCTR  XO19(528)
-#define CRAND  XO19(257)
-#define CRANDC XO19(129)
-#define CRNAND XO19(225)
-#define CROR   XO19(449)
-#define CRNOR  XO19( 33)
-
-#define EXTSB  XO31(954)
-#define EXTSH  XO31(922)
-#define EXTSW  XO31(986)
-#define ADD    XO31(266)
-#define ADDE   XO31(138)
-#define ADDME  XO31(234)
-#define ADDZE  XO31(202)
-#define ADDC   XO31( 10)
-#define AND    XO31( 28)
-#define SUBF   XO31( 40)
-#define SUBFC  XO31(  8)
-#define SUBFE  XO31(136)
-#define SUBFME XO31(232)
-#define SUBFZE XO31(200)
-#define OR     XO31(444)
-#define XOR    XO31(316)
-#define MULLW  XO31(235)
-#define MULHWU XO31( 11)
-#define DIVW   XO31(491)
-#define DIVWU  XO31(459)
-#define CMP    XO31(  0)
-#define CMPL   XO31( 32)
-#define LHBRX  XO31(790)
-#define LWBRX  XO31(534)
-#define LDBRX  XO31(532)
-#define STHBRX XO31(918)
-#define STWBRX XO31(662)
-#define STDBRX XO31(660)
-#define MFSPR  XO31(339)
-#define MTSPR  XO31(467)
-#define SRAWI  XO31(824)
-#define NEG    XO31(104)
-#define MFCR   XO31( 19)
-#define MFOCRF (MFCR | (1u << 20))
-#define NOR    XO31(124)
-#define CNTLZW XO31( 26)
-#define CNTLZD XO31( 58)
-#define ANDC   XO31( 60)
-#define ORC    XO31(412)
-#define EQV    XO31(284)
-#define NAND   XO31(476)
-#define ISEL   XO31( 15)
-
-#define MULLD  XO31(233)
-#define MULHD  XO31( 73)
-#define MULHDU XO31(  9)
-#define DIVD   XO31(489)
-#define DIVDU  XO31(457)
-
-#define LBZX   XO31( 87)
-#define LHZX   XO31(279)
-#define LHAX   XO31(343)
-#define LWZX   XO31( 23)
-#define STBX   XO31(215)
-#define STHX   XO31(407)
-#define STWX   XO31(151)
-
 #define SPR(a,b) ((((a)<<5)|(b))<<11)
 #define LR     SPR(8, 0)
 #define CTR    SPR(9, 0)
 
-#define SLW    XO31( 24)
-#define SRW    XO31(536)
-#define SRAW   XO31(792)
-
-#define SLD    XO31( 27)
-#define SRD    XO31(539)
-#define SRAD   XO31(794)
-#define SRADI  XO31(413<<1)
-
-#define TW     XO31( 4)
-#define TRAP   (TW | TO (31))
-
 #define RT(r) ((r)<<21)
 #define RS(r) ((r)<<21)
 #define RA(r) ((r)<<16)
@@ -455,6 +332,131 @@ static int tcg_target_const_match (tcg_target_long val,
 #define LK    1
 #define AA    2
 
+typedef enum PowerOpcode {
+    B      = OPCD( 18),
+    BC     = OPCD( 16),
+    LBZ    = OPCD( 34),
+    LHZ    = OPCD( 40),
+    LHA    = OPCD( 42),
+    LWZ    = OPCD( 32),
+    STB    = OPCD( 38),
+    STH    = OPCD( 44),
+    STW    = OPCD( 36),
+
+    STD    = XO62(  0),
+    STDU   = XO62(  1),
+    STDX   = XO31(149),
+
+    LD     = XO58(  0),
+    LDX    = XO31( 21),
+    LDU    = XO58(  1),
+    LWA    = XO58(  2),
+    LWAX   = XO31(341),
+
+    ADDIC  = OPCD( 12),
+    ADDI   = OPCD( 14),
+    ADDIS  = OPCD( 15),
+    ORI    = OPCD( 24),
+    ORIS   = OPCD( 25),
+    XORI   = OPCD( 26),
+    XORIS  = OPCD( 27),
+    ANDI   = OPCD( 28),
+    ANDIS  = OPCD( 29),
+    MULLI  = OPCD(  7),
+    CMPLI  = OPCD( 10),
+    CMPI   = OPCD( 11),
+    SUBFIC = OPCD(  8),
+
+    LWZU   = OPCD( 33),
+    STWU   = OPCD( 37),
+
+    RLWIMI = OPCD( 20),
+    RLWINM = OPCD( 21),
+    RLWNM  = OPCD( 23),
+
+    RLDICL = MD30(  0),
+    RLDICR = MD30(  1),
+    RLDIMI = MD30(  3),
+    RLDCL  = MDS30( 8),
+
+    BCLR   = XO19( 16),
+    BCCTR  = XO19(528),
+    CRAND  = XO19(257),
+    CRANDC = XO19(129),
+    CRNAND = XO19(225),
+    CROR   = XO19(449),
+    CRNOR  = XO19( 33),
+
+    EXTSB  = XO31(954),
+    EXTSH  = XO31(922),
+    EXTSW  = XO31(986),
+    ADD    = XO31(266),
+    ADDE   = XO31(138),
+    ADDME  = XO31(234),
+    ADDZE  = XO31(202),
+    ADDC   = XO31( 10),
+    AND    = XO31( 28),
+    SUBF   = XO31( 40),
+    SUBFC  = XO31(  8),
+    SUBFE  = XO31(136),
+    SUBFME = XO31(232),
+    SUBFZE = XO31(200),
+    OR     = XO31(444),
+    XOR    = XO31(316),
+    MULLW  = XO31(235),
+    MULHWU = XO31( 11),
+    DIVW   = XO31(491),
+    DIVWU  = XO31(459),
+    CMP    = XO31(  0),
+    CMPL   = XO31( 32),
+    LHBRX  = XO31(790),
+    LWBRX  = XO31(534),
+    LDBRX  = XO31(532),
+    STHBRX = XO31(918),
+    STWBRX = XO31(662),
+    STDBRX = XO31(660),
+    MFSPR  = XO31(339),
+    MTSPR  = XO31(467),
+    SRAWI  = XO31(824),
+    NEG    = XO31(104),
+    MFCR   = XO31( 19),
+    MFOCRF = MFCR | (1u << 20),
+    NOR    = XO31(124),
+    CNTLZW = XO31( 26),
+    CNTLZD = XO31( 58),
+    ANDC   = XO31( 60),
+    ORC    = XO31(412),
+    EQV    = XO31(284),
+    NAND   = XO31(476),
+    ISEL   = XO31( 15),
+
+    MULLD  = XO31(233),
+    MULHD  = XO31( 73),
+    MULHDU = XO31(  9),
+    DIVD   = XO31(489),
+    DIVDU  = XO31(457),
+
+    LBZX   = XO31( 87),
+    LHZX   = XO31(279),
+    LHAX   = XO31(343),
+    LWZX   = XO31( 23),
+    STBX   = XO31(215),
+    STHX   = XO31(407),
+    STWX   = XO31(151),
+
+    SLW    = XO31( 24),
+    SRW    = XO31(536),
+    SRAW   = XO31(792),
+
+    SLD    = XO31( 27),
+    SRD    = XO31(539),
+    SRAD   = XO31(794),
+    SRADI  = XO31(413<<1),
+
+    TW     = XO31( 4),
+    TRAP   = TW | TO(31),
+} PowerOpcode;
+
 #define TAB(t, a, b) (RT(t) | RA(a) | RB(b))
 #define SAB(s, a, b) (RS(s) | RA(a) | RB(b))
 #define TAI(s, a, i) (RT(s) | RA(a) | ((i) & 0xffff))
@@ -513,16 +515,16 @@ static inline void tcg_out_mov(TCGContext *s, TCGType type,
     }
 }
 
-static inline void tcg_out_rld(TCGContext *s, int op, TCGReg ra, TCGReg rs,
-                               int sh, int mb)
+static inline void tcg_out_rld(TCGContext *s, PowerOpcode op, TCGReg ra,
+                               TCGReg rs, int sh, int mb)
 {
     sh = SH (sh & 0x1f) | (((sh >> 5) & 1) << 1);
     mb = MB64 ((mb >> 5) | ((mb << 1) & 0x3f));
     tcg_out32 (s, op | RA (ra) | RS (rs) | sh | mb);
 }
 
-static inline void tcg_out_rlw(TCGContext *s, int op, TCGReg ra, TCGReg rs,
-                               int sh, int mb, int me)
+static inline void tcg_out_rlw(TCGContext *s, PowerOpcode op, TCGReg ra,
+                               TCGReg rs, int sh, int mb, int me)
 {
     tcg_out32(s, op | RA(ra) | RS(rs) | SH(sh) | MB(mb) | ME(me));
 }
@@ -666,7 +668,7 @@ static void tcg_out_andi64(TCGContext *s, TCGReg dst, TCGReg src, uint64_t c)
 }
 
 static void tcg_out_zori32(TCGContext *s, TCGReg dst, TCGReg src, uint32_t c,
-                           int op_lo, int op_hi)
+                           PowerOpcode op_lo, PowerOpcode op_hi)
 {
     if (c >> 16) {
         tcg_out32(s, op_hi | SAI(src, dst, c >> 16));
@@ -741,7 +743,7 @@ static void tcg_out_call(TCGContext *s, tcg_target_long arg,
 }
 
 static void tcg_out_ldst(TCGContext *s, TCGReg ret, TCGReg addr,
-                         int offset, int op1, int op2)
+                         int offset, PowerOpcode op1, PowerOpcode op2)
 {
     if (offset == (int16_t) offset) {
         tcg_out32(s, op1 | TAI(ret, addr, offset));
@@ -752,7 +754,7 @@ static void tcg_out_ldst(TCGContext *s, TCGReg ret, TCGReg addr,
 }
 
 static void tcg_out_ldsta(TCGContext *s, TCGReg ret, TCGReg addr,
-                          int offset, int op1, int op2)
+                          int offset, PowerOpcode op1, PowerOpcode op2)
 {
     if (offset == (int16_t) (offset & ~3)) {
         tcg_out32(s, op1 | TAI(ret, addr, offset));
@@ -820,7 +822,7 @@ static void tcg_out_tlb_read(TCGContext *s, TCGReg r0, TCGReg r1, TCGReg r2,
 }
 #endif
 
-static const uint32_t qemu_ldx_opc[8] = {
+static const PowerOpcode qemu_ldx_opc[8] = {
 #ifdef TARGET_WORDS_BIGENDIAN
     LBZX, LHZX, LWZX, LDX,
     0,    LHAX, LWAX, LDX
@@ -830,7 +832,7 @@ static const uint32_t qemu_ldx_opc[8] = {
 #endif
 };
 
-static const uint32_t qemu_stx_opc[4] = {
+static const PowerOpcode qemu_stx_opc[4] = {
 #ifdef TARGET_WORDS_BIGENDIAN
     STBX, STHX, STWX, STDX
 #else
@@ -838,14 +840,15 @@ static const uint32_t qemu_stx_opc[4] = {
 #endif
 };
 
-static const uint32_t qemu_exts_opc[4] = {
+static const PowerOpcode qemu_exts_opc[4] = {
     EXTSB, EXTSH, EXTSW, 0
 };
 
 static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
 {
     TCGReg addr_reg, data_reg, r0, r1, rbase;
-    uint32_t insn, s_bits;
+    PowerOpcode insn;
+    int s_bits;
 #ifdef CONFIG_SOFTMMU
     TCGReg r2, ir;
     int mem_index;
@@ -936,7 +939,7 @@ static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
 static void tcg_out_qemu_st (TCGContext *s, const TCGArg *args, int opc)
 {
     TCGReg addr_reg, r0, r1, rbase, data_reg;
-    uint32_t insn;
+    PowerOpcode insn;
 #ifdef CONFIG_SOFTMMU
     TCGReg r2, ir;
     int mem_index;
-- 
1.8.3.1


* [Qemu-devel] [PATCH for-next 09/15] tcg-ppc64: Handle long offsets better
  2013-08-05 18:28 [Qemu-devel] [PATCH for-next 00/15] Collection of improvements for tcg/ppc64 Richard Henderson
                   ` (7 preceding siblings ...)
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 08/15] tcg-ppc64: Create PowerOpcode Richard Henderson
@ 2013-08-05 18:28 ` Richard Henderson
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 10/15] tcg-ppc64: Use indirect jump threading Richard Henderson
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson

Previously we'd handle only 16-bit offsets from a memory operand
before falling back to an indexed access, but it's easy to use ADDIS
to handle full 32-bit offsets.

This also lets us unify code that existed inline in tcg_out_op
for handling addition of large constants.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc64/tcg-target.c | 159 +++++++++++++++++++++++++------------------------
 1 file changed, 81 insertions(+), 78 deletions(-)

diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index a79b876..e9c41fb 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -119,7 +119,6 @@ static const int tcg_target_reg_alloc_order[] = {
     TCG_REG_R31,
     TCG_REG_R12,  /* call clobbered, non-arguments */
     TCG_REG_R11,
-    TCG_REG_R2,
     TCG_REG_R10,  /* call clobbered, arguments */
     TCG_REG_R9,
     TCG_REG_R8,
@@ -742,25 +741,55 @@ static void tcg_out_call(TCGContext *s, tcg_target_long arg,
 #endif
 }
 
-static void tcg_out_ldst(TCGContext *s, TCGReg ret, TCGReg addr,
-                         int offset, PowerOpcode op1, PowerOpcode op2)
+static void tcg_out_mem_long(TCGContext *s, PowerOpcode opi, PowerOpcode opx,
+                             TCGReg rt, TCGReg base, tcg_target_long offset)
 {
-    if (offset == (int16_t) offset) {
-        tcg_out32(s, op1 | TAI(ret, addr, offset));
-    } else {
-        tcg_out_movi(s, TCG_TYPE_I64, 0, offset);
-        tcg_out32(s, op2 | TAB(ret, addr, 0));
+    tcg_target_long orig = offset, l0, l1, extra = 0, align = 0;
+    TCGReg rs = TCG_REG_R2;
+
+    assert(rt != TCG_REG_R2 && base != TCG_REG_R2);
+
+    switch (opi) {
+    case LD: case LWA:
+        align = 3;
+        /* FALLTHRU */
+    default:
+        if (rt != TCG_REG_R0) {
+            rs = rt;
+        }
+        break;
+    case STD:
+        align = 3;
+        break;
+    case STB: case STH: case STW:
+        break;
     }
-}
 
-static void tcg_out_ldsta(TCGContext *s, TCGReg ret, TCGReg addr,
-                          int offset, PowerOpcode op1, PowerOpcode op2)
-{
-    if (offset == (int16_t) (offset & ~3)) {
-        tcg_out32(s, op1 | TAI(ret, addr, offset));
-    } else {
-        tcg_out_movi(s, TCG_TYPE_I64, 0, offset);
-        tcg_out32(s, op2 | TAB(ret, addr, 0));
+    /* For unaligned, or very large offsets, use the indexed form.  */
+    if (offset & align || offset != (int32_t)offset) {
+        tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_R2, orig);
+        tcg_out32(s, opx | TAB(rt, base, TCG_REG_R2));
+        return;
+    }
+
+    l0 = (int16_t)offset;
+    offset = (offset - l0) >> 16;
+    l1 = (int16_t)offset;
+
+    if (l1 < 0 && orig >= 0) {
+        extra = 0x4000;
+        l1 = (int16_t)(offset - 0x4000);
+    }
+    if (l1) {
+        tcg_out32(s, ADDIS | TAI(rs, base, l1));
+        base = rs;
+    }
+    if (extra) {
+        tcg_out32(s, ADDIS | TAI(rs, base, extra));
+        base = rs;
+    }
+    if (opi != ADDI || base != rt || l0 != 0) {
+        tcg_out32(s, opi | TAI(rt, base, l0));
     }
 }
 
@@ -1088,22 +1117,30 @@ static void tcg_target_qemu_prologue (TCGContext *s)
     tcg_out32(s, BCLR | BO_ALWAYS);
 }
 
-static void tcg_out_ld (TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
-                        tcg_target_long arg2)
+static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
+                       tcg_target_long arg2)
 {
-    if (type == TCG_TYPE_I32)
-        tcg_out_ldst (s, ret, arg1, arg2, LWZ, LWZX);
-    else
-        tcg_out_ldsta (s, ret, arg1, arg2, LD, LDX);
+    PowerOpcode opi, opx;
+
+    if (type == TCG_TYPE_I32) {
+        opi = LWZ, opx = LWZX;
+    } else {
+        opi = LD, opx = LDX;
+    }
+    tcg_out_mem_long(s, opi, opx, ret, arg1, arg2);
 }
 
-static void tcg_out_st (TCGContext *s, TCGType type, TCGReg arg, TCGReg arg1,
-                        tcg_target_long arg2)
+static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, TCGReg arg1,
+                       tcg_target_long arg2)
 {
-    if (type == TCG_TYPE_I32)
-        tcg_out_ldst (s, arg, arg1, arg2, STW, STWX);
-    else
-        tcg_out_ldsta (s, arg, arg1, arg2, STD, STDX);
+    PowerOpcode opi, opx;
+
+    if (type == TCG_TYPE_I32) {
+        opi = STW, opx = STWX;
+    } else {
+        opi = STD, opx = STDX;
+    }
+    tcg_out_mem_long(s, opi, opx, arg, arg1, arg2);
 }
 
 static void tcg_out_cmp(TCGContext *s, int cond, TCGArg arg1, TCGArg arg2,
@@ -1464,61 +1501,52 @@ static void tcg_out_op (TCGContext *s, TCGOpcode opc, const TCGArg *args,
         break;
     case INDEX_op_ld8u_i32:
     case INDEX_op_ld8u_i64:
-        tcg_out_ldst (s, args[0], args[1], args[2], LBZ, LBZX);
+        tcg_out_mem_long(s, LBZ, LBZX, args[0], args[1], args[2]);
         break;
     case INDEX_op_ld8s_i32:
     case INDEX_op_ld8s_i64:
-        tcg_out_ldst (s, args[0], args[1], args[2], LBZ, LBZX);
-        tcg_out32 (s, EXTSB | RS (args[0]) | RA (args[0]));
+        tcg_out_mem_long(s, LBZ, LBZX, args[0], args[1], args[2]);
+        tcg_out32(s, EXTSB | RS(args[0]) | RA(args[0]));
         break;
     case INDEX_op_ld16u_i32:
     case INDEX_op_ld16u_i64:
-        tcg_out_ldst (s, args[0], args[1], args[2], LHZ, LHZX);
+        tcg_out_mem_long(s, LHZ, LHZX, args[0], args[1], args[2]);
         break;
     case INDEX_op_ld16s_i32:
     case INDEX_op_ld16s_i64:
-        tcg_out_ldst (s, args[0], args[1], args[2], LHA, LHAX);
+        tcg_out_mem_long(s, LHA, LHAX, args[0], args[1], args[2]);
         break;
     case INDEX_op_ld_i32:
     case INDEX_op_ld32u_i64:
-        tcg_out_ldst (s, args[0], args[1], args[2], LWZ, LWZX);
+        tcg_out_mem_long(s, LWZ, LWZX, args[0], args[1], args[2]);
         break;
     case INDEX_op_ld32s_i64:
-        tcg_out_ldsta (s, args[0], args[1], args[2], LWA, LWAX);
+        tcg_out_mem_long(s, LWA, LWAX, args[0], args[1], args[2]);
         break;
     case INDEX_op_ld_i64:
-        tcg_out_ldsta (s, args[0], args[1], args[2], LD, LDX);
+        tcg_out_mem_long(s, LD, LDX, args[0], args[1], args[2]);
         break;
     case INDEX_op_st8_i32:
     case INDEX_op_st8_i64:
-        tcg_out_ldst (s, args[0], args[1], args[2], STB, STBX);
+        tcg_out_mem_long(s, STB, STBX, args[0], args[1], args[2]);
         break;
     case INDEX_op_st16_i32:
     case INDEX_op_st16_i64:
-        tcg_out_ldst (s, args[0], args[1], args[2], STH, STHX);
+        tcg_out_mem_long(s, STH, STHX, args[0], args[1], args[2]);
         break;
     case INDEX_op_st_i32:
     case INDEX_op_st32_i64:
-        tcg_out_ldst (s, args[0], args[1], args[2], STW, STWX);
+        tcg_out_mem_long(s, STW, STWX, args[0], args[1], args[2]);
         break;
     case INDEX_op_st_i64:
-        tcg_out_ldsta (s, args[0], args[1], args[2], STD, STDX);
+        tcg_out_mem_long(s, STD, STDX, args[0], args[1], args[2]);
         break;
 
     case INDEX_op_add_i32:
         a0 = args[0], a1 = args[1], a2 = args[2];
         if (const_args[2]) {
-            int32_t l, h;
         do_addi_32:
-            l = (int16_t)a2;
-            h = a2 - l;
-            if (h) {
-                tcg_out32(s, ADDIS | TAI(a0, a1, h >> 16));
-                a1 = a0;
-            }
-            if (l || a0 != a1) {
-                tcg_out32(s, ADDI | TAI(a0, a1, l));
-            }
+            tcg_out_mem_long(s, ADDI, ADD, a0, a1, (int32_t)a2);
         } else {
             tcg_out32(s, ADD | TAB(a0, a1, a2));
         }
@@ -1694,32 +1722,8 @@ static void tcg_out_op (TCGContext *s, TCGOpcode opc, const TCGArg *args,
     case INDEX_op_add_i64:
         a0 = args[0], a1 = args[1], a2 = args[2];
         if (const_args[2]) {
-            int32_t l0, h1, h2;
         do_addi_64:
-            /* We can always split any 32-bit signed constant into 3 pieces.
-               Note the positive 0x80000000 coming from the sub_i64 path,
-               handled with the same code we need for eg 0x7fff8000.  */
-            assert(a2 == (int32_t)a2 || a2 == 0x80000000);
-            l0 = (int16_t)a2;
-            h1 = a2 - l0;
-            h2 = 0;
-            if (h1 < 0 && (int64_t)a2 > 0) {
-                h2 = 0x40000000;
-                h1 = a2 - h2 - l0;
-            }
-            assert((TCGArg)h2 + h1 + l0 == a2);
-
-            if (h2) {
-                tcg_out32(s, ADDIS | TAI(a0, a1, h2 >> 16));
-                a1 = a0;
-            }
-            if (h1) {
-                tcg_out32(s, ADDIS | TAI(a0, a1, h1 >> 16));
-                a1 = a0;
-            }
-            if (l0 || a0 != a1) {
-                tcg_out32(s, ADDI | TAI(a0, a1, l0));
-            }
+            tcg_out_mem_long(s, ADDI, ADD, a0, a1, a2);
         } else {
             tcg_out32(s, ADD | TAB(a0, a1, a2));
         }
@@ -2175,10 +2179,9 @@ static void tcg_target_init (TCGContext *s)
     tcg_regset_clear (s->reserved_regs);
     tcg_regset_set_reg (s->reserved_regs, TCG_REG_R0); /* tcg temp */
     tcg_regset_set_reg (s->reserved_regs, TCG_REG_R1); /* stack pointer */
+    tcg_regset_set_reg (s->reserved_regs, TCG_REG_R2); /* mem temp */
 #ifdef __APPLE__
     tcg_regset_set_reg (s->reserved_regs, TCG_REG_R11); /* ??? */
-#else
-    tcg_regset_set_reg (s->reserved_regs, TCG_REG_R2); /* toc */
 #endif
     tcg_regset_set_reg (s->reserved_regs, TCG_REG_R13); /* thread pointer */
 
-- 
1.8.3.1


* [Qemu-devel] [PATCH for-next 10/15] tcg-ppc64: Use indirect jump threading
  2013-08-05 18:28 [Qemu-devel] [PATCH for-next 00/15] Collection of improvements for tcg/ppc64 Richard Henderson
                   ` (8 preceding siblings ...)
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 09/15] tcg-ppc64: Handle long offsets better Richard Henderson
@ 2013-08-05 18:28 ` Richard Henderson
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 11/15] tcg-ppc64: Setup TCG_REG_TB Richard Henderson
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson

We were always doing an indirect jump anyway, and the sequence is
never longer than the 6 insns we were reserving for the direct jump.
Further cleanups will reduce the length of the constant address load.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 include/exec/exec-all.h |  3 ++-
 tcg/ppc64/tcg-target.c  | 26 ++++++++------------------
 2 files changed, 10 insertions(+), 19 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index b3402a1..26c3553 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -126,7 +126,8 @@ static inline void tlb_flush(CPUArchState *env, int flush_global)
 #define CODE_GEN_AVG_BLOCK_SIZE 64
 #endif
 
-#if defined(__arm__) || defined(_ARCH_PPC) \
+#if defined(__arm__) \
+    || (defined(__powerpc__) && !defined(__powerpc64__)) \
     || defined(__x86_64__) || defined(__i386__) \
     || defined(__sparc__) || defined(__aarch64__) \
     || defined(CONFIG_TCG_INTERPRETER)
diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index e9c41fb..f69bc8f 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -1440,17 +1440,6 @@ static void tcg_out_movcond(TCGContext *s, TCGType type, TCGCond cond,
     }
 }
 
-void ppc_tb_set_jmp_target (unsigned long jmp_addr, unsigned long addr)
-{
-    TCGContext s;
-    unsigned long patch_size;
-
-    s.code_ptr = (uint8_t *) jmp_addr;
-    tcg_out_b (&s, 0, addr);
-    patch_size = s.code_ptr - (uint8_t *) jmp_addr;
-    flush_icache_range (jmp_addr, jmp_addr + patch_size);
-}
-
 static void tcg_out_op (TCGContext *s, TCGOpcode opc, const TCGArg *args,
                         const int *const_args)
 {
@@ -1464,13 +1453,14 @@ static void tcg_out_op (TCGContext *s, TCGOpcode opc, const TCGArg *args,
         break;
     case INDEX_op_goto_tb:
         if (s->tb_jmp_offset) {
-            /* direct jump method */
-
-            s->tb_jmp_offset[args[0]] = s->code_ptr - s->code_buf;
-            s->code_ptr += 28;
-        }
-        else {
-            tcg_abort ();
+            /* Direct jump method. */
+            tcg_abort();
+        } else {
+            /* Indirect jump method. */
+            tcg_out_mem_long(s, LD, LDX, TCG_REG_R0, TCG_REG_R0,
+                             (tcg_target_long)(s->tb_next + args[0]));
+            tcg_out32(s, MTSPR | RS(TCG_REG_R0) | CTR);
+            tcg_out32(s, BCCTR | BO_ALWAYS);
         }
         s->tb_next_offset[args[0]] = s->code_ptr - s->code_buf;
         break;
-- 
1.8.3.1


* [Qemu-devel] [PATCH for-next 11/15] tcg-ppc64: Setup TCG_REG_TB
  2013-08-05 18:28 [Qemu-devel] [PATCH for-next 00/15] Collection of improvements for tcg/ppc64 Richard Henderson
                   ` (9 preceding siblings ...)
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 10/15] tcg-ppc64: Use indirect jump threading Richard Henderson
@ 2013-08-05 18:28 ` Richard Henderson
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 12/15] tcg-ppc64: Use TCG_REG_TB in tcg_out_movi and tcg_out_mem_long Richard Henderson
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson

A handy value "near" the rest of the program's dynamic allocation.
We'll be able to use this value for constant address generation,
cross-TB references, and, further in the future, constant pool refs.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc64/tcg-target.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index f69bc8f..e01d8bc 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -55,10 +55,15 @@ static bool have_isa_2_06;
 #define HAVE_ISEL      0
 #endif
 
+/* Our local "toc" points to the beginning of the TB, making it easy to
+   form addresses in the memory range "near" the TB.  Unlike the real TOC,
+   put this in a call-saved register so we don't have to reload it.  */
+#define TCG_REG_TB  TCG_REG_R30
+
 #ifdef CONFIG_USE_GUEST_BASE
-#define TCG_GUEST_BASE_REG 30
+#define TCG_GUEST_BASE_REG  TCG_REG_R29
 #else
-#define TCG_GUEST_BASE_REG 0
+#define TCG_GUEST_BASE_REG  0
 #endif
 
 #ifndef NDEBUG
@@ -1097,8 +1102,9 @@ static void tcg_target_qemu_prologue (TCGContext *s)
     }
 #endif
 
-    tcg_out_mov (s, TCG_TYPE_PTR, TCG_AREG0, tcg_target_call_iarg_regs[0]);
+    tcg_out_mov(s, TCG_TYPE_PTR, TCG_AREG0, tcg_target_call_iarg_regs[0]);
     tcg_out32 (s, MTSPR | RS (tcg_target_call_iarg_regs[1]) | CTR);
+    tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_TB, tcg_target_call_iarg_regs[1]);
     tcg_out32 (s, BCCTR | BO_ALWAYS);
 
     /* Epilogue */
@@ -1457,13 +1463,19 @@ static void tcg_out_op (TCGContext *s, TCGOpcode opc, const TCGArg *args,
             tcg_abort();
         } else {
             /* Indirect jump method. */
-            tcg_out_mem_long(s, LD, LDX, TCG_REG_R0, TCG_REG_R0,
+            tcg_out_mem_long(s, LD, LDX, TCG_REG_TB, TCG_REG_R0,
                              (tcg_target_long)(s->tb_next + args[0]));
-            tcg_out32(s, MTSPR | RS(TCG_REG_R0) | CTR);
+            tcg_out32(s, MTSPR | RS(TCG_REG_TB) | CTR);
             tcg_out32(s, BCCTR | BO_ALWAYS);
         }
         s->tb_next_offset[args[0]] = s->code_ptr - s->code_buf;
+
+        /* In the initial unset chain case, we fall through, which means
+           we need to reset TCG_REG_TB to point at the current TB.  */
+        tcg_out_mem_long(s, ADDI, ADD, TCG_REG_TB, TCG_REG_TB,
+                         s->code_buf - s->code_ptr);
         break;
+
     case INDEX_op_br:
         {
             TCGLabel *l = &s->labels[args[0]];
@@ -2174,6 +2186,7 @@ static void tcg_target_init (TCGContext *s)
     tcg_regset_set_reg (s->reserved_regs, TCG_REG_R11); /* ??? */
 #endif
     tcg_regset_set_reg (s->reserved_regs, TCG_REG_R13); /* thread pointer */
+    tcg_regset_set_reg (s->reserved_regs, TCG_REG_TB); /* tcg tb pointer */
 
     tcg_add_target_add_op_defs (ppc_op_defs);
 }
-- 
1.8.3.1


* [Qemu-devel] [PATCH for-next 12/15] tcg-ppc64: Use TCG_REG_TB in tcg_out_movi and tcg_out_mem_long
  2013-08-05 18:28 [Qemu-devel] [PATCH for-next 00/15] Collection of improvements for tcg/ppc64 Richard Henderson
                   ` (10 preceding siblings ...)
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 11/15] tcg-ppc64: Setup TCG_REG_TB Richard Henderson
@ 2013-08-05 18:28 ` Richard Henderson
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 13/15] tcg-ppc64: Tidy tcg_target_qemu_prologue Richard Henderson
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson

This results in significant code size reductions when manipulating
pointers into TCG's own data structures.  E.g.

-OUT: [size=180]
+OUT: [size=132]
...
-xxx:  li      r2,16383			# goto_tb
-xxx:  rldicr  r2,r2,32,31
-xxx:  oris    r2,r2,39128
-xxx:  ori     r2,r2,376
-xxx:  ldx     r30,0,r2
+xxx:  addis   r30,r30,-544
+xxx:  ld      r30,-8(r30)
...
-xxx:  li      r3,16383			# exit_tb
-xxx:  rldicr  r3,r3,32,31
-xxx:  oris    r3,r3,39128
-xxx:  ori     r3,r3,288
+xxx:  addis   r3,r30,-544
+xxx:  addi    r3,r3,-96

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc64/tcg-target.c | 164 +++++++++++++++++++++++++++++--------------------
 1 file changed, 99 insertions(+), 65 deletions(-)

diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index e01d8bc..d4e1efc 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -548,6 +548,78 @@ static inline void tcg_out_shri64(TCGContext *s, TCGReg dst, TCGReg src, int c)
     tcg_out_rld(s, RLDICL, dst, src, 64 - c, c);
 }
 
+static void tcg_out_mem_long(TCGContext *s, PowerOpcode opi, PowerOpcode opx,
+                             TCGReg rt, TCGReg base, tcg_target_long offset)
+{
+    tcg_target_long orig = offset, l0, l1, extra = 0, align = 0;
+    TCGReg rs = TCG_REG_R2;
+
+    assert(rt != TCG_REG_R2 && base != TCG_REG_R2);
+
+    switch (opi) {
+    case LD: case LWA:
+        align = 3;
+        /* FALLTHRU */
+    default:
+        if (rt != TCG_REG_R0) {
+            rs = rt;
+        }
+        break;
+    case STD:
+        align = 3;
+        break;
+    case STB: case STH: case STW:
+        break;
+    }
+
+    /* For unaligned, use the indexed form.  */
+    if (offset & align) {
+    do_indexed:
+        tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_R2, orig);
+        tcg_out32(s, opx | TAB(rt, base, TCG_REG_R2));
+        return;
+    }
+
+    if (base == TCG_REG_R0) {
+        /* For absolute addresses, avoid indexed form.  First try turning
+           it into an offset from a known base register, then just fold
+           the low 16 bits. */
+        offset -= (tcg_target_long)s->code_buf;
+        if (offset == (int32_t)offset) {
+            orig = offset;
+            base = TCG_REG_TB;
+        } else {
+            offset = (int16_t)orig;
+            tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_R2, orig - offset);
+            orig = offset;
+            base = TCG_REG_R2;
+        }
+    } else if (offset != (int32_t)offset) {
+        /* For very large offsets off a real base register, use indexed.  */
+        goto do_indexed;
+    }
+
+    l0 = (int16_t)offset;
+    offset = (offset - l0) >> 16;
+    l1 = (int16_t)offset;
+
+    if (l1 < 0 && orig >= 0) {
+        extra = 0x4000;
+        l1 = (int16_t)(offset - 0x4000);
+    }
+    if (l1) {
+        tcg_out32(s, ADDIS | TAI(rs, base, l1));
+        base = rs;
+    }
+    if (extra) {
+        tcg_out32(s, ADDIS | TAI(rs, base, extra));
+        base = rs;
+    }
+    if (opi != ADDI || base != rt || l0 != 0) {
+        tcg_out32(s, opi | TAI(rt, base, l0));
+    }
+}
+
 static void tcg_out_movi32(TCGContext *s, TCGReg ret, int32_t arg)
 {
     if (arg == (int16_t) arg) {
@@ -563,23 +635,37 @@ static void tcg_out_movi32(TCGContext *s, TCGReg ret, int32_t arg)
 static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg ret,
                          tcg_target_long arg)
 {
+    tcg_target_long tmp;
+
+    /* Two attempts at 1 or 2 insn sequence for 32-bit constant.  */
     if (type == TCG_TYPE_I32 || arg == (int32_t)arg) {
         tcg_out_movi32(s, ret, arg);
-    } else if (arg == (uint32_t)arg && !(arg & 0x8000)) {
+        return;
+    }
+    if (arg == (uint32_t)arg && !(arg & 0x8000)) {
         tcg_out32(s, ADDI | TAI(ret, 0, arg));
         tcg_out32(s, ORIS | SAI(ret, ret, arg >> 16));
-    } else {
-        int32_t high = arg >> 32;
-        tcg_out_movi32(s, ret, high);
-        if (high) {
-            tcg_out_shli64(s, ret, ret, 32);
-        }
-        if (arg & 0xffff0000) {
-            tcg_out32(s, ORIS | SAI(ret, ret, arg >> 16));
-        }
-        if (arg & 0xffff) {
-            tcg_out32(s, ORI | SAI(ret, ret, arg));
-        }
+        return;
+    }
+
+    /* See if we can turn an address constant into a TB offset.  */
+    tmp = arg - (uintptr_t)s->code_buf;
+    if (tmp == (int32_t)tmp) {
+        tcg_out_mem_long(s, ADDI, ADD, ret, TCG_REG_TB, tmp);
+        return;
+    }
+
+    /* Full 64-bit constant load.  */
+    tmp = arg >> 32;
+    tcg_out_movi32(s, ret, tmp);
+    if (tmp) {
+        tcg_out_shli64(s, ret, ret, 32);
+    }
+    if (arg & 0xffff0000) {
+        tcg_out32(s, ORIS | SAI(ret, ret, arg >> 16));
+    }
+    if (arg & 0xffff) {
+        tcg_out32(s, ORI | SAI(ret, ret, arg));
     }
 }
 
@@ -746,58 +832,6 @@ static void tcg_out_call(TCGContext *s, tcg_target_long arg,
 #endif
 }
 
-static void tcg_out_mem_long(TCGContext *s, PowerOpcode opi, PowerOpcode opx,
-                             TCGReg rt, TCGReg base, tcg_target_long offset)
-{
-    tcg_target_long orig = offset, l0, l1, extra = 0, align = 0;
-    TCGReg rs = TCG_REG_R2;
-
-    assert(rt != TCG_REG_R2 && base != TCG_REG_R2);
-
-    switch (opi) {
-    case LD: case LWA:
-        align = 3;
-        /* FALLTHRU */
-    default:
-        if (rt != TCG_REG_R0) {
-            rs = rt;
-        }
-        break;
-    case STD:
-        align = 3;
-        break;
-    case STB: case STH: case STW:
-        break;
-    }
-
-    /* For unaligned, or very large offsets, use the indexed form.  */
-    if (offset & align || offset != (int32_t)offset) {
-        tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_R2, orig);
-        tcg_out32(s, opx | TAB(rt, base, TCG_REG_R2));
-        return;
-    }
-
-    l0 = (int16_t)offset;
-    offset = (offset - l0) >> 16;
-    l1 = (int16_t)offset;
-
-    if (l1 < 0 && orig >= 0) {
-        extra = 0x4000;
-        l1 = (int16_t)(offset - 0x4000);
-    }
-    if (l1) {
-        tcg_out32(s, ADDIS | TAI(rs, base, l1));
-        base = rs;
-    }
-    if (extra) {
-        tcg_out32(s, ADDIS | TAI(rs, base, extra));
-        base = rs;
-    }
-    if (opi != ADDI || base != rt || l0 != 0) {
-        tcg_out32(s, opi | TAI(rt, base, l0));
-    }
-}
-
 #if defined (CONFIG_SOFTMMU)
 
 #include "exec/softmmu_defs.h"
-- 
1.8.3.1


* [Qemu-devel] [PATCH for-next 13/15] tcg-ppc64: Tidy tcg_target_qemu_prologue
  2013-08-05 18:28 [Qemu-devel] [PATCH for-next 00/15] Collection of improvements for tcg/ppc64 Richard Henderson
                   ` (11 preceding siblings ...)
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 12/15] tcg-ppc64: Use TCG_REG_TB in tcg_out_movi and tcg_out_mem_long Richard Henderson
@ 2013-08-05 18:28 ` Richard Henderson
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 14/15] tcg-ppc64: Streamline tcg_out_tlb_read Richard Henderson
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson

Use the helper macros like TAI.  Fix formatting.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc64/tcg-target.c | 38 ++++++++++++++++----------------------
 1 file changed, 16 insertions(+), 22 deletions(-)

diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index d4e1efc..90d033c 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -1118,21 +1118,18 @@ static void tcg_target_qemu_prologue (TCGContext *s)
 #endif
 
     /* Prologue */
-    tcg_out32 (s, MFSPR | RT (0) | LR);
-    tcg_out32 (s, STDU | RS (1) | RA (1) | (-frame_size & 0xffff));
-    for (i = 0; i < ARRAY_SIZE (tcg_target_callee_save_regs); ++i)
-        tcg_out32 (s, (STD
-                       | RS (tcg_target_callee_save_regs[i])
-                       | RA (1)
-                       | (i * 8 + 48 + TCG_STATIC_CALL_ARGS_SIZE)
-                       )
-            );
-    tcg_out32 (s, STD | RS (0) | RA (1) | (frame_size + 16));
+    tcg_out32(s, MFSPR | RT(TCG_REG_R0) | LR);
+    tcg_out32(s, STDU | SAI(TCG_REG_R1, TCG_REG_R1, -frame_size));
+    for (i = 0; i < ARRAY_SIZE (tcg_target_callee_save_regs); ++i) {
+        tcg_out32(s, (STD | SAI(tcg_target_callee_save_regs[i], TCG_REG_R1,
+                                i * 8 + 48 + TCG_STATIC_CALL_ARGS_SIZE)));
+    }
+    tcg_out32(s, STD | RS(TCG_REG_R0) | RA(TCG_REG_R1) | (frame_size + 16));
 
 #ifdef CONFIG_USE_GUEST_BASE
     if (GUEST_BASE) {
-        tcg_out_movi (s, TCG_TYPE_I64, TCG_GUEST_BASE_REG, GUEST_BASE);
-        tcg_regset_set_reg (s->reserved_regs, TCG_GUEST_BASE_REG);
+        tcg_out_movi(s, TCG_TYPE_I64, TCG_GUEST_BASE_REG, GUEST_BASE);
+        tcg_regset_set_reg(s->reserved_regs, TCG_GUEST_BASE_REG);
     }
 #endif
 
@@ -1144,16 +1141,13 @@ static void tcg_target_qemu_prologue (TCGContext *s)
     /* Epilogue */
     tb_ret_addr = s->code_ptr;
 
-    for (i = 0; i < ARRAY_SIZE (tcg_target_callee_save_regs); ++i)
-        tcg_out32 (s, (LD
-                       | RT (tcg_target_callee_save_regs[i])
-                       | RA (1)
-                       | (i * 8 + 48 + TCG_STATIC_CALL_ARGS_SIZE)
-                       )
-            );
-    tcg_out32(s, LD | TAI(0, 1, frame_size + 16));
-    tcg_out32(s, MTSPR | RS(0) | LR);
-    tcg_out32(s, ADDI | TAI(1, 1, frame_size));
+    tcg_out32(s, LD | TAI(TCG_REG_R0, TCG_REG_R1, frame_size + 16));
+    for (i = 0; i < ARRAY_SIZE (tcg_target_callee_save_regs); ++i) {
+        tcg_out32(s, (LD | TAI(tcg_target_callee_save_regs[i], TCG_REG_R1,
+                               i * 8 + 48 + TCG_STATIC_CALL_ARGS_SIZE)));
+    }
+    tcg_out32(s, MTSPR | RS(TCG_REG_R0) | LR);
+    tcg_out32(s, ADDI | TAI(TCG_REG_R1, TCG_REG_R1, frame_size));
     tcg_out32(s, BCLR | BO_ALWAYS);
 }
 
-- 
1.8.3.1


* [Qemu-devel] [PATCH for-next 14/15] tcg-ppc64: Streamline tcg_out_tlb_read
  2013-08-05 18:28 [Qemu-devel] [PATCH for-next 00/15] Collection of improvements for tcg/ppc64 Richard Henderson
                   ` (12 preceding siblings ...)
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 13/15] tcg-ppc64: Tidy tcg_target_qemu_prologue Richard Henderson
@ 2013-08-05 18:28 ` Richard Henderson
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 15/15] tcg-ppc64: Implement CONFIG_QEMU_LDST_OPTIMIZATION Richard Henderson
  2013-08-17  6:23 ` [Qemu-devel] [PATCH for-next 00/15] Collection of improvements for tcg/ppc64 Richard Henderson
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson

Less conditional compilation.  Merge an add insn with the indexed
memory load insn.  Load the TLB addend earlier.  Avoid the
address-update memory form.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc64/tcg-target.c | 202 +++++++++++++++++++++++--------------------------
 1 file changed, 95 insertions(+), 107 deletions(-)

diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index 90d033c..4b23597 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -31,13 +31,11 @@
 
 static uint8_t *tb_ret_addr;
 
-#define FAST_PATH
-
 #if TARGET_LONG_BITS == 32
-#define LD_ADDR LWZU
+#define LD_ADDR LWZ
 #define CMP_L 0
 #else
-#define LD_ADDR LDU
+#define LD_ADDR LD
 #define CMP_L (1<<21)
 #endif
 
@@ -854,39 +852,64 @@ static const void * const qemu_st_helpers[4] = {
     helper_stq_mmu,
 };
 
-static void tcg_out_tlb_read(TCGContext *s, TCGReg r0, TCGReg r1, TCGReg r2,
-                             TCGReg addr_reg, int s_bits, int offset)
+/* Perform the TLB load and compare.  Places the result of the comparison
+   in CR7, loads the addend of the TLB into R3, and returns the register
+   containing the guest address (zero-extended into R4).  Clobbers R0 and R2. */
+
+static TCGReg tcg_out_tlb_read(TCGContext *s, int s_bits, TCGReg addr_reg,
+                               int mem_index, bool is_read)
 {
-#if TARGET_LONG_BITS == 32
-    tcg_out_ext32u(s, addr_reg, addr_reg);
-
-    tcg_out_rlw(s, RLWINM, r0, addr_reg,
-                32 - (TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS),
-                32 - (CPU_TLB_BITS + CPU_TLB_ENTRY_BITS),
-                31 - CPU_TLB_ENTRY_BITS);
-    tcg_out32(s, ADD | TAB(r0, r0, TCG_AREG0));
-    tcg_out32(s, LWZU | TAI(r1, r0, offset));
-    tcg_out_rlw(s, RLWINM, r2, addr_reg, 0,
-                (32 - s_bits) & 31, 31 - TARGET_PAGE_BITS);
-#else
-    tcg_out_rld (s, RLDICL, r0, addr_reg,
-                 64 - TARGET_PAGE_BITS,
-                 64 - CPU_TLB_BITS);
-    tcg_out_shli64(s, r0, r0, CPU_TLB_ENTRY_BITS);
+    size_t offset
+        = (is_read
+           ? offsetof(CPUArchState, tlb_table[mem_index][0].addr_read)
+           : offsetof(CPUArchState, tlb_table[mem_index][0].addr_write));
+
+    /* Extract the page index, shifted into place for tlb index.  */
+    if (TARGET_LONG_BITS == 32) {
+        /* Zero-extend the address into a place helpful for further use.  */
+        tcg_out_ext32u(s, TCG_REG_R4, addr_reg);
+        addr_reg = TCG_REG_R4;
+
+        tcg_out_rlw(s, RLWINM, TCG_REG_R3, addr_reg,
+                    32 - (TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS),
+                    32 - (CPU_TLB_BITS + CPU_TLB_ENTRY_BITS),
+                    31 - CPU_TLB_ENTRY_BITS);
+    } else {
+        tcg_out_rld (s, RLDICL, TCG_REG_R3, addr_reg,
+                     64 - TARGET_PAGE_BITS,
+                     64 - CPU_TLB_BITS);
+        tcg_out_shli64(s, TCG_REG_R3, TCG_REG_R3, CPU_TLB_ENTRY_BITS);
+    }
 
-    tcg_out32(s, ADD | TAB(r0, r0, TCG_AREG0));
-    tcg_out32(s, LD_ADDR | TAI(r1, r0, offset));
+    /* Load the tlb comparator.  */
+    tcg_out32(s, ADD | TAB(TCG_REG_R3, TCG_REG_R3, TCG_AREG0));
+    tcg_out32(s, LD_ADDR | TAI(TCG_REG_R2, TCG_REG_R3, offset));
 
-    if (!s_bits) {
-        tcg_out_rld (s, RLDICR, r2, addr_reg, 0, 63 - TARGET_PAGE_BITS);
-    }
-    else {
-        tcg_out_rld (s, RLDICL, r2, addr_reg,
-                     64 - TARGET_PAGE_BITS,
-                     TARGET_PAGE_BITS - s_bits);
-        tcg_out_rld (s, RLDICL, r2, r2, TARGET_PAGE_BITS, 0);
+    /* Load the TLB addend for use on the fast path.  Do this asap
+       to minimize any load use delay.  */
+    offset = offsetof(CPUArchState, tlb_table[mem_index][0].addend);
+    tcg_out32(s, LD | TAI(TCG_REG_R3, TCG_REG_R3, offset));
+
+    /* Clear the non-page, non-alignment bits from the address.  */
+    if (TARGET_LONG_BITS == 32) {
+        tcg_out_rlw(s, RLWINM, TCG_REG_R0, addr_reg, 0,
+                    (32 - s_bits) & 31, 31 - TARGET_PAGE_BITS);
+    } else {
+        if (!s_bits) {
+            tcg_out_rld (s, RLDICR, TCG_REG_R0, addr_reg,
+                         0, 63 - TARGET_PAGE_BITS);
+        } else {
+            tcg_out_rld (s, RLDICL, TCG_REG_R0, addr_reg,
+                         64 - TARGET_PAGE_BITS,
+                         TARGET_PAGE_BITS - s_bits);
+            tcg_out_rld (s, RLDICL, TCG_REG_R0, TCG_REG_R0,
+                         TARGET_PAGE_BITS, 0);
+        }
     }
-#endif
+
+    tcg_out32(s, CMP | BF(7) | RA(TCG_REG_R0) | RB(TCG_REG_R2) | CMP_L);
+
+    return addr_reg;
 }
 #endif
 
@@ -918,7 +941,7 @@ static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
     PowerOpcode insn;
     int s_bits;
 #ifdef CONFIG_SOFTMMU
-    TCGReg r2, ir;
+    TCGReg ir;
     int mem_index;
     void *label1_ptr, *label2_ptr;
 #endif
@@ -930,26 +953,16 @@ static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
 #ifdef CONFIG_SOFTMMU
     mem_index = *args;
 
-    r0 = 3;
-    r1 = 4;
-    r2 = 0;
-    rbase = 0;
-
-    tcg_out_tlb_read (s, r0, r1, r2, addr_reg, s_bits,
-                      offsetof (CPUArchState, tlb_table[mem_index][0].addr_read));
-
-    tcg_out32 (s, CMP | BF (7) | RA (r2) | RB (r1) | CMP_L);
+    r0 = tcg_out_tlb_read(s, s_bits, addr_reg, mem_index, true);
 
     label1_ptr = s->code_ptr;
-#ifdef FAST_PATH
-    tcg_out32 (s, BC | BI (7, CR_EQ) | BO_COND_TRUE);
-#endif
+    tcg_out32(s, BC | BI (7, CR_EQ) | BO_COND_TRUE);
 
     /* slow path */
-    ir = 3;
-    tcg_out_mov (s, TCG_TYPE_I64, ir++, TCG_AREG0);
-    tcg_out_mov (s, TCG_TYPE_I64, ir++, addr_reg);
-    tcg_out_movi (s, TCG_TYPE_I64, ir++, mem_index);
+    ir = TCG_REG_R3;
+    tcg_out_mov(s, TCG_TYPE_I64, ir++, TCG_AREG0);
+    tcg_out_mov(s, TCG_TYPE_I64, ir++, addr_reg);
+    tcg_out_movi(s, TCG_TYPE_I64, ir++, mem_index);
 
     tcg_out_call(s, (tcg_target_long)qemu_ld_helpers[s_bits], 1, LK);
 
@@ -959,29 +972,23 @@ static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
     } else if (data_reg != 3) {
         tcg_out_mov(s, TCG_TYPE_I64, data_reg, 3);
     }
+
     label2_ptr = s->code_ptr;
-    tcg_out32 (s, B);
+    tcg_out32(s, B);
 
     /* label1: fast path */
-#ifdef FAST_PATH
-    reloc_pc14 (label1_ptr, (tcg_target_long) s->code_ptr);
-#endif
-
-    /* r0 now contains &env->tlb_table[mem_index][index].addr_read */
-    tcg_out32(s, LD | TAI(r0, r0,
-                          offsetof(CPUTLBEntry, addend)
-                          - offsetof(CPUTLBEntry, addr_read)));
-    /* r0 = env->tlb_table[mem_index][index].addend */
-    tcg_out32(s, ADD | TAB(r0, r0, addr_reg));
-    /* r0 = env->tlb_table[mem_index][index].addend + addr */
+    reloc_pc14(label1_ptr, (tcg_target_long)s->code_ptr);
 
+    rbase = TCG_REG_R3;
+    r1 = TCG_REG_R0;
 #else  /* !CONFIG_SOFTMMU */
-#if TARGET_LONG_BITS == 32
-    tcg_out_ext32u(s, addr_reg, addr_reg);
-#endif
-    r0 = addr_reg;
-    r1 = 3;
     rbase = GUEST_BASE ? TCG_GUEST_BASE_REG : 0;
+    r0 = addr_reg;
+    r1 = TCG_REG_R0;
+    if (TARGET_LONG_BITS == 32) {
+        r0 = TCG_REG_R2;
+        tcg_out_ext32u(s, r0, addr_reg);
+    }
 #endif
 
     insn = qemu_ldx_opc[opc];
@@ -1000,7 +1007,7 @@ static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
     }
 
 #ifdef CONFIG_SOFTMMU
-    reloc_pc24 (label2_ptr, (tcg_target_long) s->code_ptr);
+    reloc_pc24(label2_ptr, (tcg_target_long)s->code_ptr);
 #endif
 }
 
@@ -1009,7 +1016,7 @@ static void tcg_out_qemu_st (TCGContext *s, const TCGArg *args, int opc)
     TCGReg addr_reg, r0, r1, rbase, data_reg;
     PowerOpcode insn;
 #ifdef CONFIG_SOFTMMU
-    TCGReg r2, ir;
+    TCGReg ir;
     int mem_index;
     void *label1_ptr, *label2_ptr;
 #endif
@@ -1020,63 +1027,44 @@ static void tcg_out_qemu_st (TCGContext *s, const TCGArg *args, int opc)
 #ifdef CONFIG_SOFTMMU
     mem_index = *args;
 
-    r0 = 3;
-    r1 = 4;
-    r2 = 0;
-    rbase = 0;
-
-    tcg_out_tlb_read (s, r0, r1, r2, addr_reg, opc,
-                      offsetof (CPUArchState, tlb_table[mem_index][0].addr_write));
-
-    tcg_out32 (s, CMP | BF (7) | RA (r2) | RB (r1) | CMP_L);
+    r0 = tcg_out_tlb_read(s, opc, addr_reg, mem_index, false);
 
     label1_ptr = s->code_ptr;
-#ifdef FAST_PATH
-    tcg_out32 (s, BC | BI (7, CR_EQ) | BO_COND_TRUE);
-#endif
+    tcg_out32(s, BC | BI (7, CR_EQ) | BO_COND_TRUE);
 
     /* slow path */
-    ir = 3;
-    tcg_out_mov (s, TCG_TYPE_I64, ir++, TCG_AREG0);
-    tcg_out_mov (s, TCG_TYPE_I64, ir++, addr_reg);
-    tcg_out_rld (s, RLDICL, ir++, data_reg, 0, 64 - (1 << (3 + opc)));
-    tcg_out_movi (s, TCG_TYPE_I64, ir++, mem_index);
+    ir = TCG_REG_R3;
+    tcg_out_mov(s, TCG_TYPE_I64, ir++, TCG_AREG0);
+    tcg_out_mov(s, TCG_TYPE_I64, ir++, addr_reg);
+    tcg_out_rld(s, RLDICL, ir++, data_reg, 0, 64 - (1 << (3 + opc)));
+    tcg_out_movi(s, TCG_TYPE_I64, ir++, mem_index);
 
     tcg_out_call(s, (tcg_target_long)qemu_st_helpers[opc], 1, LK);
 
     label2_ptr = s->code_ptr;
-    tcg_out32 (s, B);
+    tcg_out32(s, B);
 
     /* label1: fast path */
-#ifdef FAST_PATH
-    reloc_pc14 (label1_ptr, (tcg_target_long) s->code_ptr);
-#endif
-
-    tcg_out32 (s, (LD
-                   | RT (r0)
-                   | RA (r0)
-                   | (offsetof (CPUTLBEntry, addend)
-                      - offsetof (CPUTLBEntry, addr_write))
-                   ));
-    /* r0 = env->tlb_table[mem_index][index].addend */
-    tcg_out32(s, ADD | TAB(r0, r0, addr_reg));
-    /* r0 = env->tlb_table[mem_index][index].addend + addr */
+    reloc_pc14(label1_ptr, (tcg_target_long) s->code_ptr);
 
+    rbase = TCG_REG_R3;
+    r1 = TCG_REG_R2;
 #else  /* !CONFIG_SOFTMMU */
-#if TARGET_LONG_BITS == 32
-    tcg_out_ext32u(s, addr_reg, addr_reg);
-#endif
-    r1 = 3;
-    r0 = addr_reg;
     rbase = GUEST_BASE ? TCG_GUEST_BASE_REG : 0;
+    r0 = addr_reg;
+    r1 = TCG_REG_R3;
+    if (TARGET_LONG_BITS == 32) {
+        r0 = TCG_REG_R2;
+        tcg_out_ext32u(s, r0, addr_reg);
+    }
 #endif
 
     insn = qemu_stx_opc[opc];
     if (!HAVE_ISA_2_06 && insn == STDBRX) {
         tcg_out32(s, STWBRX | SAB(data_reg, rbase, r0));
         tcg_out32(s, ADDI | TAI(r1, r0, 4));
-        tcg_out_shri64(s, 0, data_reg, 32);
-        tcg_out32(s, STWBRX | SAB(0, rbase, r1));
+        tcg_out_shri64(s, TCG_REG_R0, data_reg, 32);
+        tcg_out32(s, STWBRX | SAB(TCG_REG_R0, rbase, r1));
     } else {
         tcg_out32(s, insn | SAB(data_reg, rbase, r0));
     }
-- 
1.8.3.1
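[Editorial note: to make the fast path in tcg_out_tlb_read and the pre-2.06 store fallback above easier to audit, here is a rough C model of what the emitted rlwinm/rldicl arithmetic and the STWBRX pair compute. The constants and helper names are illustrative only — they are not QEMU APIs, and the real values come from the target configuration.]

```c
#include <stdint.h>

/* Illustrative values; the real ones come from the target config. */
#define TARGET_PAGE_BITS    12
#define CPU_TLB_BITS         8
#define CPU_TLB_ENTRY_BITS   5

/* Byte offset into tlb_table[mem_index] computed into R3 by the
   rlwinm (32-bit guest) or rldicl+sldi (64-bit guest) sequence. */
static uint64_t tlb_entry_offset(uint64_t addr)
{
    uint64_t index = (addr >> TARGET_PAGE_BITS) & ((1ULL << CPU_TLB_BITS) - 1);
    return index << CPU_TLB_ENTRY_BITS;
}

/* Value compared against the TLB comparator: the page number bits,
   plus the low alignment bits.  Keeping the alignment bits means an
   unaligned access can never match the page-aligned comparator, so
   the single CMP doubles as the alignment check. */
static uint64_t tlb_compare_value(uint64_t addr, int s_bits)
{
    uint64_t page_mask  = ~((1ULL << TARGET_PAGE_BITS) - 1);
    uint64_t align_mask = (1ULL << s_bits) - 1;
    return addr & (page_mask | align_mask);
}

/* STWBRX: store a 32-bit word with its bytes reversed
   (least-significant byte at the lowest address). */
static void stwbrx(uint8_t *p, uint32_t w)
{
    p[0] = (uint8_t)w;
    p[1] = (uint8_t)(w >> 8);
    p[2] = (uint8_t)(w >> 16);
    p[3] = (uint8_t)(w >> 24);
}

/* Pre-ISA-2.06 fallback for STDBRX: a byte-reversed word store of the
   low word, then of the high word at offset +4, which together lay
   down the full 64-bit value byte-reversed. */
static void store64_bytereversed(uint8_t *p, uint64_t val)
{
    stwbrx(p, (uint32_t)val);
    stwbrx(p + 4, (uint32_t)(val >> 32));
}
```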


* [Qemu-devel] [PATCH for-next 15/15] tcg-ppc64: Implement CONFIG_QEMU_LDST_OPTIMIZATION
  2013-08-05 18:28 [Qemu-devel] [PATCH for-next 00/15] Collection of improvements for tcg/ppc64 Richard Henderson
                   ` (13 preceding siblings ...)
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 14/15] tcg-ppc64: Streamline tcg_out_tlb_read Richard Henderson
@ 2013-08-05 18:28 ` Richard Henderson
  2013-08-17  6:23 ` [Qemu-devel] [PATCH for-next 00/15] Collection of improvements for tcg/ppc64 Richard Henderson
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 configure               |   2 +-
 include/exec/exec-all.h |   4 +-
 tcg/ppc64/tcg-target.c  | 219 +++++++++++++++++++++++++++++++-----------------
 3 files changed, 146 insertions(+), 79 deletions(-)

diff --git a/configure b/configure
index 18fa608..5b9a66c 100755
--- a/configure
+++ b/configure
@@ -3650,7 +3650,7 @@ echo "libs_softmmu=$libs_softmmu" >> $config_host_mak
 echo "ARCH=$ARCH" >> $config_host_mak
 
 case "$cpu" in
-  arm|i386|x86_64|ppc|aarch64)
+  aarch64 | arm | i386 | x86_64 | ppc*)
     # The TCG interpreter currently does not support ld/st optimization.
     if test "$tcg_interpreter" = "no" ; then
         echo "CONFIG_QEMU_LDST_OPTIMIZATION=y" >> $config_host_mak
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 26c3553..91b189b 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -326,11 +326,11 @@ extern uintptr_t tci_tb_ptr;
    (5) post-process (e.g. stack adjust)
    (6) jump to corresponding code of the next of fast path
  */
-# if defined(__i386__) || defined(__x86_64__)
+# if defined(__i386__) || defined(__x86_64__) || defined(_ARCH_PPC64)
 #  define GETRA() ((uintptr_t)__builtin_return_address(0))
 /* The return address argument for ldst is passed directly.  */
 #  define GETPC_LDST()  (abort(), 0)
-# elif defined (_ARCH_PPC) && !defined (_ARCH_PPC64)
+# elif defined(_ARCH_PPC)
 #  define GETRA() ((uintptr_t)__builtin_return_address(0))
 #  define GETPC_LDST() ((uintptr_t) ((*(int32_t *)(GETRA() - 4)) - 1))
 # elif defined(__arm__)
diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index 4b23597..7ecc032 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -830,26 +830,50 @@ static void tcg_out_call(TCGContext *s, tcg_target_long arg,
 #endif
 }
 
+static const PowerOpcode qemu_ldx_opc[8] = {
+#ifdef TARGET_WORDS_BIGENDIAN
+    LBZX, LHZX, LWZX, LDX,
+    0,    LHAX, LWAX, LDX
+#else
+    LBZX, LHBRX, LWBRX, LDBRX,
+    0,    0,     0,     LDBRX,
+#endif
+};
+
+static const PowerOpcode qemu_stx_opc[4] = {
+#ifdef TARGET_WORDS_BIGENDIAN
+    STBX, STHX, STWX, STDX
+#else
+    STBX, STHBRX, STWBRX, STDBRX,
+#endif
+};
+
+static const PowerOpcode qemu_exts_opc[4] = {
+    EXTSB, EXTSH, EXTSW, 0
+};
+
 #if defined (CONFIG_SOFTMMU)
 
 #include "exec/softmmu_defs.h"
 
 /* helper signature: helper_ld_mmu(CPUState *env, target_ulong addr,
-   int mmu_idx) */
+ *                                 int mmu_idx, uintptr_t ra)
+ */
 static const void * const qemu_ld_helpers[4] = {
-    helper_ldb_mmu,
-    helper_ldw_mmu,
-    helper_ldl_mmu,
-    helper_ldq_mmu,
+    helper_ret_ldb_mmu,
+    helper_ret_ldw_mmu,
+    helper_ret_ldl_mmu,
+    helper_ret_ldq_mmu,
 };
 
 /* helper signature: helper_st_mmu(CPUState *env, target_ulong addr,
-   uintxx_t val, int mmu_idx) */
+ *                                 uintxx_t val, int mmu_idx, uintptr_t ra)
+ */
 static const void * const qemu_st_helpers[4] = {
-    helper_stb_mmu,
-    helper_stw_mmu,
-    helper_stl_mmu,
-    helper_stq_mmu,
+    helper_ret_stb_mmu,
+    helper_ret_stw_mmu,
+    helper_ret_stl_mmu,
+    helper_ret_stq_mmu,
 };
 
 /* Perform the TLB load and compare.  Places the result of the comparison
@@ -911,29 +935,108 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, int s_bits, TCGReg addr_reg,
 
     return addr_reg;
 }
-#endif
 
-static const PowerOpcode qemu_ldx_opc[8] = {
-#ifdef TARGET_WORDS_BIGENDIAN
-    LBZX, LHZX, LWZX, LDX,
-    0,    LHAX, LWAX, LDX
-#else
-    LBZX, LHBRX, LWBRX, LDBRX,
-    0,    0,     0,     LDBRX,
-#endif
-};
+/* Record the context of a call to the out of line helper code for the slow
+   path for a load or store, so that we can later generate the correct
+   helper code.  */
+static void add_qemu_ldst_label(TCGContext *s, bool is_ld, int opc,
+                                int data_reg, int addr_reg, int mem_index,
+                                uint8_t *raddr, uint8_t *label_ptr)
+{
+    int idx;
+    TCGLabelQemuLdst *label;
 
-static const PowerOpcode qemu_stx_opc[4] = {
-#ifdef TARGET_WORDS_BIGENDIAN
-    STBX, STHX, STWX, STDX
-#else
-    STBX, STHBRX, STWBRX, STDBRX,
-#endif
-};
+    if (s->nb_qemu_ldst_labels >= TCG_MAX_QEMU_LDST) {
+        tcg_abort();
+    }
 
-static const PowerOpcode qemu_exts_opc[4] = {
-    EXTSB, EXTSH, EXTSW, 0
-};
+    idx = s->nb_qemu_ldst_labels++;
+    label = (TCGLabelQemuLdst *)&s->qemu_ldst_labels[idx];
+    label->is_ld = is_ld;
+    label->opc = opc;
+    label->datalo_reg = data_reg;
+    label->addrlo_reg = addr_reg;
+    label->mem_index = mem_index;
+    label->raddr = raddr;
+    label->label_ptr[0] = label_ptr;
+}
+
+/* See the GETPC definition in include/exec/exec-all.h.  */
+static inline uintptr_t do_getpc(uint8_t *raddr)
+{
+    return (uintptr_t)raddr - 1;
+}
+
+static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
+{
+    int opc = lb->opc;
+    int s_bits = opc & 3;
+    PowerOpcode insn;
+
+    reloc_pc14(lb->label_ptr[0], (uintptr_t)s->code_ptr);
+
+    tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_R3, TCG_AREG0);
+
+    /* If the address needed to be zero-extended, we'll have already
+       placed it in R4.  The only remaining case is a 64-bit guest.  */
+    if (lb->addrlo_reg != TCG_REG_R4) {
+        tcg_out_mov(s, TCG_TYPE_I64, TCG_REG_R4, lb->addrlo_reg);
+    }
+
+    tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_R5, lb->mem_index);
+    tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R6, do_getpc(lb->raddr));
+
+    tcg_out_call(s, (tcg_target_long)qemu_ld_helpers[s_bits], 1, LK);
+
+    if (opc & 4) {
+        insn = qemu_exts_opc[s_bits];
+        tcg_out32(s, insn | RA(lb->datalo_reg) | RS(TCG_REG_R3));
+    } else {
+        tcg_out_mov(s, TCG_TYPE_I64, lb->datalo_reg, TCG_REG_R3);
+    }
+
+    tcg_out_b(s, 0, (uintptr_t)lb->raddr);
+}
+
+static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
+{
+    int opc = lb->opc;
+
+    reloc_pc14(lb->label_ptr[0], (uintptr_t)s->code_ptr);
+
+    tcg_out_mov(s, TCG_TYPE_I64, TCG_REG_R3, TCG_AREG0);
+
+    /* If the address needed to be zero-extended, we'll have already
+       placed it in R4.  The only remaining case is a 64-bit guest.  */
+    if (lb->addrlo_reg != TCG_REG_R4) {
+        tcg_out_mov(s, TCG_TYPE_I64, TCG_REG_R4, lb->addrlo_reg);
+    }
+
+    tcg_out_rld(s, RLDICL, TCG_REG_R5, lb->datalo_reg,
+                0, 64 - (1 << (3 + opc)));
+    tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_R6, lb->mem_index);
+    tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R7, do_getpc(lb->raddr));
+
+    tcg_out_call(s, (tcg_target_long)qemu_st_helpers[opc], 1, LK);
+
+    tcg_out_b(s, 0, (uintptr_t)lb->raddr);
+}
+
+void tcg_out_tb_finalize(TCGContext *s)
+{
+    int i, n = s->nb_qemu_ldst_labels;
+
+    /* qemu_ld/st slow paths */
+    for (i = 0; i < n; i++) {
+        TCGLabelQemuLdst *label = &s->qemu_ldst_labels[i];
+        if (label->is_ld) {
+            tcg_out_qemu_ld_slow_path(s, label);
+        } else {
+            tcg_out_qemu_st_slow_path(s, label);
+        }
+    }
+}
+#endif /* SOFTMMU */
 
 static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
 {
@@ -941,9 +1044,8 @@ static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
     PowerOpcode insn;
     int s_bits;
 #ifdef CONFIG_SOFTMMU
-    TCGReg ir;
     int mem_index;
-    void *label1_ptr, *label2_ptr;
+    void *label_ptr;
 #endif
 
     data_reg = *args++;
@@ -955,29 +1057,8 @@ static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
 
     r0 = tcg_out_tlb_read(s, s_bits, addr_reg, mem_index, true);
 
-    label1_ptr = s->code_ptr;
-    tcg_out32(s, BC | BI (7, CR_EQ) | BO_COND_TRUE);
-
-    /* slow path */
-    ir = TCG_REG_R3;
-    tcg_out_mov(s, TCG_TYPE_I64, ir++, TCG_AREG0);
-    tcg_out_mov(s, TCG_TYPE_I64, ir++, addr_reg);
-    tcg_out_movi(s, TCG_TYPE_I64, ir++, mem_index);
-
-    tcg_out_call(s, (tcg_target_long)qemu_ld_helpers[s_bits], 1, LK);
-
-    if (opc & 4) {
-        insn = qemu_exts_opc[s_bits];
-        tcg_out32(s, insn | RA(data_reg) | RS(3));
-    } else if (data_reg != 3) {
-        tcg_out_mov(s, TCG_TYPE_I64, data_reg, 3);
-    }
-
-    label2_ptr = s->code_ptr;
-    tcg_out32(s, B);
-
-    /* label1: fast path */
-    reloc_pc14(label1_ptr, (tcg_target_long)s->code_ptr);
+    label_ptr = s->code_ptr;
+    tcg_out32(s, BC | BI (7, CR_EQ) | BO_COND_FALSE);
 
     rbase = TCG_REG_R3;
     r1 = TCG_REG_R0;
@@ -1007,7 +1088,8 @@ static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
     }
 
 #ifdef CONFIG_SOFTMMU
-    reloc_pc24(label2_ptr, (tcg_target_long)s->code_ptr);
+    add_qemu_ldst_label(s, true, opc, data_reg, r0, mem_index,
+                        s->code_ptr, label_ptr);
 #endif
 }
 
@@ -1016,9 +1098,8 @@ static void tcg_out_qemu_st (TCGContext *s, const TCGArg *args, int opc)
     TCGReg addr_reg, r0, r1, rbase, data_reg;
     PowerOpcode insn;
 #ifdef CONFIG_SOFTMMU
-    TCGReg ir;
     int mem_index;
-    void *label1_ptr, *label2_ptr;
+    void *label_ptr;
 #endif
 
     data_reg = *args++;
@@ -1029,23 +1110,8 @@ static void tcg_out_qemu_st (TCGContext *s, const TCGArg *args, int opc)
 
     r0 = tcg_out_tlb_read(s, opc, addr_reg, mem_index, false);
 
-    label1_ptr = s->code_ptr;
-    tcg_out32(s, BC | BI (7, CR_EQ) | BO_COND_TRUE);
-
-    /* slow path */
-    ir = TCG_REG_R3;
-    tcg_out_mov(s, TCG_TYPE_I64, ir++, TCG_AREG0);
-    tcg_out_mov(s, TCG_TYPE_I64, ir++, addr_reg);
-    tcg_out_rld(s, RLDICL, ir++, data_reg, 0, 64 - (1 << (3 + opc)));
-    tcg_out_movi(s, TCG_TYPE_I64, ir++, mem_index);
-
-    tcg_out_call(s, (tcg_target_long)qemu_st_helpers[opc], 1, LK);
-
-    label2_ptr = s->code_ptr;
-    tcg_out32(s, B);
-
-    /* label1: fast path */
-    reloc_pc14(label1_ptr, (tcg_target_long) s->code_ptr);
+    label_ptr = s->code_ptr;
+    tcg_out32(s, BC | BI (7, CR_EQ) | BO_COND_FALSE);
 
     rbase = TCG_REG_R3;
     r1 = TCG_REG_R2;
@@ -1070,7 +1136,8 @@ static void tcg_out_qemu_st (TCGContext *s, const TCGArg *args, int opc)
     }
 
 #ifdef CONFIG_SOFTMMU
-    reloc_pc24 (label2_ptr, (tcg_target_long) s->code_ptr);
+    add_qemu_ldst_label(s, false, opc, data_reg, r0, mem_index,
+                        s->code_ptr, label_ptr);
 #endif
 }
 
-- 
1.8.3.1
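[Editorial note: as a cross-check on the slow-path glue in this patch, a small C model of the opc encoding that the qemu_ldx_opc/qemu_exts_opc tables assume, and of the zero-extension the RLDICL performs on the store value before the helper call. These helpers are illustrative sketches, not part of QEMU.]

```c
#include <stdint.h>

/* The opc encoding assumed above: bits 0-1 give log2 of the access
   size in bytes, and bit 2 requests sign extension of a loaded value
   (hence "opc & 4" selecting qemu_exts_opc[s_bits]). */
static int opc_size_bytes(int opc)
{
    return 1 << (opc & 3);
}

static int opc_is_sign_extending(int opc)
{
    return (opc >> 2) & 1;
}

/* Zero-extension done by tcg_out_rld(s, RLDICL, ..., 0,
   64 - (1 << (3 + opc))): keep only the low 8 << opc bits of the
   store data before passing it to the helper. */
static uint64_t zext_store_value(uint64_t val, int opc)
{
    int bits = 8 << (opc & 3);
    return bits == 64 ? val : (val & ((1ULL << bits) - 1));
}
```

For opc == 3 the RLDICL mask-begin operand is 0, so the instruction degenerates to a plain move, matching zext_store_value returning the value unchanged for 64-bit stores.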


* Re: [Qemu-devel] [PATCH for-next 00/15] Collection of improvements for tcg/ppc64
  2013-08-05 18:28 [Qemu-devel] [PATCH for-next 00/15] Collection of improvements for tcg/ppc64 Richard Henderson
                   ` (14 preceding siblings ...)
  2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 15/15] tcg-ppc64: Implement CONFIG_QEMU_LDST_OPTIMIZATION Richard Henderson
@ 2013-08-17  6:23 ` Richard Henderson
  15 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2013-08-17  6:23 UTC (permalink / raw)
  To: qemu-devel

Ping.

r~

On 08/05/2013 11:28 AM, Richard Henderson wrote:
> About half of these patches are focused on reducing the number of
> full 64-bit constants that need to be generated for addresses:
> 
> E.g. patch 5, looking through the function descriptor.  If the
> program is built --disable-pie, the elements of the function
> descriptors are all 32-bit constants.
> 
> E.g. the end result of indirect jump threading + TCG_REG_TB.
> Before, we reserve 6 insn slots to generate the full 64-bit address.
> After, we use 2 insns -- addis + ld -- to load the full 64-bit
> address from the indirection slot.
> 
> The second patch could probably be reverted.  I'd planned to be
> able to use the same conditional call + tail call scheme as ARM,
> but I'd forgotten the need for a conditional store to go along
> with that.  OTOH, it might still turn out to be useful somewhere.
> 
> 
> r~
> 
> 
> Richard Henderson (15):
>   tcg-ppc64: Avoid code for nop move
>   tcg-ppc64: Add an LK argument to tcg_out_call
>   tcg-ppc64: Use the branch absolute instruction when possible
>   tcg-ppc64: Don't load the static chain from TCG
>   tcg-ppc64: Look through the function descriptor when profitable
>   tcg-ppc64: Move AREG0 to r31
>   tcg-ppc64: Tidy register allocation order
>   tcg-ppc64: Create PowerOpcode
>   tcg-ppc64: Handle long offsets better
>   tcg-ppc64: Use indirect jump threading
>   tcg-ppc64: Setup TCG_REG_TB
>   tcg-ppc64: Use TCG_REG_TB in tcg_out_movi and tcg_out_mem_long
>   tcg-ppc64: Tidy tcg_target_qemu_prologue
>   tcg-ppc64: Streamline tcg_out_tlb_read
>   tcg-ppc64: Implement CONFIG_QEMU_LDST_OPTIMIZATION
> 
>  configure               |    2 +-
>  include/exec/exec-all.h |    7 +-
>  tcg/ppc64/tcg-target.c  | 1079 ++++++++++++++++++++++++++---------------------
>  tcg/ppc64/tcg-target.h  |    2 +-
>  4 files changed, 598 insertions(+), 492 deletions(-)
> 

