* [Qemu-devel] [PATCH for-next 00/15] Collection of improvements for tcg/ppc64
@ 2013-08-05 18:28 Richard Henderson
2013-08-05 18:28 ` [Qemu-devel] [PATCH for-next 01/15] tcg-ppc64: Avoid code for nop move Richard Henderson
` (15 more replies)
0 siblings, 16 replies; 17+ messages in thread
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
To: qemu-devel
About half of these patches are focused on reducing the number of
full 64-bit constants that need to be generated for addresses:
E.g. patch 5, looking through the function descriptor. If the
program is built --disable-pie, the elements of the function
descriptors are all 32-bit constants.
E.g. the end result of indirect jump threading + TCG_REG_TB.
Before, we reserve 6 insn slots to generate the full 64-bit address.
After, we use 2 insns -- addis + ld -- to load the full 64-bit
address from the indirection slot.
The second patch could probably be reverted. I'd planned to be
able to use the same conditional call + tail call scheme as ARM,
but I'd forgotten the need for a conditional store to go along
with that. OTOH, it might still turn out to be useful somewhere.
r~
Richard Henderson (15):
tcg-ppc64: Avoid code for nop move
tcg-ppc64: Add an LK argument to tcg_out_call
tcg-ppc64: Use the branch absolute instruction when possible
tcg-ppc64: Don't load the static chain from TCG
tcg-ppc64: Look through the function descriptor when profitable
tcg-ppc64: Move AREG0 to r31
tcg-ppc64: Tidy register allocation order
tcg-ppc64: Create PowerOpcode
tcg-ppc64: Handle long offsets better
tcg-ppc64: Use indirect jump threading
tcg-ppc64: Setup TCG_REG_TB
tcg-ppc64: Use TCG_REG_TB in tcg_out_movi and tcg_out_mem_long
tcg-ppc64: Tidy tcg_target_qemu_prologue
tcg-ppc64: Streamline tcg_out_tlb_read
tcg-ppc64: Implement CONFIG_QEMU_LDST_OPTIMIZATION
configure | 2 +-
include/exec/exec-all.h | 7 +-
tcg/ppc64/tcg-target.c | 1079 ++++++++++++++++++++++++++---------------------
tcg/ppc64/tcg-target.h | 2 +-
4 files changed, 598 insertions(+), 492 deletions(-)
--
1.8.3.1
* [Qemu-devel] [PATCH for-next 01/15] tcg-ppc64: Avoid code for nop move
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson
While nop moves are rare in code that has been through the optimizer,
they are not uncommon within the tcg backend.
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/ppc64/tcg-target.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index 0678de2..0e3147b 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -508,7 +508,9 @@ static const uint32_t tcg_to_isel[] = {
static inline void tcg_out_mov(TCGContext *s, TCGType type,
TCGReg ret, TCGReg arg)
{
- tcg_out32 (s, OR | SAB (arg, ret, arg));
+ if (ret != arg) {
+ tcg_out32 (s, OR | SAB (arg, ret, arg));
+ }
}
static inline void tcg_out_rld(TCGContext *s, int op, TCGReg ra, TCGReg rs,
--
1.8.3.1
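As an illustration of the patch above (not part of the series), here is a
minimal standalone sketch of an emitter that drops nop moves. The Emitter
type and the emit32/emit_mov helpers are hypothetical stand-ins for
TCGContext, tcg_out32 and tcg_out_mov; the OPCD/XO31 field macros are
assumed to shift the primary opcode by 26 bits and the extended opcode
by 1 bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field helpers, assumed to mirror the macros in tcg-target.c. */
#define OPCD(opc)  ((uint32_t)(opc) << 26)
#define XO31(opc)  (OPCD(31) | ((uint32_t)(opc) << 1))
#define RS(r)      ((uint32_t)(r) << 21)
#define RA(r)      ((uint32_t)(r) << 16)
#define RB(r)      ((uint32_t)(r) << 11)
#define INSN_OR    XO31(444)

/* Hypothetical stand-in for TCGContext: a small fixed code buffer. */
typedef struct {
    uint32_t code[16];
    size_t   len;
} Emitter;

static void emit32(Emitter *e, uint32_t insn)
{
    assert(e->len < sizeof(e->code) / sizeof(e->code[0]));
    e->code[e->len++] = insn;
}

/* "mr ret, arg" is encoded as "or ret, arg, arg"; emit nothing for a nop. */
static void emit_mov(Emitter *e, unsigned ret, unsigned arg)
{
    if (ret != arg) {
        emit32(e, INSN_OR | RS(arg) | RA(ret) | RB(arg));
    }
}
```

Calling emit_mov(&e, 3, 3) leaves the buffer untouched, while
emit_mov(&e, 3, 4) appends the single word for "mr r3, r4".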
* [Qemu-devel] [PATCH for-next 02/15] tcg-ppc64: Add an LK argument to tcg_out_call
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson
This will enable the generation of tail-calls in a future patch.
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/ppc64/tcg-target.c | 38 +++++++++++++++++++-------------------
1 file changed, 19 insertions(+), 19 deletions(-)
diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index 0e3147b..94960a3 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -702,30 +702,30 @@ static void tcg_out_b (TCGContext *s, int mask, tcg_target_long target)
}
}
-static void tcg_out_call (TCGContext *s, tcg_target_long arg, int const_arg)
+/* Make a call to a function. LK = LK for a normal call, or 0 to avoid
+ setting the link register, making a tail call. */
+static void tcg_out_call(TCGContext *s, tcg_target_long arg,
+ int const_arg, int lk)
{
#ifdef __APPLE__
if (const_arg) {
- tcg_out_b (s, LK, arg);
- }
- else {
- tcg_out32 (s, MTSPR | RS (arg) | LR);
- tcg_out32 (s, BCLR | BO_ALWAYS | LK);
+ tcg_out_b(s, lk, arg);
+ } else {
+ tcg_out32(s, MTSPR | RS(arg) | CTR);
+ tcg_out32(s, BCCTR | BO_ALWAYS | lk);
}
#else
- int reg;
-
+ TCGReg reg = arg;
if (const_arg) {
- reg = 2;
- tcg_out_movi (s, TCG_TYPE_I64, reg, arg);
+ reg = TCG_REG_R2;
+ tcg_out_movi(s, TCG_TYPE_I64, reg, arg);
}
- else reg = arg;
- tcg_out32 (s, LD | RT (0) | RA (reg));
- tcg_out32 (s, MTSPR | RA (0) | CTR);
- tcg_out32 (s, LD | RT (11) | RA (reg) | 16);
- tcg_out32 (s, LD | RT (2) | RA (reg) | 8);
- tcg_out32 (s, BCCTR | BO_ALWAYS | LK);
+ tcg_out32(s, LD | TAI(TCG_REG_R0, reg, 0));
+ tcg_out32(s, MTSPR | RA(TCG_REG_R0) | CTR);
+ tcg_out32(s, LD | TAI(TCG_REG_R11, reg, 16));
+ tcg_out32(s, LD | TAI(TCG_REG_R2, reg, 8));
+ tcg_out32(s, BCCTR | BO_ALWAYS | lk);
#endif
}
@@ -869,7 +869,7 @@ static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
tcg_out_mov (s, TCG_TYPE_I64, ir++, addr_reg);
tcg_out_movi (s, TCG_TYPE_I64, ir++, mem_index);
- tcg_out_call (s, (tcg_target_long) qemu_ld_helpers[s_bits], 1);
+ tcg_out_call(s, (tcg_target_long)qemu_ld_helpers[s_bits], 1, LK);
if (opc & 4) {
insn = qemu_exts_opc[s_bits];
@@ -960,7 +960,7 @@ static void tcg_out_qemu_st (TCGContext *s, const TCGArg *args, int opc)
tcg_out_rld (s, RLDICL, ir++, data_reg, 0, 64 - (1 << (3 + opc)));
tcg_out_movi (s, TCG_TYPE_I64, ir++, mem_index);
- tcg_out_call (s, (tcg_target_long) qemu_st_helpers[opc], 1);
+ tcg_out_call(s, (tcg_target_long)qemu_st_helpers[opc], 1, LK);
label2_ptr = s->code_ptr;
tcg_out32 (s, B);
@@ -1440,7 +1440,7 @@ static void tcg_out_op (TCGContext *s, TCGOpcode opc, const TCGArg *args,
}
break;
case INDEX_op_call:
- tcg_out_call (s, args[0], const_args[0]);
+ tcg_out_call(s, args[0], const_args[0], LK);
break;
case INDEX_op_movi_i32:
tcg_out_movi (s, TCG_TYPE_I32, args[0], args[1]);
--
1.8.3.1
* [Qemu-devel] [PATCH for-next 03/15] tcg-ppc64: Use the branch absolute instruction when possible
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson
Try the branch absolute instruction before falling back to an
indirect branch.
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/ppc64/tcg-target.c | 31 ++++++++++++++++++-------------
1 file changed, 18 insertions(+), 13 deletions(-)
diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index 94960a3..fce3e5d 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -173,13 +173,17 @@ static const int tcg_target_callee_save_regs[] = {
TCG_REG_R31
};
+static inline bool in_range_b(intptr_t disp)
+{
+ return disp >= -0x2000000 && disp < 0x2000000;
+}
+
static uint32_t reloc_pc24_val (void *pc, tcg_target_long target)
{
tcg_target_long disp;
disp = target - (tcg_target_long) pc;
- if ((disp << 38) >> 38 != disp)
- tcg_abort ();
+ assert(in_range_b(disp));
return disp & 0x3fffffc;
}
@@ -195,8 +199,7 @@ static uint16_t reloc_pc14_val (void *pc, tcg_target_long target)
tcg_target_long disp;
disp = target - (tcg_target_long) pc;
- if (disp != (int16_t) disp)
- tcg_abort ();
+ assert(disp == (int16_t)disp);
return disp & 0xfffc;
}
@@ -454,6 +457,7 @@ static int tcg_target_const_match (tcg_target_long val,
#define FXM(b) (1 << (19 - (b)))
#define LK 1
+#define AA 2
#define TAB(t, a, b) (RT(t) | RA(a) | RB(b))
#define SAB(s, a, b) (RS(s) | RA(a) | RB(b))
@@ -688,17 +692,18 @@ static void tcg_out_xori32(TCGContext *s, TCGReg dst, TCGReg src, uint32_t c)
tcg_out_zori32(s, dst, src, c, XORI, XORIS);
}
-static void tcg_out_b (TCGContext *s, int mask, tcg_target_long target)
+static void tcg_out_b(TCGContext *s, int lk, tcg_target_long target)
{
- tcg_target_long disp;
+ tcg_target_long disp = target - (tcg_target_long) s->code_ptr;
- disp = target - (tcg_target_long) s->code_ptr;
- if ((disp << 38) >> 38 == disp)
- tcg_out32 (s, B | (disp & 0x3fffffc) | mask);
- else {
- tcg_out_movi (s, TCG_TYPE_I64, 0, (tcg_target_long) target);
- tcg_out32 (s, MTSPR | RS (0) | CTR);
- tcg_out32 (s, BCCTR | BO_ALWAYS | mask);
+ if (in_range_b(disp)) {
+ tcg_out32(s, B | (disp & 0x3fffffc) | lk);
+ } else if (in_range_b(target)) {
+ tcg_out32(s, B | AA | (target & 0x3fffffc) | lk);
+ } else {
+ tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R0, target);
+ tcg_out32 (s, MTSPR | RS(TCG_REG_R0) | CTR);
+ tcg_out32 (s, BCCTR | BO_ALWAYS | lk);
}
}
--
1.8.3.1
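As an illustration of the patch above (not part of the series), the
three-way choice between a relative branch, an absolute branch and an
indirect branch can be sketched standalone. The BranchKind enum and the
choose_branch helper are hypothetical names. The sketch assumes the I-form
branch holds a 24-bit word displacement, which is a signed 26-bit byte
offset, and it passes the LK bit through the same way tcg_out_b does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OPCD(opc)  ((uint32_t)(opc) << 26)
#define INSN_B     OPCD(18)
#define BIT_LK     1u   /* set the link register: b becomes bl */
#define BIT_AA     2u   /* LI is an absolute address: b becomes ba */

typedef enum { BR_RELATIVE, BR_ABSOLUTE, BR_INDIRECT } BranchKind;

/* True if v fits the signed 26-bit byte field of an I-form branch. */
static bool fits_branch_field(int64_t v)
{
    return v >= -0x2000000 && v < 0x2000000;
}

/* Pick the cheapest form; *insn is written only for the direct forms. */
static BranchKind choose_branch(uint64_t pc, uint64_t target,
                                uint32_t lk, uint32_t *insn)
{
    int64_t disp = (int64_t)(target - pc);

    assert((target & 3) == 0);   /* instructions are word aligned */

    if (fits_branch_field(disp)) {
        *insn = INSN_B | ((uint32_t)disp & 0x3fffffc) | lk;
        return BR_RELATIVE;
    }
    if (fits_branch_field((int64_t)target)) {
        *insn = INSN_B | BIT_AA | ((uint32_t)target & 0x3fffffc) | lk;
        return BR_ABSOLUTE;
    }
    /* Caller must materialize target, move it to CTR and use bcctr. */
    return BR_INDIRECT;
}
```

Passing BIT_LK as lk gives a normal call; passing 0 gives a plain jump,
which is what the LK argument added in patch 02 is meant to allow.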
* [Qemu-devel] [PATCH for-next 04/15] tcg-ppc64: Don't load the static chain from TCG
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson
There are no helpers that require the static chain.
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/ppc64/tcg-target.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index fce3e5d..ddc9581 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -728,7 +728,6 @@ static void tcg_out_call(TCGContext *s, tcg_target_long arg,
tcg_out32(s, LD | TAI(TCG_REG_R0, reg, 0));
tcg_out32(s, MTSPR | RA(TCG_REG_R0) | CTR);
- tcg_out32(s, LD | TAI(TCG_REG_R11, reg, 16));
tcg_out32(s, LD | TAI(TCG_REG_R2, reg, 8));
tcg_out32(s, BCCTR | BO_ALWAYS | lk);
#endif
--
1.8.3.1
* [Qemu-devel] [PATCH for-next 05/15] tcg-ppc64: Look through the function descriptor when profitable
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson
Loading 32-bit immediates will be faster than loading the values from
memory. Don't attempt this when it would mean generating full 64-bit
immediates.
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/ppc64/tcg-target.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index ddc9581..2563253 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -722,6 +722,17 @@ static void tcg_out_call(TCGContext *s, tcg_target_long arg,
#else
TCGReg reg = arg;
if (const_arg) {
+ uintptr_t tgt = ((uintptr_t *)arg)[0];
+ uintptr_t toc = ((uintptr_t *)arg)[1];
+
+ /* Look through the function descriptor, if profitable. */
+ if (tgt == (int32_t)tgt && toc == (int32_t)toc) {
+ tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R2, toc);
+ tcg_out_b(s, lk, tgt);
+ return;
+ }
+
+ /* Avoid generating two full 64-bit constants. */
reg = TCG_REG_R2;
tcg_out_movi(s, TCG_TYPE_I64, reg, arg);
}
--
1.8.3.1
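As an illustration of the patch above (not part of the series), the
"profitable" test can be sketched standalone. The FuncDesc struct and the
helper names are hypothetical; the field order (entry point, TOC base,
environment pointer at offsets 0, 8 and 16) is assumed from the loads that
tcg_out_call performed before this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of a 64-bit PowerPC function descriptor. */
typedef struct {
    uint64_t entry;  /* address of the first instruction */
    uint64_t toc;    /* TOC base, to be placed in r2 */
    uint64_t env;    /* environment (static chain) pointer */
} FuncDesc;

/* True if v, viewed as a signed 64-bit value, fits in 32 bits. */
static bool fits_s32(uint64_t v)
{
    int64_t s = (int64_t)v;
    return s >= INT32_MIN && s <= INT32_MAX;
}

/* Looking through the descriptor pays off only when both the entry
   point and the TOC base can be loaded as 32-bit immediates. */
static bool can_look_through(const FuncDesc *fd)
{
    assert(fd != NULL);
    return fits_s32(fd->entry) && fits_s32(fd->toc);
}
```

For a binary linked at low addresses both fields pass the test, so the
call needs no descriptor loads at all; for high load addresses the sketch
reports false and the descriptor would still be read at run time.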
* [Qemu-devel] [PATCH for-next 06/15] tcg-ppc64: Move AREG0 to r31
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson
AREG0 no longer needs to be a global register that avoids conflicting
with the normal frame pointer, so move it out of the middle of the set.
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/ppc64/tcg-target.c | 4 ++--
tcg/ppc64/tcg-target.h | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index 2563253..2b3d1bb 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -166,11 +166,11 @@ static const int tcg_target_callee_save_regs[] = {
TCG_REG_R24,
TCG_REG_R25,
TCG_REG_R26,
- TCG_REG_R27, /* currently used for the global env */
+ TCG_REG_R27,
TCG_REG_R28,
TCG_REG_R29,
TCG_REG_R30,
- TCG_REG_R31
+ TCG_REG_R31, /* currently used for the global env */
};
static inline bool in_range_b(intptr_t disp)
diff --git a/tcg/ppc64/tcg-target.h b/tcg/ppc64/tcg-target.h
index 48fc6e2..66d0515 100644
--- a/tcg/ppc64/tcg-target.h
+++ b/tcg/ppc64/tcg-target.h
@@ -119,7 +119,7 @@ typedef enum {
#define TCG_TARGET_HAS_mulu2_i64 1
#define TCG_TARGET_HAS_muls2_i64 1
-#define TCG_AREG0 TCG_REG_R27
+#define TCG_AREG0 TCG_REG_R31
#define TCG_TARGET_EXTEND_ARGS 1
--
1.8.3.1
* [Qemu-devel] [PATCH for-next 07/15] tcg-ppc64: Tidy register allocation order
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson
Remove conditionalization from tcg_target_reg_alloc_order, relying on
reserved_regs to prevent register allocation that shouldn't happen.
So R11 is now present in reg_alloc_order for __APPLE__, but also now
reserved.
Sort reg_alloc_order into call-saved, call-clobbered, and parameters.
This reduces the effect of values getting spilled and reloaded before
function calls.
Whether or not it is reserved, R2 (TOC) is always call-clobbered.
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/ppc64/tcg-target.c | 46 +++++++++++++++++++++-------------------------
1 file changed, 21 insertions(+), 25 deletions(-)
diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index 2b3d1bb..862e84c 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -99,7 +99,7 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
#endif
static const int tcg_target_reg_alloc_order[] = {
- TCG_REG_R14,
+ TCG_REG_R14, /* call saved registers */
TCG_REG_R15,
TCG_REG_R16,
TCG_REG_R17,
@@ -109,29 +109,25 @@ static const int tcg_target_reg_alloc_order[] = {
TCG_REG_R21,
TCG_REG_R22,
TCG_REG_R23,
+ TCG_REG_R24,
+ TCG_REG_R25,
+ TCG_REG_R26,
+ TCG_REG_R27,
TCG_REG_R28,
TCG_REG_R29,
TCG_REG_R30,
TCG_REG_R31,
-#ifdef __APPLE__
+ TCG_REG_R12, /* call clobbered, non-arguments */
+ TCG_REG_R11,
TCG_REG_R2,
-#endif
- TCG_REG_R3,
- TCG_REG_R4,
- TCG_REG_R5,
- TCG_REG_R6,
- TCG_REG_R7,
- TCG_REG_R8,
+ TCG_REG_R10, /* call clobbered, arguments */
TCG_REG_R9,
- TCG_REG_R10,
-#ifndef __APPLE__
- TCG_REG_R11,
-#endif
- TCG_REG_R12,
- TCG_REG_R24,
- TCG_REG_R25,
- TCG_REG_R26,
- TCG_REG_R27
+ TCG_REG_R8,
+ TCG_REG_R7,
+ TCG_REG_R6,
+ TCG_REG_R5,
+ TCG_REG_R4,
+ TCG_REG_R3,
};
static const int tcg_target_call_iarg_regs[] = {
@@ -2160,9 +2156,7 @@ static void tcg_target_init (TCGContext *s)
tcg_regset_set32 (tcg_target_available_regs[TCG_TYPE_I64], 0, 0xffffffff);
tcg_regset_set32 (tcg_target_call_clobber_regs, 0,
(1 << TCG_REG_R0) |
-#ifdef __APPLE__
(1 << TCG_REG_R2) |
-#endif
(1 << TCG_REG_R3) |
(1 << TCG_REG_R4) |
(1 << TCG_REG_R5) |
@@ -2176,12 +2170,14 @@ static void tcg_target_init (TCGContext *s)
);
tcg_regset_clear (s->reserved_regs);
- tcg_regset_set_reg (s->reserved_regs, TCG_REG_R0);
- tcg_regset_set_reg (s->reserved_regs, TCG_REG_R1);
-#ifndef __APPLE__
- tcg_regset_set_reg (s->reserved_regs, TCG_REG_R2);
+ tcg_regset_set_reg (s->reserved_regs, TCG_REG_R0); /* tcg temp */
+ tcg_regset_set_reg (s->reserved_regs, TCG_REG_R1); /* stack pointer */
+#ifdef __APPLE__
+ tcg_regset_set_reg (s->reserved_regs, TCG_REG_R11); /* ??? */
+#else
+ tcg_regset_set_reg (s->reserved_regs, TCG_REG_R2); /* toc */
#endif
- tcg_regset_set_reg (s->reserved_regs, TCG_REG_R13);
+ tcg_regset_set_reg (s->reserved_regs, TCG_REG_R13); /* thread pointer */
tcg_add_target_add_op_defs (ppc_op_defs);
}
--
1.8.3.1
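As an illustration of the patch above (not part of the series), the idea
of keeping one unconditional preference list and letting a reserved mask
filter it can be sketched standalone. The shortened alloc_order list and
the pick_reg helper are hypothetical, and TCG's real allocator is more
involved than this first-fit loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A shortened preference list in the spirit of tcg_target_reg_alloc_order:
   call-saved first, then call-clobbered non-arguments, then arguments. */
static const int alloc_order[] = { 14, 15, 31, 12, 11, 10, 3 };

/* Return the first register in preference order that is neither reserved
   nor already in use, or -1 if none is available. */
static int pick_reg(uint32_t reserved, uint32_t in_use)
{
    uint32_t blocked = reserved | in_use;
    size_t n = sizeof(alloc_order) / sizeof(alloc_order[0]);

    for (size_t i = 0; i < n; i++) {
        int r = alloc_order[i];
        assert(r >= 0 && r < 32);
        if (!(blocked & (UINT32_C(1) << r))) {
            return r;
        }
    }
    return -1;
}
```

Because the list itself no longer depends on the host, reserving R11 on
one host and R2 on another only changes the mask that is passed in.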
* [Qemu-devel] [PATCH for-next 08/15] tcg-ppc64: Create PowerOpcode
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson
This makes some bits easier to debug, since gdb presents a symbol
instead of a number.
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/ppc64/tcg-target.c | 273 +++++++++++++++++++++++++------------------------
1 file changed, 138 insertions(+), 135 deletions(-)
diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index 862e84c..a79b876 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -313,133 +313,10 @@ static int tcg_target_const_match (tcg_target_long val,
#define XO58(opc) (OPCD(58)|(opc))
#define XO62(opc) (OPCD(62)|(opc))
-#define B OPCD( 18)
-#define BC OPCD( 16)
-#define LBZ OPCD( 34)
-#define LHZ OPCD( 40)
-#define LHA OPCD( 42)
-#define LWZ OPCD( 32)
-#define STB OPCD( 38)
-#define STH OPCD( 44)
-#define STW OPCD( 36)
-
-#define STD XO62( 0)
-#define STDU XO62( 1)
-#define STDX XO31(149)
-
-#define LD XO58( 0)
-#define LDX XO31( 21)
-#define LDU XO58( 1)
-#define LWA XO58( 2)
-#define LWAX XO31(341)
-
-#define ADDIC OPCD( 12)
-#define ADDI OPCD( 14)
-#define ADDIS OPCD( 15)
-#define ORI OPCD( 24)
-#define ORIS OPCD( 25)
-#define XORI OPCD( 26)
-#define XORIS OPCD( 27)
-#define ANDI OPCD( 28)
-#define ANDIS OPCD( 29)
-#define MULLI OPCD( 7)
-#define CMPLI OPCD( 10)
-#define CMPI OPCD( 11)
-#define SUBFIC OPCD( 8)
-
-#define LWZU OPCD( 33)
-#define STWU OPCD( 37)
-
-#define RLWIMI OPCD( 20)
-#define RLWINM OPCD( 21)
-#define RLWNM OPCD( 23)
-
-#define RLDICL MD30( 0)
-#define RLDICR MD30( 1)
-#define RLDIMI MD30( 3)
-#define RLDCL MDS30( 8)
-
-#define BCLR XO19( 16)
-#define BCCTR XO19(528)
-#define CRAND XO19(257)
-#define CRANDC XO19(129)
-#define CRNAND XO19(225)
-#define CROR XO19(449)
-#define CRNOR XO19( 33)
-
-#define EXTSB XO31(954)
-#define EXTSH XO31(922)
-#define EXTSW XO31(986)
-#define ADD XO31(266)
-#define ADDE XO31(138)
-#define ADDME XO31(234)
-#define ADDZE XO31(202)
-#define ADDC XO31( 10)
-#define AND XO31( 28)
-#define SUBF XO31( 40)
-#define SUBFC XO31( 8)
-#define SUBFE XO31(136)
-#define SUBFME XO31(232)
-#define SUBFZE XO31(200)
-#define OR XO31(444)
-#define XOR XO31(316)
-#define MULLW XO31(235)
-#define MULHWU XO31( 11)
-#define DIVW XO31(491)
-#define DIVWU XO31(459)
-#define CMP XO31( 0)
-#define CMPL XO31( 32)
-#define LHBRX XO31(790)
-#define LWBRX XO31(534)
-#define LDBRX XO31(532)
-#define STHBRX XO31(918)
-#define STWBRX XO31(662)
-#define STDBRX XO31(660)
-#define MFSPR XO31(339)
-#define MTSPR XO31(467)
-#define SRAWI XO31(824)
-#define NEG XO31(104)
-#define MFCR XO31( 19)
-#define MFOCRF (MFCR | (1u << 20))
-#define NOR XO31(124)
-#define CNTLZW XO31( 26)
-#define CNTLZD XO31( 58)
-#define ANDC XO31( 60)
-#define ORC XO31(412)
-#define EQV XO31(284)
-#define NAND XO31(476)
-#define ISEL XO31( 15)
-
-#define MULLD XO31(233)
-#define MULHD XO31( 73)
-#define MULHDU XO31( 9)
-#define DIVD XO31(489)
-#define DIVDU XO31(457)
-
-#define LBZX XO31( 87)
-#define LHZX XO31(279)
-#define LHAX XO31(343)
-#define LWZX XO31( 23)
-#define STBX XO31(215)
-#define STHX XO31(407)
-#define STWX XO31(151)
-
#define SPR(a,b) ((((a)<<5)|(b))<<11)
#define LR SPR(8, 0)
#define CTR SPR(9, 0)
-#define SLW XO31( 24)
-#define SRW XO31(536)
-#define SRAW XO31(792)
-
-#define SLD XO31( 27)
-#define SRD XO31(539)
-#define SRAD XO31(794)
-#define SRADI XO31(413<<1)
-
-#define TW XO31( 4)
-#define TRAP (TW | TO (31))
-
#define RT(r) ((r)<<21)
#define RS(r) ((r)<<21)
#define RA(r) ((r)<<16)
@@ -455,6 +332,131 @@ static int tcg_target_const_match (tcg_target_long val,
#define LK 1
#define AA 2
+typedef enum PowerOpcode {
+ B = OPCD( 18),
+ BC = OPCD( 16),
+ LBZ = OPCD( 34),
+ LHZ = OPCD( 40),
+ LHA = OPCD( 42),
+ LWZ = OPCD( 32),
+ STB = OPCD( 38),
+ STH = OPCD( 44),
+ STW = OPCD( 36),
+
+ STD = XO62( 0),
+ STDU = XO62( 1),
+ STDX = XO31(149),
+
+ LD = XO58( 0),
+ LDX = XO31( 21),
+ LDU = XO58( 1),
+ LWA = XO58( 2),
+ LWAX = XO31(341),
+
+ ADDIC = OPCD( 12),
+ ADDI = OPCD( 14),
+ ADDIS = OPCD( 15),
+ ORI = OPCD( 24),
+ ORIS = OPCD( 25),
+ XORI = OPCD( 26),
+ XORIS = OPCD( 27),
+ ANDI = OPCD( 28),
+ ANDIS = OPCD( 29),
+ MULLI = OPCD( 7),
+ CMPLI = OPCD( 10),
+ CMPI = OPCD( 11),
+ SUBFIC = OPCD( 8),
+
+ LWZU = OPCD( 33),
+ STWU = OPCD( 37),
+
+ RLWIMI = OPCD( 20),
+ RLWINM = OPCD( 21),
+ RLWNM = OPCD( 23),
+
+ RLDICL = MD30( 0),
+ RLDICR = MD30( 1),
+ RLDIMI = MD30( 3),
+ RLDCL = MDS30( 8),
+
+ BCLR = XO19( 16),
+ BCCTR = XO19(528),
+ CRAND = XO19(257),
+ CRANDC = XO19(129),
+ CRNAND = XO19(225),
+ CROR = XO19(449),
+ CRNOR = XO19( 33),
+
+ EXTSB = XO31(954),
+ EXTSH = XO31(922),
+ EXTSW = XO31(986),
+ ADD = XO31(266),
+ ADDE = XO31(138),
+ ADDME = XO31(234),
+ ADDZE = XO31(202),
+ ADDC = XO31( 10),
+ AND = XO31( 28),
+ SUBF = XO31( 40),
+ SUBFC = XO31( 8),
+ SUBFE = XO31(136),
+ SUBFME = XO31(232),
+ SUBFZE = XO31(200),
+ OR = XO31(444),
+ XOR = XO31(316),
+ MULLW = XO31(235),
+ MULHWU = XO31( 11),
+ DIVW = XO31(491),
+ DIVWU = XO31(459),
+ CMP = XO31( 0),
+ CMPL = XO31( 32),
+ LHBRX = XO31(790),
+ LWBRX = XO31(534),
+ LDBRX = XO31(532),
+ STHBRX = XO31(918),
+ STWBRX = XO31(662),
+ STDBRX = XO31(660),
+ MFSPR = XO31(339),
+ MTSPR = XO31(467),
+ SRAWI = XO31(824),
+ NEG = XO31(104),
+ MFCR = XO31( 19),
+ MFOCRF = MFCR | (1u << 20),
+ NOR = XO31(124),
+ CNTLZW = XO31( 26),
+ CNTLZD = XO31( 58),
+ ANDC = XO31( 60),
+ ORC = XO31(412),
+ EQV = XO31(284),
+ NAND = XO31(476),
+ ISEL = XO31( 15),
+
+ MULLD = XO31(233),
+ MULHD = XO31( 73),
+ MULHDU = XO31( 9),
+ DIVD = XO31(489),
+ DIVDU = XO31(457),
+
+ LBZX = XO31( 87),
+ LHZX = XO31(279),
+ LHAX = XO31(343),
+ LWZX = XO31( 23),
+ STBX = XO31(215),
+ STHX = XO31(407),
+ STWX = XO31(151),
+
+ SLW = XO31( 24),
+ SRW = XO31(536),
+ SRAW = XO31(792),
+
+ SLD = XO31( 27),
+ SRD = XO31(539),
+ SRAD = XO31(794),
+ SRADI = XO31(413<<1),
+
+ TW = XO31( 4),
+ TRAP = TW | TO(31),
+} PowerOpcode;
+
#define TAB(t, a, b) (RT(t) | RA(a) | RB(b))
#define SAB(s, a, b) (RS(s) | RA(a) | RB(b))
#define TAI(s, a, i) (RT(s) | RA(a) | ((i) & 0xffff))
@@ -513,16 +515,16 @@ static inline void tcg_out_mov(TCGContext *s, TCGType type,
}
}
-static inline void tcg_out_rld(TCGContext *s, int op, TCGReg ra, TCGReg rs,
- int sh, int mb)
+static inline void tcg_out_rld(TCGContext *s, PowerOpcode op, TCGReg ra,
+ TCGReg rs, int sh, int mb)
{
sh = SH (sh & 0x1f) | (((sh >> 5) & 1) << 1);
mb = MB64 ((mb >> 5) | ((mb << 1) & 0x3f));
tcg_out32 (s, op | RA (ra) | RS (rs) | sh | mb);
}
-static inline void tcg_out_rlw(TCGContext *s, int op, TCGReg ra, TCGReg rs,
- int sh, int mb, int me)
+static inline void tcg_out_rlw(TCGContext *s, PowerOpcode op, TCGReg ra,
+ TCGReg rs, int sh, int mb, int me)
{
tcg_out32(s, op | RA(ra) | RS(rs) | SH(sh) | MB(mb) | ME(me));
}
@@ -666,7 +668,7 @@ static void tcg_out_andi64(TCGContext *s, TCGReg dst, TCGReg src, uint64_t c)
}
static void tcg_out_zori32(TCGContext *s, TCGReg dst, TCGReg src, uint32_t c,
- int op_lo, int op_hi)
+ PowerOpcode op_lo, PowerOpcode op_hi)
{
if (c >> 16) {
tcg_out32(s, op_hi | SAI(src, dst, c >> 16));
@@ -741,7 +743,7 @@ static void tcg_out_call(TCGContext *s, tcg_target_long arg,
}
static void tcg_out_ldst(TCGContext *s, TCGReg ret, TCGReg addr,
- int offset, int op1, int op2)
+ int offset, PowerOpcode op1, PowerOpcode op2)
{
if (offset == (int16_t) offset) {
tcg_out32(s, op1 | TAI(ret, addr, offset));
@@ -752,7 +754,7 @@ static void tcg_out_ldst(TCGContext *s, TCGReg ret, TCGReg addr,
}
static void tcg_out_ldsta(TCGContext *s, TCGReg ret, TCGReg addr,
- int offset, int op1, int op2)
+ int offset, PowerOpcode op1, PowerOpcode op2)
{
if (offset == (int16_t) (offset & ~3)) {
tcg_out32(s, op1 | TAI(ret, addr, offset));
@@ -820,7 +822,7 @@ static void tcg_out_tlb_read(TCGContext *s, TCGReg r0, TCGReg r1, TCGReg r2,
}
#endif
-static const uint32_t qemu_ldx_opc[8] = {
+static const PowerOpcode qemu_ldx_opc[8] = {
#ifdef TARGET_WORDS_BIGENDIAN
LBZX, LHZX, LWZX, LDX,
0, LHAX, LWAX, LDX
@@ -830,7 +832,7 @@ static const uint32_t qemu_ldx_opc[8] = {
#endif
};
-static const uint32_t qemu_stx_opc[4] = {
+static const PowerOpcode qemu_stx_opc[4] = {
#ifdef TARGET_WORDS_BIGENDIAN
STBX, STHX, STWX, STDX
#else
@@ -838,14 +840,15 @@ static const uint32_t qemu_stx_opc[4] = {
#endif
};
-static const uint32_t qemu_exts_opc[4] = {
+static const PowerOpcode qemu_exts_opc[4] = {
EXTSB, EXTSH, EXTSW, 0
};
static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
{
TCGReg addr_reg, data_reg, r0, r1, rbase;
- uint32_t insn, s_bits;
+ PowerOpcode insn;
+ int s_bits;
#ifdef CONFIG_SOFTMMU
TCGReg r2, ir;
int mem_index;
@@ -936,7 +939,7 @@ static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
static void tcg_out_qemu_st (TCGContext *s, const TCGArg *args, int opc)
{
TCGReg addr_reg, r0, r1, rbase, data_reg;
- uint32_t insn;
+ PowerOpcode insn;
#ifdef CONFIG_SOFTMMU
TCGReg r2, ir;
int mem_index;
--
1.8.3.1
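As an illustration of the patch above (not part of the series), a typed
opcode table can be sketched standalone with three of the enumerators.
The encode_exts helper and the exts_opc table name are hypothetical; the
values follow the XO31 scheme used in the patch, with OPCD assumed to
shift the primary opcode by 26 bits.

```c
#include <assert.h>
#include <stdint.h>

#define OPCD(opc)  ((opc) << 26)
#define XO31(opc)  (OPCD(31) | ((opc) << 1))
#define RS(r)      ((uint32_t)(r) << 21)
#define RA(r)      ((uint32_t)(r) << 16)

/* Enumerators keep their names in debug info, unlike #define constants. */
typedef enum PowerOpcode {
    EXTSB = XO31(954),
    EXTSH = XO31(922),
    EXTSW = XO31(986),
} PowerOpcode;

/* Indexed by log2 of the operand size in bytes; 0 means "no extension". */
static const PowerOpcode exts_opc[4] = { EXTSB, EXTSH, EXTSW, 0 };

/* Encode "exts{b,h,w} ra, rs", or return 0 when no extension is needed. */
static uint32_t encode_exts(int s_bits, unsigned ra, unsigned rs)
{
    assert(s_bits >= 0 && s_bits < 4);
    PowerOpcode insn = exts_opc[s_bits];
    return insn ? (uint32_t)insn | RS(rs) | RA(ra) : 0;
}
```

A variable declared as PowerOpcode, such as insn here, is what a debugger
can print by name when its value matches one of the enumerators.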
* [Qemu-devel] [PATCH for-next 09/15] tcg-ppc64: Handle long offsets better
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson
Previously we'd only handle 16-bit offsets from a memory operand
without falling back to the indexed form, but it's easy to use ADDIS
to handle full 32-bit offsets.
This also lets us unify code that existed inline in tcg_out_op
for handling addition of large constants.
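
The splitting of a 32-bit offset into ADDIS immediates plus a 16-bit
displacement can be sketched standalone as follows. This is only an
illustration, not the code of the patch: the OffsetSplit struct and the
split_offset helper are hypothetical, and align_mask is assumed to be 3
for the DS-form instructions (ld, std, lwa) and 0 otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int16_t hi;     /* first addis immediate, 0 if not needed */
    int16_t extra;  /* second addis immediate, 0 if not needed */
    int16_t lo;     /* displacement of the final load/store */
} OffsetSplit;

/* Split offset so that (hi << 16) + (extra << 16) + lo == offset.
   Returns false when the indexed form is needed instead: the offset
   is misaligned for the instruction or does not fit in 32 bits. */
static bool split_offset(int64_t offset, int64_t align_mask, OffsetSplit *out)
{
    assert(out != NULL);
    if ((offset & align_mask) || offset < INT32_MIN || offset > INT32_MAX) {
        return false;
    }

    /* Sign-extend the low 16 bits; the rest is a multiple of 65536. */
    int64_t lo = ((offset & 0xffff) ^ 0x8000) - 0x8000;
    int64_t hi = (offset - lo) / 65536;
    int64_t extra = 0;

    /* hi can reach 0x8000 for large positive offsets, which a signed
       16-bit immediate cannot hold, so spread it over two addis. */
    if (hi > INT16_MAX) {
        extra = 0x4000;
        hi -= 0x4000;
    }

    out->hi = (int16_t)hi;
    out->extra = (int16_t)extra;
    out->lo = (int16_t)lo;
    return true;
}
```

An offset of 0x12348000 splits into hi = 0x1235 and lo = -0x8000, since
the low half is sign-extended by the hardware and the high half has to
compensate for that.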
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/ppc64/tcg-target.c | 159 +++++++++++++++++++++++++------------------------
1 file changed, 81 insertions(+), 78 deletions(-)
diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index a79b876..e9c41fb 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -119,7 +119,6 @@ static const int tcg_target_reg_alloc_order[] = {
TCG_REG_R31,
TCG_REG_R12, /* call clobbered, non-arguments */
TCG_REG_R11,
- TCG_REG_R2,
TCG_REG_R10, /* call clobbered, arguments */
TCG_REG_R9,
TCG_REG_R8,
@@ -742,25 +741,55 @@ static void tcg_out_call(TCGContext *s, tcg_target_long arg,
#endif
}
-static void tcg_out_ldst(TCGContext *s, TCGReg ret, TCGReg addr,
- int offset, PowerOpcode op1, PowerOpcode op2)
+static void tcg_out_mem_long(TCGContext *s, PowerOpcode opi, PowerOpcode opx,
+ TCGReg rt, TCGReg base, tcg_target_long offset)
{
- if (offset == (int16_t) offset) {
- tcg_out32(s, op1 | TAI(ret, addr, offset));
- } else {
- tcg_out_movi(s, TCG_TYPE_I64, 0, offset);
- tcg_out32(s, op2 | TAB(ret, addr, 0));
+ tcg_target_long orig = offset, l0, l1, extra = 0, align = 0;
+ TCGReg rs = TCG_REG_R2;
+
+ assert(rt != TCG_REG_R2 && base != TCG_REG_R2);
+
+ switch (opi) {
+ case LD: case LWA:
+ align = 3;
+ /* FALLTHRU */
+ default:
+ if (rt != TCG_REG_R0) {
+ rs = rt;
+ }
+ break;
+ case STD:
+ align = 3;
+ break;
+ case STB: case STH: case STW:
+ break;
}
-}
-static void tcg_out_ldsta(TCGContext *s, TCGReg ret, TCGReg addr,
- int offset, PowerOpcode op1, PowerOpcode op2)
-{
- if (offset == (int16_t) (offset & ~3)) {
- tcg_out32(s, op1 | TAI(ret, addr, offset));
- } else {
- tcg_out_movi(s, TCG_TYPE_I64, 0, offset);
- tcg_out32(s, op2 | TAB(ret, addr, 0));
+ /* For unaligned, or very large offsets, use the indexed form. */
+ if (offset & align || offset != (int32_t)offset) {
+ tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_R2, orig);
+ tcg_out32(s, opx | TAB(rt, base, TCG_REG_R2));
+ return;
+ }
+
+ l0 = (int16_t)offset;
+ offset = (offset - l0) >> 16;
+ l1 = (int16_t)offset;
+
+ if (l1 < 0 && orig >= 0) {
+ extra = 0x4000;
+ l1 = (int16_t)(offset - 0x4000);
+ }
+ if (l1) {
+ tcg_out32(s, ADDIS | TAI(rs, base, l1));
+ base = rs;
+ }
+ if (extra) {
+ tcg_out32(s, ADDIS | TAI(rs, base, extra));
+ base = rs;
+ }
+ if (opi != ADDI || base != rt || l0 != 0) {
+ tcg_out32(s, opi | TAI(rt, base, l0));
}
}
@@ -1088,22 +1117,30 @@ static void tcg_target_qemu_prologue (TCGContext *s)
tcg_out32(s, BCLR | BO_ALWAYS);
}
-static void tcg_out_ld (TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
- tcg_target_long arg2)
+static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
+ tcg_target_long arg2)
{
- if (type == TCG_TYPE_I32)
- tcg_out_ldst (s, ret, arg1, arg2, LWZ, LWZX);
- else
- tcg_out_ldsta (s, ret, arg1, arg2, LD, LDX);
+ PowerOpcode opi, opx;
+
+ if (type == TCG_TYPE_I32) {
+ opi = LWZ, opx = LWZX;
+ } else {
+ opi = LD, opx = LDX;
+ }
+ tcg_out_mem_long(s, opi, opx, ret, arg1, arg2);
}
-static void tcg_out_st (TCGContext *s, TCGType type, TCGReg arg, TCGReg arg1,
- tcg_target_long arg2)
+static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, TCGReg arg1,
+ tcg_target_long arg2)
{
- if (type == TCG_TYPE_I32)
- tcg_out_ldst (s, arg, arg1, arg2, STW, STWX);
- else
- tcg_out_ldsta (s, arg, arg1, arg2, STD, STDX);
+ PowerOpcode opi, opx;
+
+ if (type == TCG_TYPE_I32) {
+ opi = STW, opx = STWX;
+ } else {
+ opi = STD, opx = STDX;
+ }
+ tcg_out_mem_long(s, opi, opx, arg, arg1, arg2);
}
static void tcg_out_cmp(TCGContext *s, int cond, TCGArg arg1, TCGArg arg2,
@@ -1464,61 +1501,52 @@ static void tcg_out_op (TCGContext *s, TCGOpcode opc, const TCGArg *args,
break;
case INDEX_op_ld8u_i32:
case INDEX_op_ld8u_i64:
- tcg_out_ldst (s, args[0], args[1], args[2], LBZ, LBZX);
+ tcg_out_mem_long(s, LBZ, LBZX, args[0], args[1], args[2]);
break;
case INDEX_op_ld8s_i32:
case INDEX_op_ld8s_i64:
- tcg_out_ldst (s, args[0], args[1], args[2], LBZ, LBZX);
- tcg_out32 (s, EXTSB | RS (args[0]) | RA (args[0]));
+ tcg_out_mem_long(s, LBZ, LBZX, args[0], args[1], args[2]);
+ tcg_out32(s, EXTSB | RS(args[0]) | RA(args[0]));
break;
case INDEX_op_ld16u_i32:
case INDEX_op_ld16u_i64:
- tcg_out_ldst (s, args[0], args[1], args[2], LHZ, LHZX);
+ tcg_out_mem_long(s, LHZ, LHZX, args[0], args[1], args[2]);
break;
case INDEX_op_ld16s_i32:
case INDEX_op_ld16s_i64:
- tcg_out_ldst (s, args[0], args[1], args[2], LHA, LHAX);
+ tcg_out_mem_long(s, LHA, LHAX, args[0], args[1], args[2]);
break;
case INDEX_op_ld_i32:
case INDEX_op_ld32u_i64:
- tcg_out_ldst (s, args[0], args[1], args[2], LWZ, LWZX);
+ tcg_out_mem_long(s, LWZ, LWZX, args[0], args[1], args[2]);
break;
case INDEX_op_ld32s_i64:
- tcg_out_ldsta (s, args[0], args[1], args[2], LWA, LWAX);
+ tcg_out_mem_long(s, LWA, LWAX, args[0], args[1], args[2]);
break;
case INDEX_op_ld_i64:
- tcg_out_ldsta (s, args[0], args[1], args[2], LD, LDX);
+ tcg_out_mem_long(s, LD, LDX, args[0], args[1], args[2]);
break;
case INDEX_op_st8_i32:
case INDEX_op_st8_i64:
- tcg_out_ldst (s, args[0], args[1], args[2], STB, STBX);
+ tcg_out_mem_long(s, STB, STBX, args[0], args[1], args[2]);
break;
case INDEX_op_st16_i32:
case INDEX_op_st16_i64:
- tcg_out_ldst (s, args[0], args[1], args[2], STH, STHX);
+ tcg_out_mem_long(s, STH, STHX, args[0], args[1], args[2]);
break;
case INDEX_op_st_i32:
case INDEX_op_st32_i64:
- tcg_out_ldst (s, args[0], args[1], args[2], STW, STWX);
+ tcg_out_mem_long(s, STW, STWX, args[0], args[1], args[2]);
break;
case INDEX_op_st_i64:
- tcg_out_ldsta (s, args[0], args[1], args[2], STD, STDX);
+ tcg_out_mem_long(s, STD, STDX, args[0], args[1], args[2]);
break;
case INDEX_op_add_i32:
a0 = args[0], a1 = args[1], a2 = args[2];
if (const_args[2]) {
- int32_t l, h;
do_addi_32:
- l = (int16_t)a2;
- h = a2 - l;
- if (h) {
- tcg_out32(s, ADDIS | TAI(a0, a1, h >> 16));
- a1 = a0;
- }
- if (l || a0 != a1) {
- tcg_out32(s, ADDI | TAI(a0, a1, l));
- }
+ tcg_out_mem_long(s, ADDI, ADD, a0, a1, (int32_t)a2);
} else {
tcg_out32(s, ADD | TAB(a0, a1, a2));
}
@@ -1694,32 +1722,8 @@ static void tcg_out_op (TCGContext *s, TCGOpcode opc, const TCGArg *args,
case INDEX_op_add_i64:
a0 = args[0], a1 = args[1], a2 = args[2];
if (const_args[2]) {
- int32_t l0, h1, h2;
do_addi_64:
- /* We can always split any 32-bit signed constant into 3 pieces.
- Note the positive 0x80000000 coming from the sub_i64 path,
- handled with the same code we need for eg 0x7fff8000. */
- assert(a2 == (int32_t)a2 || a2 == 0x80000000);
- l0 = (int16_t)a2;
- h1 = a2 - l0;
- h2 = 0;
- if (h1 < 0 && (int64_t)a2 > 0) {
- h2 = 0x40000000;
- h1 = a2 - h2 - l0;
- }
- assert((TCGArg)h2 + h1 + l0 == a2);
-
- if (h2) {
- tcg_out32(s, ADDIS | TAI(a0, a1, h2 >> 16));
- a1 = a0;
- }
- if (h1) {
- tcg_out32(s, ADDIS | TAI(a0, a1, h1 >> 16));
- a1 = a0;
- }
- if (l0 || a0 != a1) {
- tcg_out32(s, ADDI | TAI(a0, a1, l0));
- }
+ tcg_out_mem_long(s, ADDI, ADD, a0, a1, a2);
} else {
tcg_out32(s, ADD | TAB(a0, a1, a2));
}
@@ -2175,10 +2179,9 @@ static void tcg_target_init (TCGContext *s)
tcg_regset_clear (s->reserved_regs);
tcg_regset_set_reg (s->reserved_regs, TCG_REG_R0); /* tcg temp */
tcg_regset_set_reg (s->reserved_regs, TCG_REG_R1); /* stack pointer */
+ tcg_regset_set_reg (s->reserved_regs, TCG_REG_R2); /* mem temp */
#ifdef __APPLE__
tcg_regset_set_reg (s->reserved_regs, TCG_REG_R11); /* ??? */
-#else
- tcg_regset_set_reg (s->reserved_regs, TCG_REG_R2); /* toc */
#endif
tcg_regset_set_reg (s->reserved_regs, TCG_REG_R13); /* thread pointer */
--
1.8.3.1
* [Qemu-devel] [PATCH for-next 10/15] tcg-ppc64: Use indirect jump threading
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson
We were always doing an indirect jump anyway, and the sequence is
never longer than the 6 insns we were reserving for the direct jump.
Further cleanups will reduce the length of the constant address load.
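For readers unfamiliar with the indirect method, here is a minimal model of it in plain C. It is only a sketch: the SketchTB type and the sketch_* names are made up for illustration and are not QEMU's API. The point it tries to show is that chaining becomes a store into a data slot, so no instruction patching (and, as I read the removed ppc_tb_set_jmp_target, no icache flush) is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a translation block.  Only the per-exit
   indirection slots matter for this sketch. */
typedef struct SketchTB {
    uintptr_t tb_next[2];   /* address that each goto_tb exit jumps to */
} SketchTB;

/* Unchained state: the slot holds the address of the code that
   immediately follows the indirect branch, so the jump "falls through". */
static void sketch_reset_jump(SketchTB *tb, int n, uintptr_t fallthru)
{
    tb->tb_next[n] = fallthru;
}

/* Chaining two blocks is a plain data store into the slot. */
static void sketch_set_jmp_target(SketchTB *tb, int n, uintptr_t target)
{
    tb->tb_next[n] = target;
}

/* Models what the emitted ld + mtctr + bcctr sequence does: load the
   slot and branch to whatever address it contains. */
static uintptr_t sketch_goto_tb(const SketchTB *tb, int n)
{
    return tb->tb_next[n];
}
```

The trade-off is one extra load per chained jump in exchange for never rewriting generated code.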
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
include/exec/exec-all.h | 3 ++-
tcg/ppc64/tcg-target.c | 26 ++++++++------------------
2 files changed, 10 insertions(+), 19 deletions(-)
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index b3402a1..26c3553 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -126,7 +126,8 @@ static inline void tlb_flush(CPUArchState *env, int flush_global)
#define CODE_GEN_AVG_BLOCK_SIZE 64
#endif
-#if defined(__arm__) || defined(_ARCH_PPC) \
+#if defined(__arm__) \
+ || (defined(__powerpc__) && !defined(__powerpc64__)) \
|| defined(__x86_64__) || defined(__i386__) \
|| defined(__sparc__) || defined(__aarch64__) \
|| defined(CONFIG_TCG_INTERPRETER)
diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index e9c41fb..f69bc8f 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -1440,17 +1440,6 @@ static void tcg_out_movcond(TCGContext *s, TCGType type, TCGCond cond,
}
}
-void ppc_tb_set_jmp_target (unsigned long jmp_addr, unsigned long addr)
-{
- TCGContext s;
- unsigned long patch_size;
-
- s.code_ptr = (uint8_t *) jmp_addr;
- tcg_out_b (&s, 0, addr);
- patch_size = s.code_ptr - (uint8_t *) jmp_addr;
- flush_icache_range (jmp_addr, jmp_addr + patch_size);
-}
-
static void tcg_out_op (TCGContext *s, TCGOpcode opc, const TCGArg *args,
const int *const_args)
{
@@ -1464,13 +1453,14 @@ static void tcg_out_op (TCGContext *s, TCGOpcode opc, const TCGArg *args,
break;
case INDEX_op_goto_tb:
if (s->tb_jmp_offset) {
- /* direct jump method */
-
- s->tb_jmp_offset[args[0]] = s->code_ptr - s->code_buf;
- s->code_ptr += 28;
- }
- else {
- tcg_abort ();
+ /* Direct jump method. */
+ tcg_abort();
+ } else {
+ /* Indirect jump method. */
+ tcg_out_mem_long(s, LD, LDX, TCG_REG_R0, TCG_REG_R0,
+ (tcg_target_long)(s->tb_next + args[0]));
+ tcg_out32(s, MTSPR | RS(TCG_REG_R0) | CTR);
+ tcg_out32(s, BCCTR | BO_ALWAYS);
}
s->tb_next_offset[args[0]] = s->code_ptr - s->code_buf;
break;
--
1.8.3.1
* [Qemu-devel] [PATCH for-next 11/15] tcg-ppc64: Setup TCG_REG_TB
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson
A handy value "near" the rest of the program's dynamic allocation.
We'll be able to use this value for constant address generation,
cross-TB references, and in the further future, constant pool refs.
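To make the arithmetic concrete, here is a small sketch in plain C of the two calculations this register enables. The sketch_* names are hypothetical, and code_buf is assumed to be the TB start address that the register holds. The first helper checks whether an absolute address is within signed 32-bit range of the TB; the second models the reset after a fall-through goto_tb, where the register has just been loaded with the fall-through address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Displacement of an absolute address from the start of the TB.
   The unsigned subtraction wraps; the conversion to int64_t assumes the
   usual two's complement behaviour. */
static int64_t sketch_tb_delta(uint64_t code_buf, uint64_t addr)
{
    return (int64_t)(addr - code_buf);
}

/* True if the displacement can be formed from 16-bit immediates
   covering a signed 32-bit range. */
static bool sketch_fits_int32(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}

/* After an unchained goto_tb the register holds the fall-through
   address (code_ptr).  Adding code_buf - code_ptr brings it back to
   the start of the TB. */
static uint64_t sketch_reset_tb(uint64_t reg, uint64_t code_buf,
                                uint64_t code_ptr)
{
    return reg + (code_buf - code_ptr);
}
```

Addresses inside TCG's own allocations tend to pass the range check, which is what makes the register useful as a local "toc".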
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/ppc64/tcg-target.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index f69bc8f..e01d8bc 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -55,10 +55,15 @@ static bool have_isa_2_06;
#define HAVE_ISEL 0
#endif
+/* Our local "toc" points to the beginning of the TB, making it easy to
+ form addresses in the memory range "near" the TB. Unlike the real TOC,
+ put this in a call-saved register so we don't have to reload it. */
+#define TCG_REG_TB TCG_REG_R30
+
#ifdef CONFIG_USE_GUEST_BASE
-#define TCG_GUEST_BASE_REG 30
+#define TCG_GUEST_BASE_REG TCG_REG_R29
#else
-#define TCG_GUEST_BASE_REG 0
+#define TCG_GUEST_BASE_REG 0
#endif
#ifndef NDEBUG
@@ -1097,8 +1102,9 @@ static void tcg_target_qemu_prologue (TCGContext *s)
}
#endif
- tcg_out_mov (s, TCG_TYPE_PTR, TCG_AREG0, tcg_target_call_iarg_regs[0]);
+ tcg_out_mov(s, TCG_TYPE_PTR, TCG_AREG0, tcg_target_call_iarg_regs[0]);
tcg_out32 (s, MTSPR | RS (tcg_target_call_iarg_regs[1]) | CTR);
+ tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_TB, tcg_target_call_iarg_regs[1]);
tcg_out32 (s, BCCTR | BO_ALWAYS);
/* Epilogue */
@@ -1457,13 +1463,19 @@ static void tcg_out_op (TCGContext *s, TCGOpcode opc, const TCGArg *args,
tcg_abort();
} else {
/* Indirect jump method. */
- tcg_out_mem_long(s, LD, LDX, TCG_REG_R0, TCG_REG_R0,
+ tcg_out_mem_long(s, LD, LDX, TCG_REG_TB, TCG_REG_R0,
(tcg_target_long)(s->tb_next + args[0]));
- tcg_out32(s, MTSPR | RS(TCG_REG_R0) | CTR);
+ tcg_out32(s, MTSPR | RS(TCG_REG_TB) | CTR);
tcg_out32(s, BCCTR | BO_ALWAYS);
}
s->tb_next_offset[args[0]] = s->code_ptr - s->code_buf;
+
+ /* In the initial unset chain case, we fall through, which means
+ that we need to reset TCG_REG_TB to the start of the current TB. */
+ tcg_out_mem_long(s, ADDI, ADD, TCG_REG_TB, TCG_REG_TB,
+ s->code_buf - s->code_ptr);
break;
+
case INDEX_op_br:
{
TCGLabel *l = &s->labels[args[0]];
@@ -2174,6 +2186,7 @@ static void tcg_target_init (TCGContext *s)
tcg_regset_set_reg (s->reserved_regs, TCG_REG_R11); /* ??? */
#endif
tcg_regset_set_reg (s->reserved_regs, TCG_REG_R13); /* thread pointer */
+ tcg_regset_set_reg (s->reserved_regs, TCG_REG_TB); /* tcg tb pointer */
tcg_add_target_add_op_defs (ppc_op_defs);
}
--
1.8.3.1
* [Qemu-devel] [PATCH for-next 12/15] tcg-ppc64: Use TCG_REG_TB in tcg_out_movi and tcg_out_mem_long
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson
This results in significant code size reductions when manipulating
pointers into TCG's own data structures. E.g.
-OUT: [size=180]
+OUT: [size=132]
...
-xxx: li r2,16383 # goto_tb
-xxx: rldicr r2,r2,32,31
-xxx: oris r2,r2,39128
-xxx: ori r2,r2,376
-xxx: ldx r30,0,r2
+xxx: addis r30,r30,-544
+xxx: ld r30,-8(r30)
...
-xxx: li r3,16383 # exit_tb
-xxx: rldicr r3,r3,32,31
-xxx: oris r3,r3,39128
-xxx: ori r3,r3,288
+xxx: addis r3,r30,-544
+xxx: addi r3,r3,-96
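The immediates in the example above follow from how a signed 32-bit offset is split into 16-bit pieces. Below is a stand-alone sketch of that split in plain C. It mirrors the l0/l1/extra arithmetic in tcg_out_mem_long but is not the patch code itself: the SketchSplit type and sketch_* names are invented for illustration, and it uses masking and division instead of int16_t casts and shifts to stay portable.

```c
#include <assert.h>
#include <stdint.h>

typedef struct SketchSplit {
    int32_t l0;     /* low part, used as the load/store/addi displacement */
    int32_t l1;     /* immediate of the first addis (0 means none) */
    int32_t extra;  /* immediate of a second addis (0 means none) */
} SketchSplit;

/* Sign-extend the low 16 bits of v. */
static int32_t sketch_sext16(int64_t v)
{
    return (int32_t)(((v & 0xffff) ^ 0x8000) - 0x8000);
}

/* Split an offset in signed 32-bit range into addis/displacement parts. */
static SketchSplit sketch_split(int64_t offset)
{
    SketchSplit r = { 0, 0, 0 };
    int64_t hi;

    r.l0 = sketch_sext16(offset);
    hi = (offset - r.l0) / 65536;       /* exact: low 16 bits are zero */
    r.l1 = sketch_sext16(hi);
    if (r.l1 < 0 && offset >= 0) {
        /* A positive offset such as 0x7fff8000 would need a high part
           of +0x8000, which is not a valid signed 16-bit immediate.
           Use two addis of 0x4000 each instead. */
        r.extra = 0x4000;
        r.l1 = sketch_sext16(hi - 0x4000);
    }
    return r;
}

/* Recombine the pieces; should give back the original offset. */
static int64_t sketch_join(SketchSplit r)
{
    return ((int64_t)r.l1 + r.extra) * 65536 + r.l0;
}
```

With this, an offset of -544 * 65536 - 8 splits into l1 = -544 and l0 = -8, matching the "addis r30,r30,-544; ld r30,-8(r30)" pair shown above.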
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/ppc64/tcg-target.c | 164 +++++++++++++++++++++++++++++--------------------
1 file changed, 99 insertions(+), 65 deletions(-)
diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index e01d8bc..d4e1efc 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -548,6 +548,78 @@ static inline void tcg_out_shri64(TCGContext *s, TCGReg dst, TCGReg src, int c)
tcg_out_rld(s, RLDICL, dst, src, 64 - c, c);
}
+static void tcg_out_mem_long(TCGContext *s, PowerOpcode opi, PowerOpcode opx,
+ TCGReg rt, TCGReg base, tcg_target_long offset)
+{
+ tcg_target_long orig = offset, l0, l1, extra = 0, align = 0;
+ TCGReg rs = TCG_REG_R2;
+
+ assert(rt != TCG_REG_R2 && base != TCG_REG_R2);
+
+ switch (opi) {
+ case LD: case LWA:
+ align = 3;
+ /* FALLTHRU */
+ default:
+ if (rt != TCG_REG_R0) {
+ rs = rt;
+ }
+ break;
+ case STD:
+ align = 3;
+ break;
+ case STB: case STH: case STW:
+ break;
+ }
+
+ /* For unaligned, use the indexed form. */
+ if (offset & align) {
+ do_indexed:
+ tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_R2, orig);
+ tcg_out32(s, opx | TAB(rt, base, TCG_REG_R2));
+ return;
+ }
+
+ if (base == TCG_REG_R0) {
+ /* For absolute addresses, avoid indexed form. First try turning
+ it into an offset from a known base register, then just fold
+ the low 16 bits. */
+ offset -= (tcg_target_long)s->code_buf;
+ if (offset == (int32_t)offset) {
+ orig = offset;
+ base = TCG_REG_TB;
+ } else {
+ offset = (int16_t)orig;
+ tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_R2, orig - offset);
+ orig = offset;
+ base = TCG_REG_R2;
+ }
+ } else if (offset != (int32_t)offset) {
+ /* For very large offsets off a real base register, use indexed. */
+ goto do_indexed;
+ }
+
+ l0 = (int16_t)offset;
+ offset = (offset - l0) >> 16;
+ l1 = (int16_t)offset;
+
+ if (l1 < 0 && orig >= 0) {
+ extra = 0x4000;
+ l1 = (int16_t)(offset - 0x4000);
+ }
+ if (l1) {
+ tcg_out32(s, ADDIS | TAI(rs, base, l1));
+ base = rs;
+ }
+ if (extra) {
+ tcg_out32(s, ADDIS | TAI(rs, base, extra));
+ base = rs;
+ }
+ if (opi != ADDI || base != rt || l0 != 0) {
+ tcg_out32(s, opi | TAI(rt, base, l0));
+ }
+}
+
static void tcg_out_movi32(TCGContext *s, TCGReg ret, int32_t arg)
{
if (arg == (int16_t) arg) {
@@ -563,23 +635,37 @@ static void tcg_out_movi32(TCGContext *s, TCGReg ret, int32_t arg)
static void tcg_out_movi(TCGContext *s, TCGType type, TCGReg ret,
tcg_target_long arg)
{
+ tcg_target_long tmp;
+
+ /* Two attempts at 1 or 2 insn sequence for 32-bit constant. */
if (type == TCG_TYPE_I32 || arg == (int32_t)arg) {
tcg_out_movi32(s, ret, arg);
- } else if (arg == (uint32_t)arg && !(arg & 0x8000)) {
+ return;
+ }
+ if (arg == (uint32_t)arg && !(arg & 0x8000)) {
tcg_out32(s, ADDI | TAI(ret, 0, arg));
tcg_out32(s, ORIS | SAI(ret, ret, arg >> 16));
- } else {
- int32_t high = arg >> 32;
- tcg_out_movi32(s, ret, high);
- if (high) {
- tcg_out_shli64(s, ret, ret, 32);
- }
- if (arg & 0xffff0000) {
- tcg_out32(s, ORIS | SAI(ret, ret, arg >> 16));
- }
- if (arg & 0xffff) {
- tcg_out32(s, ORI | SAI(ret, ret, arg));
- }
+ return;
+ }
+
+ /* See if we can turn an address constant into a TB offset. */
+ tmp = arg - (uintptr_t)s->code_buf;
+ if (tmp == (int32_t)tmp) {
+ tcg_out_mem_long(s, ADDI, ADD, ret, TCG_REG_TB, tmp);
+ return;
+ }
+
+ /* Full 64-bit constant load. */
+ tmp = arg >> 32;
+ tcg_out_movi32(s, ret, tmp);
+ if (tmp) {
+ tcg_out_shli64(s, ret, ret, 32);
+ }
+ if (arg & 0xffff0000) {
+ tcg_out32(s, ORIS | SAI(ret, ret, arg >> 16));
+ }
+ if (arg & 0xffff) {
+ tcg_out32(s, ORI | SAI(ret, ret, arg));
}
}
@@ -746,58 +832,6 @@ static void tcg_out_call(TCGContext *s, tcg_target_long arg,
#endif
}
-static void tcg_out_mem_long(TCGContext *s, PowerOpcode opi, PowerOpcode opx,
- TCGReg rt, TCGReg base, tcg_target_long offset)
-{
- tcg_target_long orig = offset, l0, l1, extra = 0, align = 0;
- TCGReg rs = TCG_REG_R2;
-
- assert(rt != TCG_REG_R2 && base != TCG_REG_R2);
-
- switch (opi) {
- case LD: case LWA:
- align = 3;
- /* FALLTHRU */
- default:
- if (rt != TCG_REG_R0) {
- rs = rt;
- }
- break;
- case STD:
- align = 3;
- break;
- case STB: case STH: case STW:
- break;
- }
-
- /* For unaligned, or very large offsets, use the indexed form. */
- if (offset & align || offset != (int32_t)offset) {
- tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_R2, orig);
- tcg_out32(s, opx | TAB(rt, base, TCG_REG_R2));
- return;
- }
-
- l0 = (int16_t)offset;
- offset = (offset - l0) >> 16;
- l1 = (int16_t)offset;
-
- if (l1 < 0 && orig >= 0) {
- extra = 0x4000;
- l1 = (int16_t)(offset - 0x4000);
- }
- if (l1) {
- tcg_out32(s, ADDIS | TAI(rs, base, l1));
- base = rs;
- }
- if (extra) {
- tcg_out32(s, ADDIS | TAI(rs, base, extra));
- base = rs;
- }
- if (opi != ADDI || base != rt || l0 != 0) {
- tcg_out32(s, opi | TAI(rt, base, l0));
- }
-}
-
#if defined (CONFIG_SOFTMMU)
#include "exec/softmmu_defs.h"
--
1.8.3.1
* [Qemu-devel] [PATCH for-next 13/15] tcg-ppc64: Tidy tcg_target_qemu_prologue
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson
Use the helper macros like TAI. Fix formatting.
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/ppc64/tcg-target.c | 38 ++++++++++++++++----------------------
1 file changed, 16 insertions(+), 22 deletions(-)
diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index d4e1efc..90d033c 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -1118,21 +1118,18 @@ static void tcg_target_qemu_prologue (TCGContext *s)
#endif
/* Prologue */
- tcg_out32 (s, MFSPR | RT (0) | LR);
- tcg_out32 (s, STDU | RS (1) | RA (1) | (-frame_size & 0xffff));
- for (i = 0; i < ARRAY_SIZE (tcg_target_callee_save_regs); ++i)
- tcg_out32 (s, (STD
- | RS (tcg_target_callee_save_regs[i])
- | RA (1)
- | (i * 8 + 48 + TCG_STATIC_CALL_ARGS_SIZE)
- )
- );
- tcg_out32 (s, STD | RS (0) | RA (1) | (frame_size + 16));
+ tcg_out32(s, MFSPR | RT(TCG_REG_R0) | LR);
+ tcg_out32(s, STDU | SAI(TCG_REG_R1, TCG_REG_R1, -frame_size));
+ for (i = 0; i < ARRAY_SIZE (tcg_target_callee_save_regs); ++i) {
+ tcg_out32(s, (STD | SAI(tcg_target_callee_save_regs[i], TCG_REG_R1,
+ i * 8 + 48 + TCG_STATIC_CALL_ARGS_SIZE)));
+ }
+ tcg_out32(s, STD | RS(TCG_REG_R0) | RA(TCG_REG_R1) | (frame_size + 16));
#ifdef CONFIG_USE_GUEST_BASE
if (GUEST_BASE) {
- tcg_out_movi (s, TCG_TYPE_I64, TCG_GUEST_BASE_REG, GUEST_BASE);
- tcg_regset_set_reg (s->reserved_regs, TCG_GUEST_BASE_REG);
+ tcg_out_movi(s, TCG_TYPE_I64, TCG_GUEST_BASE_REG, GUEST_BASE);
+ tcg_regset_set_reg(s->reserved_regs, TCG_GUEST_BASE_REG);
}
#endif
@@ -1144,16 +1141,13 @@ static void tcg_target_qemu_prologue (TCGContext *s)
/* Epilogue */
tb_ret_addr = s->code_ptr;
- for (i = 0; i < ARRAY_SIZE (tcg_target_callee_save_regs); ++i)
- tcg_out32 (s, (LD
- | RT (tcg_target_callee_save_regs[i])
- | RA (1)
- | (i * 8 + 48 + TCG_STATIC_CALL_ARGS_SIZE)
- )
- );
- tcg_out32(s, LD | TAI(0, 1, frame_size + 16));
- tcg_out32(s, MTSPR | RS(0) | LR);
- tcg_out32(s, ADDI | TAI(1, 1, frame_size));
+ tcg_out32(s, LD | TAI(TCG_REG_R0, TCG_REG_R1, frame_size + 16));
+ for (i = 0; i < ARRAY_SIZE (tcg_target_callee_save_regs); ++i) {
+ tcg_out32(s, (LD | TAI(tcg_target_callee_save_regs[i], TCG_REG_R1,
+ i * 8 + 48 + TCG_STATIC_CALL_ARGS_SIZE)));
+ }
+ tcg_out32(s, MTSPR | RS(TCG_REG_R0) | LR);
+ tcg_out32(s, ADDI | TAI(TCG_REG_R1, TCG_REG_R1, frame_size));
tcg_out32(s, BCLR | BO_ALWAYS);
}
--
1.8.3.1
* [Qemu-devel] [PATCH for-next 14/15] tcg-ppc64: Streamline tcg_out_tlb_read
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson
Less conditional compilation. Merge an add insn with the indexed
memory load insn. Load the tlb addend earlier. Avoid the address
update memory form.
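As a reference for what the rotate-and-mask instructions in tcg_out_tlb_read compute, here is a sketch of the same values in plain C. The SK_* constants are assumptions picked for illustration (4K pages, 256 TLB entries, 32-byte entries), not values taken from this patch, and the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_BITS       12   /* assumed TARGET_PAGE_BITS */
#define SK_TLB_BITS        8    /* assumed CPU_TLB_BITS */
#define SK_TLB_ENTRY_BITS  5    /* assumed CPU_TLB_ENTRY_BITS */

/* Byte offset of the TLB entry within one tlb_table[mem_index] row:
   the page index, masked to the table size and scaled by entry size. */
static uint64_t sketch_tlb_entry_offset(uint64_t addr)
{
    uint64_t index = (addr >> SK_PAGE_BITS)
                     & (((uint64_t)1 << SK_TLB_BITS) - 1);
    return index << SK_TLB_ENTRY_BITS;
}

/* Value compared against the TLB comparator: keep the page number and
   the low s_bits alignment bits, clear everything in between.  An
   unaligned access leaves low bits set, so it cannot equal a
   page-aligned comparator and takes the slow path. */
static uint64_t sketch_tlb_compare_value(uint64_t addr, int s_bits)
{
    uint64_t page_mask = ~(((uint64_t)1 << SK_PAGE_BITS) - 1);
    uint64_t align_mask = ((uint64_t)1 << s_bits) - 1;
    return addr & (page_mask | align_mask);
}
```

This assumes the stored comparator for a valid RAM page is page-aligned, so equality implies both a page hit and natural alignment for the access size.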
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/ppc64/tcg-target.c | 202 +++++++++++++++++++++++--------------------------
1 file changed, 95 insertions(+), 107 deletions(-)
diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index 90d033c..4b23597 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -31,13 +31,11 @@
static uint8_t *tb_ret_addr;
-#define FAST_PATH
-
#if TARGET_LONG_BITS == 32
-#define LD_ADDR LWZU
+#define LD_ADDR LWZ
#define CMP_L 0
#else
-#define LD_ADDR LDU
+#define LD_ADDR LD
#define CMP_L (1<<21)
#endif
@@ -854,39 +852,64 @@ static const void * const qemu_st_helpers[4] = {
helper_stq_mmu,
};
-static void tcg_out_tlb_read(TCGContext *s, TCGReg r0, TCGReg r1, TCGReg r2,
- TCGReg addr_reg, int s_bits, int offset)
+/* Perform the TLB load and compare. Places the result of the comparison
+ in CR7, loads the addend of the TLB into R3, and returns the register
+ containing the guest address (zero-extended into R4). Clobbers R0 and R2. */
+
+static TCGReg tcg_out_tlb_read(TCGContext *s, int s_bits, TCGReg addr_reg,
+ int mem_index, bool is_read)
{
-#if TARGET_LONG_BITS == 32
- tcg_out_ext32u(s, addr_reg, addr_reg);
-
- tcg_out_rlw(s, RLWINM, r0, addr_reg,
- 32 - (TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS),
- 32 - (CPU_TLB_BITS + CPU_TLB_ENTRY_BITS),
- 31 - CPU_TLB_ENTRY_BITS);
- tcg_out32(s, ADD | TAB(r0, r0, TCG_AREG0));
- tcg_out32(s, LWZU | TAI(r1, r0, offset));
- tcg_out_rlw(s, RLWINM, r2, addr_reg, 0,
- (32 - s_bits) & 31, 31 - TARGET_PAGE_BITS);
-#else
- tcg_out_rld (s, RLDICL, r0, addr_reg,
- 64 - TARGET_PAGE_BITS,
- 64 - CPU_TLB_BITS);
- tcg_out_shli64(s, r0, r0, CPU_TLB_ENTRY_BITS);
+ size_t offset
+ = (is_read
+ ? offsetof(CPUArchState, tlb_table[mem_index][0].addr_read)
+ : offsetof(CPUArchState, tlb_table[mem_index][0].addr_write));
+
+ /* Extract the page index, shifted into place for tlb index. */
+ if (TARGET_LONG_BITS == 32) {
+ /* Zero-extend the address into a place helpful for further use. */
+ tcg_out_ext32u(s, TCG_REG_R4, addr_reg);
+ addr_reg = TCG_REG_R4;
+
+ tcg_out_rlw(s, RLWINM, TCG_REG_R3, addr_reg,
+ 32 - (TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS),
+ 32 - (CPU_TLB_BITS + CPU_TLB_ENTRY_BITS),
+ 31 - CPU_TLB_ENTRY_BITS);
+ } else {
+ tcg_out_rld (s, RLDICL, TCG_REG_R3, addr_reg,
+ 64 - TARGET_PAGE_BITS,
+ 64 - CPU_TLB_BITS);
+ tcg_out_shli64(s, TCG_REG_R3, TCG_REG_R3, CPU_TLB_ENTRY_BITS);
+ }
- tcg_out32(s, ADD | TAB(r0, r0, TCG_AREG0));
- tcg_out32(s, LD_ADDR | TAI(r1, r0, offset));
+ /* Load the tlb comparator. */
+ tcg_out32(s, ADD | TAB(TCG_REG_R3, TCG_REG_R3, TCG_AREG0));
+ tcg_out32(s, LD_ADDR | TAI(TCG_REG_R2, TCG_REG_R3, offset));
- if (!s_bits) {
- tcg_out_rld (s, RLDICR, r2, addr_reg, 0, 63 - TARGET_PAGE_BITS);
- }
- else {
- tcg_out_rld (s, RLDICL, r2, addr_reg,
- 64 - TARGET_PAGE_BITS,
- TARGET_PAGE_BITS - s_bits);
- tcg_out_rld (s, RLDICL, r2, r2, TARGET_PAGE_BITS, 0);
+ /* Load the TLB addend for use on the fast path. Do this asap
+ to minimize any load use delay. */
+ offset = offsetof(CPUArchState, tlb_table[mem_index][0].addend);
+ tcg_out32(s, LD | TAI(TCG_REG_R3, TCG_REG_R3, offset));
+
+ /* Clear the non-page, non-alignment bits from the address. */
+ if (TARGET_LONG_BITS == 32) {
+ tcg_out_rlw(s, RLWINM, TCG_REG_R0, addr_reg, 0,
+ (32 - s_bits) & 31, 31 - TARGET_PAGE_BITS);
+ } else {
+ if (!s_bits) {
+ tcg_out_rld (s, RLDICR, TCG_REG_R0, addr_reg,
+ 0, 63 - TARGET_PAGE_BITS);
+ } else {
+ tcg_out_rld (s, RLDICL, TCG_REG_R0, addr_reg,
+ 64 - TARGET_PAGE_BITS,
+ TARGET_PAGE_BITS - s_bits);
+ tcg_out_rld (s, RLDICL, TCG_REG_R0, TCG_REG_R0,
+ TARGET_PAGE_BITS, 0);
+ }
}
-#endif
+
+ tcg_out32(s, CMP | BF(7) | RA(TCG_REG_R0) | RB(TCG_REG_R2) | CMP_L);
+
+ return addr_reg;
}
#endif
@@ -918,7 +941,7 @@ static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
PowerOpcode insn;
int s_bits;
#ifdef CONFIG_SOFTMMU
- TCGReg r2, ir;
+ TCGReg ir;
int mem_index;
void *label1_ptr, *label2_ptr;
#endif
@@ -930,26 +953,16 @@ static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
#ifdef CONFIG_SOFTMMU
mem_index = *args;
- r0 = 3;
- r1 = 4;
- r2 = 0;
- rbase = 0;
-
- tcg_out_tlb_read (s, r0, r1, r2, addr_reg, s_bits,
- offsetof (CPUArchState, tlb_table[mem_index][0].addr_read));
-
- tcg_out32 (s, CMP | BF (7) | RA (r2) | RB (r1) | CMP_L);
+ r0 = tcg_out_tlb_read(s, s_bits, addr_reg, mem_index, true);
label1_ptr = s->code_ptr;
-#ifdef FAST_PATH
- tcg_out32 (s, BC | BI (7, CR_EQ) | BO_COND_TRUE);
-#endif
+ tcg_out32(s, BC | BI (7, CR_EQ) | BO_COND_TRUE);
/* slow path */
- ir = 3;
- tcg_out_mov (s, TCG_TYPE_I64, ir++, TCG_AREG0);
- tcg_out_mov (s, TCG_TYPE_I64, ir++, addr_reg);
- tcg_out_movi (s, TCG_TYPE_I64, ir++, mem_index);
+ ir = TCG_REG_R3;
+ tcg_out_mov(s, TCG_TYPE_I64, ir++, TCG_AREG0);
+ tcg_out_mov(s, TCG_TYPE_I64, ir++, addr_reg);
+ tcg_out_movi(s, TCG_TYPE_I64, ir++, mem_index);
tcg_out_call(s, (tcg_target_long)qemu_ld_helpers[s_bits], 1, LK);
@@ -959,29 +972,23 @@ static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
} else if (data_reg != 3) {
tcg_out_mov(s, TCG_TYPE_I64, data_reg, 3);
}
+
label2_ptr = s->code_ptr;
- tcg_out32 (s, B);
+ tcg_out32(s, B);
/* label1: fast path */
-#ifdef FAST_PATH
- reloc_pc14 (label1_ptr, (tcg_target_long) s->code_ptr);
-#endif
-
- /* r0 now contains &env->tlb_table[mem_index][index].addr_read */
- tcg_out32(s, LD | TAI(r0, r0,
- offsetof(CPUTLBEntry, addend)
- - offsetof(CPUTLBEntry, addr_read)));
- /* r0 = env->tlb_table[mem_index][index].addend */
- tcg_out32(s, ADD | TAB(r0, r0, addr_reg));
- /* r0 = env->tlb_table[mem_index][index].addend + addr */
+ reloc_pc14(label1_ptr, (tcg_target_long)s->code_ptr);
+ rbase = TCG_REG_R3;
+ r1 = TCG_REG_R0;
#else /* !CONFIG_SOFTMMU */
-#if TARGET_LONG_BITS == 32
- tcg_out_ext32u(s, addr_reg, addr_reg);
-#endif
- r0 = addr_reg;
- r1 = 3;
rbase = GUEST_BASE ? TCG_GUEST_BASE_REG : 0;
+ r0 = addr_reg;
+ r1 = TCG_REG_R0;
+ if (TARGET_LONG_BITS == 32) {
+ r0 = TCG_REG_R2;
+ tcg_out_ext32u(s, r0, addr_reg);
+ }
#endif
insn = qemu_ldx_opc[opc];
@@ -1000,7 +1007,7 @@ static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
}
#ifdef CONFIG_SOFTMMU
- reloc_pc24 (label2_ptr, (tcg_target_long) s->code_ptr);
+ reloc_pc24(label2_ptr, (tcg_target_long)s->code_ptr);
#endif
}
@@ -1009,7 +1016,7 @@ static void tcg_out_qemu_st (TCGContext *s, const TCGArg *args, int opc)
TCGReg addr_reg, r0, r1, rbase, data_reg;
PowerOpcode insn;
#ifdef CONFIG_SOFTMMU
- TCGReg r2, ir;
+ TCGReg ir;
int mem_index;
void *label1_ptr, *label2_ptr;
#endif
@@ -1020,63 +1027,44 @@ static void tcg_out_qemu_st (TCGContext *s, const TCGArg *args, int opc)
#ifdef CONFIG_SOFTMMU
mem_index = *args;
- r0 = 3;
- r1 = 4;
- r2 = 0;
- rbase = 0;
-
- tcg_out_tlb_read (s, r0, r1, r2, addr_reg, opc,
- offsetof (CPUArchState, tlb_table[mem_index][0].addr_write));
-
- tcg_out32 (s, CMP | BF (7) | RA (r2) | RB (r1) | CMP_L);
+ r0 = tcg_out_tlb_read(s, opc, addr_reg, mem_index, false);
label1_ptr = s->code_ptr;
-#ifdef FAST_PATH
- tcg_out32 (s, BC | BI (7, CR_EQ) | BO_COND_TRUE);
-#endif
+ tcg_out32(s, BC | BI (7, CR_EQ) | BO_COND_TRUE);
/* slow path */
- ir = 3;
- tcg_out_mov (s, TCG_TYPE_I64, ir++, TCG_AREG0);
- tcg_out_mov (s, TCG_TYPE_I64, ir++, addr_reg);
- tcg_out_rld (s, RLDICL, ir++, data_reg, 0, 64 - (1 << (3 + opc)));
- tcg_out_movi (s, TCG_TYPE_I64, ir++, mem_index);
+ ir = TCG_REG_R3;
+ tcg_out_mov(s, TCG_TYPE_I64, ir++, TCG_AREG0);
+ tcg_out_mov(s, TCG_TYPE_I64, ir++, addr_reg);
+ tcg_out_rld(s, RLDICL, ir++, data_reg, 0, 64 - (1 << (3 + opc)));
+ tcg_out_movi(s, TCG_TYPE_I64, ir++, mem_index);
tcg_out_call(s, (tcg_target_long)qemu_st_helpers[opc], 1, LK);
label2_ptr = s->code_ptr;
- tcg_out32 (s, B);
+ tcg_out32(s, B);
/* label1: fast path */
-#ifdef FAST_PATH
- reloc_pc14 (label1_ptr, (tcg_target_long) s->code_ptr);
-#endif
-
- tcg_out32 (s, (LD
- | RT (r0)
- | RA (r0)
- | (offsetof (CPUTLBEntry, addend)
- - offsetof (CPUTLBEntry, addr_write))
- ));
- /* r0 = env->tlb_table[mem_index][index].addend */
- tcg_out32(s, ADD | TAB(r0, r0, addr_reg));
- /* r0 = env->tlb_table[mem_index][index].addend + addr */
+ reloc_pc14(label1_ptr, (tcg_target_long) s->code_ptr);
+ rbase = TCG_REG_R3;
+ r1 = TCG_REG_R2;
#else /* !CONFIG_SOFTMMU */
-#if TARGET_LONG_BITS == 32
- tcg_out_ext32u(s, addr_reg, addr_reg);
-#endif
- r1 = 3;
- r0 = addr_reg;
rbase = GUEST_BASE ? TCG_GUEST_BASE_REG : 0;
+ r0 = addr_reg;
+ r1 = TCG_REG_R3;
+ if (TARGET_LONG_BITS == 32) {
+ r0 = TCG_REG_R2;
+ tcg_out_ext32u(s, r0, addr_reg);
+ }
#endif
insn = qemu_stx_opc[opc];
if (!HAVE_ISA_2_06 && insn == STDBRX) {
tcg_out32(s, STWBRX | SAB(data_reg, rbase, r0));
tcg_out32(s, ADDI | TAI(r1, r0, 4));
- tcg_out_shri64(s, 0, data_reg, 32);
- tcg_out32(s, STWBRX | SAB(0, rbase, r1));
+ tcg_out_shri64(s, TCG_REG_R0, data_reg, 32);
+ tcg_out32(s, STWBRX | SAB(TCG_REG_R0, rbase, r1));
} else {
tcg_out32(s, insn | SAB(data_reg, rbase, r0));
}
--
1.8.3.1
* [Qemu-devel] [PATCH for-next 15/15] tcg-ppc64: Implement CONFIG_QEMU_LDST_OPTIMIZATION
From: Richard Henderson @ 2013-08-05 18:28 UTC (permalink / raw)
To: qemu-devel; +Cc: Vassili Karpov (malc), Richard Henderson
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
configure | 2 +-
include/exec/exec-all.h | 4 +-
tcg/ppc64/tcg-target.c | 219 +++++++++++++++++++++++++++++++-----------------
3 files changed, 146 insertions(+), 79 deletions(-)
diff --git a/configure b/configure
index 18fa608..5b9a66c 100755
--- a/configure
+++ b/configure
@@ -3650,7 +3650,7 @@ echo "libs_softmmu=$libs_softmmu" >> $config_host_mak
echo "ARCH=$ARCH" >> $config_host_mak
case "$cpu" in
- arm|i386|x86_64|ppc|aarch64)
+ aarch64 | arm | i386 | x86_64 | ppc*)
# The TCG interpreter currently does not support ld/st optimization.
if test "$tcg_interpreter" = "no" ; then
echo "CONFIG_QEMU_LDST_OPTIMIZATION=y" >> $config_host_mak
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 26c3553..91b189b 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -326,11 +326,11 @@ extern uintptr_t tci_tb_ptr;
(5) post-process (e.g. stack adjust)
(6) jump to corresponding code of the next of fast path
*/
-# if defined(__i386__) || defined(__x86_64__)
+# if defined(__i386__) || defined(__x86_64__) || defined(_ARCH_PPC64)
# define GETRA() ((uintptr_t)__builtin_return_address(0))
/* The return address argument for ldst is passed directly. */
# define GETPC_LDST() (abort(), 0)
-# elif defined (_ARCH_PPC) && !defined (_ARCH_PPC64)
+# elif defined(_ARCH_PPC)
# define GETRA() ((uintptr_t)__builtin_return_address(0))
# define GETPC_LDST() ((uintptr_t) ((*(int32_t *)(GETRA() - 4)) - 1))
# elif defined(__arm__)
diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index 4b23597..7ecc032 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -830,26 +830,50 @@ static void tcg_out_call(TCGContext *s, tcg_target_long arg,
#endif
}
+static const PowerOpcode qemu_ldx_opc[8] = {
+#ifdef TARGET_WORDS_BIGENDIAN
+ LBZX, LHZX, LWZX, LDX,
+ 0, LHAX, LWAX, LDX
+#else
+ LBZX, LHBRX, LWBRX, LDBRX,
+ 0, 0, 0, LDBRX,
+#endif
+};
+
+static const PowerOpcode qemu_stx_opc[4] = {
+#ifdef TARGET_WORDS_BIGENDIAN
+ STBX, STHX, STWX, STDX
+#else
+ STBX, STHBRX, STWBRX, STDBRX,
+#endif
+};
+
+static const PowerOpcode qemu_exts_opc[4] = {
+ EXTSB, EXTSH, EXTSW, 0
+};
+
#if defined (CONFIG_SOFTMMU)
#include "exec/softmmu_defs.h"
/* helper signature: helper_ld_mmu(CPUState *env, target_ulong addr,
- int mmu_idx) */
+ * int mmu_idx, uintptr_t ra)
+ */
static const void * const qemu_ld_helpers[4] = {
- helper_ldb_mmu,
- helper_ldw_mmu,
- helper_ldl_mmu,
- helper_ldq_mmu,
+ helper_ret_ldb_mmu,
+ helper_ret_ldw_mmu,
+ helper_ret_ldl_mmu,
+ helper_ret_ldq_mmu,
};
/* helper signature: helper_st_mmu(CPUState *env, target_ulong addr,
- uintxx_t val, int mmu_idx) */
+ * uintxx_t val, int mmu_idx, uintptr_t ra)
+ */
static const void * const qemu_st_helpers[4] = {
- helper_stb_mmu,
- helper_stw_mmu,
- helper_stl_mmu,
- helper_stq_mmu,
+ helper_ret_stb_mmu,
+ helper_ret_stw_mmu,
+ helper_ret_stl_mmu,
+ helper_ret_stq_mmu,
};
/* Perform the TLB load and compare. Places the result of the comparison
@@ -911,29 +935,108 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, int s_bits, TCGReg addr_reg,
return addr_reg;
}
-#endif
-static const PowerOpcode qemu_ldx_opc[8] = {
-#ifdef TARGET_WORDS_BIGENDIAN
- LBZX, LHZX, LWZX, LDX,
- 0, LHAX, LWAX, LDX
-#else
- LBZX, LHBRX, LWBRX, LDBRX,
- 0, 0, 0, LDBRX,
-#endif
-};
+/* Record the context of a call to the out of line helper code for the slow
+ path for a load or store, so that we can later generate the correct
+ helper code. */
+static void add_qemu_ldst_label(TCGContext *s, bool is_ld, int opc,
+ int data_reg, int addr_reg, int mem_index,
+ uint8_t *raddr, uint8_t *label_ptr)
+{
+ int idx;
+ TCGLabelQemuLdst *label;
-static const PowerOpcode qemu_stx_opc[4] = {
-#ifdef TARGET_WORDS_BIGENDIAN
- STBX, STHX, STWX, STDX
-#else
- STBX, STHBRX, STWBRX, STDBRX,
-#endif
-};
+ if (s->nb_qemu_ldst_labels >= TCG_MAX_QEMU_LDST) {
+ tcg_abort();
+ }
-static const PowerOpcode qemu_exts_opc[4] = {
- EXTSB, EXTSH, EXTSW, 0
-};
+ idx = s->nb_qemu_ldst_labels++;
+ label = (TCGLabelQemuLdst *)&s->qemu_ldst_labels[idx];
+ label->is_ld = is_ld;
+ label->opc = opc;
+ label->datalo_reg = data_reg;
+ label->addrlo_reg = addr_reg;
+ label->mem_index = mem_index;
+ label->raddr = raddr;
+ label->label_ptr[0] = label_ptr;
+}
+
+/* See the GETPC definition in include/exec/exec-all.h. */
+static inline uintptr_t do_getpc(uint8_t *raddr)
+{
+ return (uintptr_t)raddr - 1;
+}
+
+static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
+{
+ int opc = lb->opc;
+ int s_bits = opc & 3;
+ PowerOpcode insn;
+
+ reloc_pc14(lb->label_ptr[0], (uintptr_t)s->code_ptr);
+
+ tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_R3, TCG_AREG0);
+
+ /* If the address needed to be zero-extended, we'll have already
+ placed it in R4. The only remaining case is 64-bit guest. */
+ if (lb->addrlo_reg != TCG_REG_R4) {
+ tcg_out_mov(s, TCG_TYPE_I64, TCG_REG_R4, lb->addrlo_reg);
+ }
+
+ tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_R5, lb->mem_index);
+ tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R6, do_getpc(lb->raddr));
+
+ tcg_out_call(s, (tcg_target_long)qemu_ld_helpers[s_bits], 1, LK);
+
+ if (opc & 4) {
+ insn = qemu_exts_opc[s_bits];
+ tcg_out32(s, insn | RA(lb->datalo_reg) | RS(TCG_REG_R3));
+ } else {
+ tcg_out_mov(s, TCG_TYPE_I64, lb->datalo_reg, TCG_REG_R3);
+ }
+
+ tcg_out_b(s, 0, (uintptr_t)lb->raddr);
+}
+
+static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
+{
+ int opc = lb->opc;
+
+ reloc_pc14(lb->label_ptr[0], (uintptr_t)s->code_ptr);
+
+ tcg_out_mov(s, TCG_TYPE_I64, TCG_REG_R3, TCG_AREG0);
+
+ /* If the address needed to be zero-extended, we'll have already
+ placed it in R4. The only remaining case is 64-bit guest. */
+ if (lb->addrlo_reg != TCG_REG_R4) {
+ tcg_out_mov(s, TCG_TYPE_I64, TCG_REG_R4, lb->addrlo_reg);
+ }
+
+ tcg_out_rld(s, RLDICL, TCG_REG_R5, lb->datalo_reg,
+ 0, 64 - (1 << (3 + opc)));
+ tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_R6, lb->mem_index);
+ tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_R7, do_getpc(lb->raddr));
+
+ tcg_out_call(s, (tcg_target_long)qemu_st_helpers[opc], 1, LK);
+
+ tcg_out_b(s, 0, (uintptr_t)lb->raddr);
+}
+
+void tcg_out_tb_finalize(TCGContext *s)
+{
+ int i, n = s->nb_qemu_ldst_labels;
+
+ /* qemu_ld/st slow paths */
+ for (i = 0; i < n; i++) {
+ TCGLabelQemuLdst *label = &s->qemu_ldst_labels[i];
+ if (label->is_ld) {
+ tcg_out_qemu_ld_slow_path(s, label);
+ } else {
+ tcg_out_qemu_st_slow_path(s, label);
+ }
+ }
+}
+#endif /* SOFTMMU */
static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
{
@@ -941,9 +1044,8 @@ static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
PowerOpcode insn;
int s_bits;
#ifdef CONFIG_SOFTMMU
- TCGReg ir;
int mem_index;
- void *label1_ptr, *label2_ptr;
+ void *label_ptr;
#endif
data_reg = *args++;
@@ -955,29 +1057,8 @@ static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
r0 = tcg_out_tlb_read(s, s_bits, addr_reg, mem_index, true);
- label1_ptr = s->code_ptr;
- tcg_out32(s, BC | BI (7, CR_EQ) | BO_COND_TRUE);
-
- /* slow path */
- ir = TCG_REG_R3;
- tcg_out_mov(s, TCG_TYPE_I64, ir++, TCG_AREG0);
- tcg_out_mov(s, TCG_TYPE_I64, ir++, addr_reg);
- tcg_out_movi(s, TCG_TYPE_I64, ir++, mem_index);
-
- tcg_out_call(s, (tcg_target_long)qemu_ld_helpers[s_bits], 1, LK);
-
- if (opc & 4) {
- insn = qemu_exts_opc[s_bits];
- tcg_out32(s, insn | RA(data_reg) | RS(3));
- } else if (data_reg != 3) {
- tcg_out_mov(s, TCG_TYPE_I64, data_reg, 3);
- }
-
- label2_ptr = s->code_ptr;
- tcg_out32(s, B);
-
- /* label1: fast path */
- reloc_pc14(label1_ptr, (tcg_target_long)s->code_ptr);
+ label_ptr = s->code_ptr;
+ tcg_out32(s, BC | BI (7, CR_EQ) | BO_COND_FALSE);
rbase = TCG_REG_R3;
r1 = TCG_REG_R0;
@@ -1007,7 +1088,8 @@ static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
}
#ifdef CONFIG_SOFTMMU
- reloc_pc24(label2_ptr, (tcg_target_long)s->code_ptr);
+ add_qemu_ldst_label(s, true, opc, data_reg, r0, mem_index,
+ s->code_ptr, label_ptr);
#endif
}
@@ -1016,9 +1098,8 @@ static void tcg_out_qemu_st (TCGContext *s, const TCGArg *args, int opc)
TCGReg addr_reg, r0, r1, rbase, data_reg;
PowerOpcode insn;
#ifdef CONFIG_SOFTMMU
- TCGReg ir;
int mem_index;
- void *label1_ptr, *label2_ptr;
+ void *label_ptr;
#endif
data_reg = *args++;
@@ -1029,23 +1110,8 @@ static void tcg_out_qemu_st (TCGContext *s, const TCGArg *args, int opc)
r0 = tcg_out_tlb_read(s, opc, addr_reg, mem_index, false);
- label1_ptr = s->code_ptr;
- tcg_out32(s, BC | BI (7, CR_EQ) | BO_COND_TRUE);
-
- /* slow path */
- ir = TCG_REG_R3;
- tcg_out_mov(s, TCG_TYPE_I64, ir++, TCG_AREG0);
- tcg_out_mov(s, TCG_TYPE_I64, ir++, addr_reg);
- tcg_out_rld(s, RLDICL, ir++, data_reg, 0, 64 - (1 << (3 + opc)));
- tcg_out_movi(s, TCG_TYPE_I64, ir++, mem_index);
-
- tcg_out_call(s, (tcg_target_long)qemu_st_helpers[opc], 1, LK);
-
- label2_ptr = s->code_ptr;
- tcg_out32(s, B);
-
- /* label1: fast path */
- reloc_pc14(label1_ptr, (tcg_target_long) s->code_ptr);
+ label_ptr = s->code_ptr;
+ tcg_out32(s, BC | BI (7, CR_EQ) | BO_COND_FALSE);
rbase = TCG_REG_R3;
r1 = TCG_REG_R2;
@@ -1070,7 +1136,8 @@ static void tcg_out_qemu_st (TCGContext *s, const TCGArg *args, int opc)
}
#ifdef CONFIG_SOFTMMU
- reloc_pc24 (label2_ptr, (tcg_target_long) s->code_ptr);
+ add_qemu_ldst_label(s, false, opc, data_reg, r0, mem_index,
+ s->code_ptr, label_ptr);
#endif
}
--
1.8.3.1
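Note on the store slow path above: the rldicl with SH = 0 and
MB = 64 - (1 << (3 + opc)) keeps only the low 8, 16, 32 or 64 bits of the
data register, i.e. it zero-extends the value before passing it to the
helper. A minimal C sketch of the same arithmetic, assuming opc is the
log2 of the access size in bytes (0..3); zero_extend_store_value is a
hypothetical name and not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not from the patch: C equivalent of
   "rldicl rd, rs, 0, MB" with MB = 64 - (1 << (3 + opc)).
   rldicl with a zero rotate clears the MB most significant bits. */
static uint64_t zero_extend_store_value(uint64_t val, int opc)
{
    assert(opc >= 0 && opc <= 3);
    int mb = 64 - (1 << (3 + opc));    /* 56, 48, 32 or 0 */
    return val & (UINT64_MAX >> mb);   /* shift count is at most 56 */
}
```

For a 64-bit store (opc == 3) MB is 0, so the mask is all ones and the
value passes through unchanged.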
* Re: [Qemu-devel] [PATCH for-next 00/15] Collection of improvements for tcg/ppc64
From: Richard Henderson @ 2013-08-17 6:23 UTC (permalink / raw)
To: qemu-devel
Ping.
r~