qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH 00/15] tcg-sparc improvements
@ 2012-03-25 22:27 Richard Henderson
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 01/15] tcg-sparc: Hack in qemu_ld/st64 for 32-bit Richard Henderson
                   ` (14 more replies)
  0 siblings, 15 replies; 22+ messages in thread
From: Richard Henderson @ 2012-03-25 22:27 UTC
  To: qemu-devel; +Cc: Blue Swirl

32-bit sparc hosts haven't worked in quite a while: there are missing
opcodes, incorrect opcodes, and unconditional use of ASI_PRIMARY_LITTLE.

This patch set begins by dropping support for pre-v9 sparc, which
lets us clean things up quite a bit and use 64-bit load and store
operations.

I was still having problems with %g6 being clobbered in glibc.
Patches 7-10 therefore drop the use of global registers in the sparc
port entirely.  Given the hoops we already jump through to protect
AREG0 around calls from TCG-generated code, a %g7-relative TLS access
in the helpers is approximately as efficient.  As targets are
converted to CONFIG_TCG_PASS_AREG0, even this will improve, since
direct register access becomes available.
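A minimal sketch of the TLS approach described above, in plain C. Everything in it is hypothetical: `struct cpu_state` stands in for CPUArchState, and `helper_read_pc()` and `cpu_exec_sketch()` are invented names. Instead of pinning env to a reserved global register such as %g6, the caller publishes it in a thread-local variable and helpers read it back; on SPARC Linux the compiler addresses such variables relative to the thread pointer in %g7.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's per-CPU state (CPUArchState). */
struct cpu_state {
    uint64_t pc;
};

/* Instead of a global register variable bound to %g6, keep env in
   thread-local storage.  No register is reserved for the whole
   program; each access goes through the thread pointer instead. */
static _Thread_local struct cpu_state *cur_env;

/* Hypothetical helper called from generated code: it fetches env
   from TLS rather than from a fixed global register. */
static uint64_t helper_read_pc(void)
{
    assert(cur_env != NULL);
    return cur_env->pc;
}

/* Hypothetical execution entry: publish env before running guest code. */
static uint64_t cpu_exec_sketch(struct cpu_state *env)
{
    cur_env = env;
    return helper_read_pc();
}
```

For example, `cpu_exec_sketch(&cpu)` returns `cpu.pc`, and any helper called later on the same thread sees the same env, even though glibc is free to use every global register.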



r~


Richard Henderson (15):
  tcg-sparc: Hack in qemu_ld/st64 for 32-bit.
  tcg-sparc: Fix ADDX opcode.
  tcg-sparc: Assume v9 cpu always, i.e. force v8plus in 32-bit mode.
  tcg-sparc: Fix qemu_ld/st to handle 32-bit host.
  tcg-sparc: Simplify qemu_ld/st direct memory paths.
  tcg-sparc: Support GUEST_BASE.
  tcg-sparc: Streamline qemu_ld/st more.
  Avoid declaring the env variable at all if CONFIG_TCG_PASS_AREG0.
  tcg-sparc: Do not use a global register for AREG0.
  tcg-sparc: Change AREG0 in generated code to %i0.
  tcg-sparc: Clean up cruft stemming from attempts to use global
    registers.
  tcg-sparc: Mask shift immediates to avoid illegal insns.
  tcg-sparc: Use defines for temporaries.
  tcg-sparc: Add %g/%o registers to alloc_order
  tcg-sparc: Fix and enable direct TB chaining.

 configure              |   53 +---
 dyngen-exec.h          |   27 +-
 exec-all.h             |    9 +-
 exec.c                 |   16 +-
 tcg/sparc/tcg-target.c |  951 +++++++++++++++++++++++-------------------------
 tcg/sparc/tcg-target.h |   34 +-
 user-exec.c            |   17 +-
 7 files changed, 520 insertions(+), 587 deletions(-)

-- 
1.7.7.6


* [Qemu-devel] [PATCH 01/15] tcg-sparc: Hack in qemu_ld/st64 for 32-bit.
  2012-03-25 22:27 [Qemu-devel] [PATCH 00/15] tcg-sparc improvements Richard Henderson
@ 2012-03-25 22:27 ` Richard Henderson
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 02/15] tcg-sparc: Fix ADDX opcode Richard Henderson
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2012-03-25 22:27 UTC
  To: qemu-devel; +Cc: Blue Swirl

The opcodes are not actually implemented, but registering them at
least avoids the TCG assertion at startup.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 247a278..0e71618 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -1586,6 +1586,9 @@ static const TCGTargetOpDef sparc_op_defs[] = {
 
     { INDEX_op_brcond_i64, { "r", "rJ" } },
     { INDEX_op_setcond_i64, { "r", "r", "rJ" } },
+#else
+    { INDEX_op_qemu_ld64, { "L", "L", "L" } },
+    { INDEX_op_qemu_st64, { "L", "L", "L" } },
 #endif
     { -1 },
 };
-- 
1.7.7.6


* [Qemu-devel] [PATCH 02/15] tcg-sparc: Fix ADDX opcode.
  2012-03-25 22:27 [Qemu-devel] [PATCH 00/15] tcg-sparc improvements Richard Henderson
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 01/15] tcg-sparc: Hack in qemu_ld/st64 for 32-bit Richard Henderson
@ 2012-03-25 22:27 ` Richard Henderson
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 03/15] tcg-sparc: Assume v9 cpu always, i.e. force v8plus in 32-bit mode Richard Henderson
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2012-03-25 22:27 UTC
  To: qemu-devel; +Cc: Blue Swirl

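The fix is easiest to see in the SPARC format-3 encoding, where op3 occupies bits 24:19: op3 0x08 is ADDX (ADDC in v9 terminology), while 0x10, which the old define produced, is ADDcc. The sketch below recomputes the instruction word for `addx %o1, %o2, %o0`. The `F3_*` macros and `encode_arith_rrr()` are hypothetical names; the field positions are assumed to match those of QEMU's INSN_OP, INSN_RD, INSN_OP3, INSN_RS1 and INSN_RS2.

```c
#include <assert.h>
#include <stdint.h>

/* Field placement for SPARC format-3 instructions (register form). */
#define F3_OP(x)   ((uint32_t)(x) << 30)
#define F3_RD(x)   ((uint32_t)(x) << 25)
#define F3_OP3(x)  ((uint32_t)(x) << 19)
#define F3_RS1(x)  ((uint32_t)(x) << 14)
#define F3_RS2(x)  ((uint32_t)(x))

enum { OP3_ADD = 0x00, OP3_ADDX = 0x08, OP3_ADDCC = 0x10 };
enum { REG_O0 = 8, REG_O1 = 9, REG_O2 = 10 };

/* Encode "op3 rs1, rs2, rd" with i = 0 (register second operand). */
static uint32_t encode_arith_rrr(unsigned op3, unsigned rd,
                                 unsigned rs1, unsigned rs2)
{
    assert(op3 < 64 && rd < 32 && rs1 < 32 && rs2 < 32);
    return F3_OP(2) | F3_RD(rd) | F3_OP3(op3) | F3_RS1(rs1) | F3_RS2(rs2);
}
```

With the old value, every ADDX emitted by the backend silently became an ADDcc: the carry was never added in, and the condition codes were clobbered as a side effect.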

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 0e71618..358a70c 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -242,7 +242,7 @@ static inline int tcg_target_const_match(tcg_target_long val,
 #define ARITH_XOR  (INSN_OP(2) | INSN_OP3(0x03))
 #define ARITH_SUB  (INSN_OP(2) | INSN_OP3(0x04))
 #define ARITH_SUBCC (INSN_OP(2) | INSN_OP3(0x14))
-#define ARITH_ADDX (INSN_OP(2) | INSN_OP3(0x10))
+#define ARITH_ADDX (INSN_OP(2) | INSN_OP3(0x08))
 #define ARITH_SUBX (INSN_OP(2) | INSN_OP3(0x0c))
 #define ARITH_UMUL (INSN_OP(2) | INSN_OP3(0x0a))
 #define ARITH_UDIV (INSN_OP(2) | INSN_OP3(0x0e))
-- 
1.7.7.6


* [Qemu-devel] [PATCH 03/15] tcg-sparc: Assume v9 cpu always, i.e. force v8plus in 32-bit mode.
  2012-03-25 22:27 [Qemu-devel] [PATCH 00/15] tcg-sparc improvements Richard Henderson
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 01/15] tcg-sparc: Hack in qemu_ld/st64 for 32-bit Richard Henderson
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 02/15] tcg-sparc: Fix ADDX opcode Richard Henderson
@ 2012-03-25 22:27 ` Richard Henderson
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 04/15] tcg-sparc: Fix qemu_ld/st to handle 32-bit host Richard Henderson
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2012-03-25 22:27 UTC
  To: qemu-devel; +Cc: Blue Swirl

Current code doesn't actually work in 32-bit mode at all.  Since
no one really noticed, drop the complication of v7 and v8 cpus.
Eliminate the --sparc_cpu configure option and standardize macro
testing on TCG_TARGET_REG_BITS.
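With v9 guaranteed, setcond no longer needs a branch around two moves. It clears the destination and then conditionally moves 1 into it with MOVcc on %icc. The sketch below assembles that MOVcc word by hand for `movne %icc, 1, %o0`. It assumes the v9 MOVcc layout (op3 0x2c, cc2 in bit 18 selecting the integer condition codes, cond in bits 17:14, i in bit 13, simm11 in bits 10:0), which the patch's `ARITH_MOVCC | INSN_RD(ret) | INSN_RS1(cond) | MOVCC_ICC | INSN_IMM11(1)` is assumed to produce. The function and enum names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { COND_E = 0x1, COND_NE = 0x9 };   /* v9 integer condition codes */
enum { REG_O0 = 8 };

/* Encode "movCOND %icc, simm11, rd". */
static uint32_t encode_movcc_icc_imm(unsigned cond, unsigned rd, int simm11)
{
    assert(cond < 16 && rd < 32);
    assert(simm11 >= -1024 && simm11 <= 1023);
    return (2u << 30)                    /* op = 2 */
         | ((uint32_t)rd << 25)          /* rd */
         | (0x2cu << 19)                 /* op3 = MOVcc */
         | (1u << 18)                    /* cc2 = 1, cc1:cc0 = 00: %icc */
         | ((uint32_t)cond << 14)        /* cond */
         | (1u << 13)                    /* i = 1: immediate source */
         | ((uint32_t)simm11 & 0x7ffu);  /* simm11, two's complement */
}
```

Because MOVcc is unavailable before v9, the old v7/v8 fallback had to branch around the second move; requiring v9 makes the sequence straight-line.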

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 configure              |   41 ++++-------------------------------------
 dyngen-exec.h          |    4 +---
 tcg/sparc/tcg-target.c |   16 ++++------------
 tcg/sparc/tcg-target.h |    7 ++++---
 4 files changed, 13 insertions(+), 55 deletions(-)

diff --git a/configure b/configure
index 80ca430..7741ba9 100755
--- a/configure
+++ b/configure
@@ -86,7 +86,6 @@ source_path=`dirname "$0"`
 cpu=""
 interp_prefix="/usr/gnemul/qemu-%M"
 static="no"
-sparc_cpu=""
 cross_prefix=""
 audio_drv_list=""
 audio_card_list="ac97 es1370 sb16 hda"
@@ -216,21 +215,6 @@ for opt do
   ;;
   --disable-debug-info) debug_info="no"
   ;;
-  --sparc_cpu=*)
-    sparc_cpu="$optarg"
-    case $sparc_cpu in
-    v7|v8|v8plus|v8plusa)
-      cpu="sparc"
-    ;;
-    v9)
-      cpu="sparc64"
-    ;;
-    *)
-      echo "undefined SPARC architecture. Exiting";
-      exit 1
-    ;;
-    esac
-  ;;
   esac
 done
 # OS specific
@@ -284,8 +268,6 @@ elif check_define __i386__ ; then
 elif check_define __x86_64__ ; then
   cpu="x86_64"
 elif check_define __sparc__ ; then
-  # We can't check for 64 bit (when gcc is biarch) or V8PLUSA
-  # They must be specified using --sparc_cpu
   if check_define __arch64__ ; then
     cpu="sparc64"
   else
@@ -749,8 +731,6 @@ for opt do
   ;;
   --enable-uname-release=*) uname_release="$optarg"
   ;;
-  --sparc_cpu=*)
-  ;;
   --enable-werror) werror="yes"
   ;;
   --disable-werror) werror="no"
@@ -830,32 +810,19 @@ for opt do
   esac
 done
 
-#
-# If cpu ~= sparc and  sparc_cpu hasn't been defined, plug in the right
-# QEMU_CFLAGS/LDFLAGS (assume sparc_v8plus for 32-bit and sparc_v9 for 64-bit)
-#
 host_guest_base="no"
 case "$cpu" in
-    sparc) case $sparc_cpu in
-           v7|v8)
-             QEMU_CFLAGS="-mcpu=${sparc_cpu} -D__sparc_${sparc_cpu}__ $QEMU_CFLAGS"
-           ;;
-           v8plus|v8plusa)
-             QEMU_CFLAGS="-mcpu=ultrasparc -D__sparc_${sparc_cpu}__ $QEMU_CFLAGS"
-           ;;
-           *) # sparc_cpu not defined in the command line
-             QEMU_CFLAGS="-mcpu=ultrasparc -D__sparc_v8plus__ $QEMU_CFLAGS"
-           esac
+    sparc)
            LDFLAGS="-m32 $LDFLAGS"
-           QEMU_CFLAGS="-m32 -ffixed-g2 -ffixed-g3 $QEMU_CFLAGS"
+           QEMU_CFLAGS="-m32 -mcpu=ultrasparc $QEMU_CFLAGS"
+           QEMU_CFLAGS="-ffixed-g2 -ffixed-g3 $QEMU_CFLAGS"
            if test "$solaris" = "no" ; then
              QEMU_CFLAGS="-ffixed-g1 -ffixed-g6 $QEMU_CFLAGS"
-             helper_cflags="-ffixed-i0"
            fi
            ;;
     sparc64)
-           QEMU_CFLAGS="-m64 -mcpu=ultrasparc -D__sparc_v9__ $QEMU_CFLAGS"
            LDFLAGS="-m64 $LDFLAGS"
+           QEMU_CFLAGS="-m64 -mcpu=ultrasparc $QEMU_CFLAGS"
            QEMU_CFLAGS="-ffixed-g5 -ffixed-g6 -ffixed-g7 $QEMU_CFLAGS"
            if test "$solaris" != "no" ; then
              QEMU_CFLAGS="-ffixed-g1 $QEMU_CFLAGS"
diff --git a/dyngen-exec.h b/dyngen-exec.h
index 083e20b..cfeef99 100644
--- a/dyngen-exec.h
+++ b/dyngen-exec.h
@@ -39,13 +39,11 @@
 #elif defined(__sparc__)
 #ifdef CONFIG_SOLARIS
 #define AREG0 "g2"
-#else
-#ifdef __sparc_v9__
+#elif HOST_LONG_BITS == 64
 #define AREG0 "g5"
 #else
 #define AREG0 "g6"
 #endif
-#endif
 #elif defined(__s390__)
 #define AREG0 "r10"
 #elif defined(__alpha__)
diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 358a70c..257d20a 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -627,18 +627,10 @@ static void tcg_out_setcond_i32(TCGContext *s, TCGCond cond, TCGArg ret,
 
     default:
         tcg_out_cmp(s, c1, c2, c2const);
-#if defined(__sparc_v9__) || defined(__sparc_v8plus__)
         tcg_out_movi_imm13(s, ret, 0);
-        tcg_out32 (s, ARITH_MOVCC | INSN_RD(ret)
-                   | INSN_RS1(tcg_cond_to_bcond[cond])
-                   | MOVCC_ICC | INSN_IMM11(1));
-#else
-        t = gen_new_label();
-        tcg_out_branch_i32(s, INSN_COND(tcg_cond_to_bcond[cond], 1), t);
-        tcg_out_movi_imm13(s, ret, 1);
-        tcg_out_movi_imm13(s, ret, 0);
-        tcg_out_label(s, t, s->code_ptr);
-#endif
+        tcg_out32(s, ARITH_MOVCC | INSN_RD(ret)
+                  | INSN_RS1(tcg_cond_to_bcond[cond])
+                  | MOVCC_ICC | INSN_IMM11(1));
         return;
     }
 
@@ -768,7 +760,7 @@ static const void * const qemu_st_helpers[4] = {
 #endif
 #endif
 
-#ifdef __arch64__
+#if TCG_TARGET_REG_BITS == 64
 #define HOST_LD_OP LDX
 #define HOST_ST_OP STX
 #define HOST_SLL_OP SHIFT_SLLX
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index ee2274d..56742bf 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -67,7 +67,8 @@ typedef enum {
 
 /* used for function call generation */
 #define TCG_REG_CALL_STACK TCG_REG_I6
-#ifdef __arch64__
+
+#if TCG_TARGET_REG_BITS == 64
 // Reserve space for AREG0
 #define TCG_TARGET_STACK_MINFRAME (176 + 4 * (int)sizeof(long) + \
                                    TCG_STATIC_CALL_ARGS_SIZE)
@@ -81,7 +82,7 @@ typedef enum {
 #define TCG_TARGET_STACK_ALIGN 8
 #endif
 
-#ifdef __arch64__
+#if TCG_TARGET_REG_BITS == 64
 #define TCG_TARGET_EXTEND_ARGS 1
 #endif
 
@@ -128,7 +129,7 @@ typedef enum {
 /* Note: must be synced with dyngen-exec.h */
 #ifdef CONFIG_SOLARIS
 #define TCG_AREG0 TCG_REG_G2
-#elif defined(__sparc_v9__)
+#elif HOST_LONG_BITS == 64
 #define TCG_AREG0 TCG_REG_G5
 #else
 #define TCG_AREG0 TCG_REG_G6
-- 
1.7.7.6


* [Qemu-devel] [PATCH 04/15] tcg-sparc: Fix qemu_ld/st to handle 32-bit host.
  2012-03-25 22:27 [Qemu-devel] [PATCH 00/15] tcg-sparc improvements Richard Henderson
                   ` (2 preceding siblings ...)
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 03/15] tcg-sparc: Assume v9 cpu always, i.e. force v8plus in 32-bit mode Richard Henderson
@ 2012-03-25 22:27 ` Richard Henderson
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 05/15] tcg-sparc: Simplify qemu_ld/st direct memory paths Richard Henderson
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2012-03-25 22:27 UTC
  To: qemu-devel; +Cc: Blue Swirl

At the same time, split the TLB load logic out into a new function.
This fixes the cases with two data registers and two address
registers, fixes the signatures of the existing qemu_ld/st opcodes,
and adds the missing ones.
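The fast path that tcg_out_tlb_load emits can be modelled in plain C. This is a sketch under assumed parameters: 4 KiB pages, 256 TLB entries and 32-byte entries are hypothetical values for TARGET_PAGE_BITS, CPU_TLB_BITS and CPU_TLB_ENTRY_BITS, and `struct tlb_entry` is a simplified stand-in for CPUTLBEntry. It also shows how a 64-bit guest address is assembled from two 32-bit halves, as the generated code does on a 32-bit host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical parameters; QEMU derives the real ones per target. */
#define TLB_PAGE_BITS   12
#define TLB_BITS        8
#define TLB_SIZE        (1u << TLB_BITS)
#define TLB_ENTRY_BITS  5                 /* log2(sizeof(struct tlb_entry)) */
#define TLB_PAGE_MASK   (~(uint64_t)((1u << TLB_PAGE_BITS) - 1))

/* Simplified, hypothetical TLB entry: 4 x 8 = 32 bytes. */
struct tlb_entry {
    uint64_t addr_read;    /* page address, or all-ones if invalid */
    uint64_t addr_write;
    uint64_t addr_code;
    uint64_t addend;       /* host address minus guest address */
};

/* On a 32-bit host a 64-bit guest address arrives in two registers. */
static uint64_t assemble_addr(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Returns true on a TLB hit and stores the host address in *host.
   s_bits is log2 of the access size. */
static bool tlb_lookup(const struct tlb_entry tlb[TLB_SIZE], uint64_t addr,
                       unsigned s_bits, uint64_t *host)
{
    /* Byte offset of the entry, as the srl/and pair computes it. */
    uint64_t ofs = (addr >> (TLB_PAGE_BITS - TLB_ENTRY_BITS))
                   & ((uint64_t)(TLB_SIZE - 1) << TLB_ENTRY_BITS);
    const struct tlb_entry *e = &tlb[ofs >> TLB_ENTRY_BITS];

    /* Keep the page number plus the low alignment bits, so a
       misaligned access never matches and takes the slow path. */
    uint64_t cmp = addr & (TLB_PAGE_MASK | ((1u << s_bits) - 1));
    if (cmp != e->addr_read) {
        return false;                   /* bne,pn: TLB miss */
    }
    *host = addr + e->addend;           /* TLB hit */
    return true;
}
```

Keeping the alignment bits in the comparison value lets a single compare reject both a page mismatch and a misaligned access.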

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c |  751 ++++++++++++++++++++++++------------------------
 1 files changed, 378 insertions(+), 373 deletions(-)
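
Both fast-path branches are emitted with a zero displacement and patched once the label position is known (`*label_ptr[0] |= INSN_OFF19(...)`). The sketch below models that two-step process for a v9 BPcc instruction. It assumes the v9 BPcc layout (op 0, a in bit 29, cond in bits 28:25, op2 = 1 in bits 24:22, cc1 in bit 21, p in bit 19, word displacement in bits 18:0) and the standard condition values 0x8 (always) and 0x9 (not equal); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { COND_A = 0x8, COND_NE = 0x9 };

/* Emit a BPcc with a zero displacement, to be patched later. */
static uint32_t bpcc_placeholder(unsigned cond, int annul,
                                 int predict_taken, int xcc)
{
    return (0u << 30)                                  /* op = 0 */
         | ((uint32_t)(annul != 0) << 29)              /* a */
         | ((uint32_t)cond << 25)                      /* cond */
         | (1u << 22)                                  /* op2 = 1: BPcc */
         | ((uint32_t)(xcc != 0) << 21)                /* cc1: %icc or %xcc */
         | ((uint32_t)(predict_taken != 0) << 19);     /* p */
}

/* Resolve the branch at INSN to TARGET.  With uint32_t pointers the
   difference is already in instruction words, which is what
   INSN_OFF19 computes from a byte difference by shifting right by 2. */
static void patch_disp19(uint32_t *insn, const uint32_t *target)
{
    long words = (long)(target - insn);
    assert(words >= -(1L << 18) && words < (1L << 18));
    *insn |= (uint32_t)words & 0x7ffffu;
}
```

Because the displacement counts words, a 19-bit field reaches +/-2^18 instructions (+/-1 MiB), which is ample for branches within a single translation block.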

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 257d20a..8763b03 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -448,14 +448,15 @@ static inline void tcg_out_addi(TCGContext *s, int reg, tcg_target_long val)
     }
 }
 
-static inline void tcg_out_andi(TCGContext *s, int reg, tcg_target_long val)
+static inline void tcg_out_andi(TCGContext *s, int rd, int rs,
+                                tcg_target_long val)
 {
     if (val != 0) {
         if (check_fit_tl(val, 13))
-            tcg_out_arithi(s, reg, reg, val, ARITH_AND);
+            tcg_out_arithi(s, rd, rs, val, ARITH_AND);
         else {
             tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_I5, val);
-            tcg_out_arith(s, reg, reg, TCG_REG_I5, ARITH_AND);
+            tcg_out_arith(s, rd, rs, TCG_REG_I5, ARITH_AND);
         }
     }
 }
@@ -744,422 +745,405 @@ static const void * const qemu_st_helpers[4] = {
     __stq_mmu,
 };
 #endif
-#endif
 
-#if TARGET_LONG_BITS == 32
-#define TARGET_LD_OP LDUW
-#else
-#define TARGET_LD_OP LDX
-#endif
+/* Perform the TLB load and compare.
 
-#if defined(CONFIG_SOFTMMU)
-#if HOST_LONG_BITS == 32
-#define TARGET_ADDEND_LD_OP LDUW
-#else
-#define TARGET_ADDEND_LD_OP LDX
-#endif
-#endif
+   Inputs:
+   ADDRLO_IDX contains the index into ARGS of the low part of the
+   address; the high part of the address is at ADDRLO_IDX+1.
 
-#if TCG_TARGET_REG_BITS == 64
-#define HOST_LD_OP LDX
-#define HOST_ST_OP STX
-#define HOST_SLL_OP SHIFT_SLLX
-#define HOST_SRA_OP SHIFT_SRAX
+   MEM_INDEX and S_BITS are the memory context and log2 size of the load.
+
+   WHICH is the offset into the CPUTLBEntry structure of the slot to read.
+   This should be offsetof addr_read or addr_write.
+
+   Outputs:
+   LABEL_PTRS is filled with the position of the forward jumps to the
+   TLB miss case.  This will always be a ,PN insn, so a 19-bit offset.
+
+   Returns a register loaded with the low part of the address, adjusted
+   as indicated by the TLB and so is a host address.  Undefined in the
+   TLB miss case.  */
+
+static int tcg_out_tlb_load(TCGContext *s, int addrlo_idx, int mem_index,
+                            int s_bits, const TCGArg *args,
+                            uint32_t **label_ptr, int which)
+{
+    const int addrlo = args[addrlo_idx];
+    const int r0 = tcg_target_call_iarg_regs[0];
+    const int r1 = tcg_target_call_iarg_regs[1];
+    const int r2 = tcg_target_call_iarg_regs[2];
+    int addr = addrlo;
+    int tlb_ofs;
+
+    if (TCG_TARGET_REG_BITS == 32 && TARGET_LONG_BITS == 64) {
+        /* Assemble the 64-bit address in R0.  */
+        tcg_out_arithi(s, r0, addrlo, 0, SHIFT_SRL);
+        tcg_out_arithi(s, r1, args[addrlo_idx + 1], 32, SHIFT_SLLX);
+        tcg_out_arith(s, r0, r0, r1, ARITH_OR);
+    }
+
+    /* Shift the page number down to tlb-entry.  */
+    tcg_out_arithi(s, r1, addrlo,
+                   TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS, SHIFT_SRL);
+
+    /* Mask out the page offset, except for the required alignment.  */
+    tcg_out_andi(s, r0, addr, TARGET_PAGE_MASK | ((1 << s_bits) - 1));
+
+    /* Compute tlb index, modulo tlb size.  */
+    tcg_out_andi(s, r1, r1, (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS);
+
+    /* Relative to the current ENV.  */
+    tcg_out_arith(s, r1, TCG_AREG0, r1, ARITH_ADD);
+
+    /* Find a base address that can load both tlb comparator and addend.  */
+    tlb_ofs = offsetof(CPUArchState, tlb_table[mem_index][0]);
+    if (!check_fit_tl(tlb_ofs + sizeof(CPUTLBEntry), 13)) {
+        tcg_out_addi(s, r1, tlb_ofs);
+        tlb_ofs = 0;
+    }
+
+    /* ld [arg1 + which], arg2 */
+    tcg_out_ld(s, TCG_TYPE_TL, r2, r1, tlb_ofs + which);
+
+    /* subcc arg0, arg2, %g0 */
+    tcg_out_cmp(s, r0, r2, 0);
+
+    /* bne,pn %[ix]cc, label0 */
+    *label_ptr = (uint32_t *)s->code_ptr;
+    tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_NE, 0) | INSN_OP2(0x1) |
+                  ((TARGET_LONG_BITS == 64) << 21)));
+
+    /* TLB Hit.  Compute the host address into r1.  The ld is in the
+       branch delay slot; harmless for the TLB miss case.  */
+    tcg_out_ld(s, TCG_TYPE_PTR, r1, r1, tlb_ofs+offsetof(CPUTLBEntry, addend));
+
+    if (TCG_TARGET_REG_BITS == 64 && TARGET_LONG_BITS == 32) {
+        tcg_out_arithi(s, r0, addrlo, 0, SHIFT_SRL);
+        tcg_out_arith(s, r1, r0, r1, ARITH_ADD);
+    } else {
+        tcg_out_arith(s, r1, addrlo, r1, ARITH_ADD);
+    }
+
+    return r1;
+}
+#endif /* CONFIG_SOFTMMU */
+
+static void tcg_out_qemu_ld_direct(TCGContext *s, int addr, int datalo,
+                                   int datahi, int sizeop)
+{
+#ifdef TARGET_WORDS_BIGENDIAN
+    const int bigendian = 1;
 #else
-#define HOST_LD_OP LDUW
-#define HOST_ST_OP STW
-#define HOST_SLL_OP SHIFT_SLL
-#define HOST_SRA_OP SHIFT_SRA
+    const int bigendian = 0;
 #endif
+    switch (sizeop) {
+    case 0:
+        /* ldub [addr], datalo */
+        tcg_out_ldst(s, datalo, addr, 0, LDUB);
+        break;
+    case 0 | 4:
+        /* ldsb [addr], datalo */
+        tcg_out_ldst(s, datalo, addr, 0, LDSB);
+        break;
+    case 1:
+        if (bigendian) {
+            /* lduh [addr], datalo */
+            tcg_out_ldst(s, datalo, addr, 0, LDUH);
+        } else {
+            /* lduha [addr] ASI_PRIMARY_LITTLE, datalo */
+            tcg_out_ldst_asi(s, datalo, addr, 0, LDUHA, ASI_PRIMARY_LITTLE);
+        }
+        break;
+    case 1 | 4:
+        if (bigendian) {
+            /* ldsh [addr], datalo */
+            tcg_out_ldst(s, datalo, addr, 0, LDSH);
+        } else {
+            /* ldsha [addr] ASI_PRIMARY_LITTLE, datalo */
+            tcg_out_ldst_asi(s, datalo, addr, 0, LDSHA, ASI_PRIMARY_LITTLE);
+        }
+        break;
+    case 2:
+        if (bigendian) {
+            /* lduw [addr], datalo */
+            tcg_out_ldst(s, datalo, addr, 0, LDUW);
+        } else {
+            /* lduwa [addr] ASI_PRIMARY_LITTLE, datalo */
+            tcg_out_ldst_asi(s, datalo, addr, 0, LDUWA, ASI_PRIMARY_LITTLE);
+        }
+        break;
+    case 2 | 4:
+        if (bigendian) {
+            /* ldsw [addr], datalo */
+            tcg_out_ldst(s, datalo, addr, 0, LDSW);
+        } else {
+            /* ldswa [addr] ASI_PRIMARY_LITTLE, datalo */
+            tcg_out_ldst_asi(s, datalo, addr, 0, LDSWA, ASI_PRIMARY_LITTLE);
+        }
+        break;
+    case 3:
+        if (TCG_TARGET_REG_BITS == 64) {
+            if (bigendian) {
+                /* ldx [addr], datalo */
+                tcg_out_ldst(s, datalo, addr, 0, LDX);
+            } else {
+                /* ldxa [addr] ASI_PRIMARY_LITTLE, datalo */
+                tcg_out_ldst_asi(s, datalo, addr, 0, LDXA, ASI_PRIMARY_LITTLE);
+            }
+        } else {
+            if (bigendian) {
+                tcg_out_ldst(s, datahi, addr, 0, LDUW);
+                tcg_out_ldst(s, datalo, addr, 4, LDUW);
+            } else {
+                tcg_out_ldst_asi(s, datalo, addr, 0, LDUWA, ASI_PRIMARY_LITTLE);
+                tcg_out_ldst_asi(s, datahi, addr, 4, LDUWA, ASI_PRIMARY_LITTLE);
+            }
+        }
+        break;
+    default:
+        tcg_abort();
+    }
+}
 
-static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args,
-                            int opc)
+static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
 {
-    int addr_reg, data_reg, arg0, arg1, arg2, mem_index, s_bits;
+    int addrlo_idx = 1, datalo, datahi, addr_reg;
 #if defined(CONFIG_SOFTMMU)
-    uint32_t *label1_ptr, *label2_ptr;
+    int memi_idx, memi, s_bits, n;
+    uint32_t *label_ptr[2];
 #endif
 
-    data_reg = *args++;
-    addr_reg = *args++;
-    mem_index = *args;
-    s_bits = opc & 3;
-
-    arg0 = TCG_REG_O0;
-    arg1 = TCG_REG_O1;
-    arg2 = TCG_REG_O2;
+    datahi = datalo = args[0];
+    if (TCG_TARGET_REG_BITS == 32 && opc == 3) {
+        datahi = args[1];
+        addrlo_idx = 2;
+    }
 
 #if defined(CONFIG_SOFTMMU)
-    /* srl addr_reg, x, arg1 */
-    tcg_out_arithi(s, arg1, addr_reg, TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS,
-                   SHIFT_SRL);
-    /* and addr_reg, x, arg0 */
-    tcg_out_arithi(s, arg0, addr_reg, TARGET_PAGE_MASK | ((1 << s_bits) - 1),
-                   ARITH_AND);
-
-    /* and arg1, x, arg1 */
-    tcg_out_andi(s, arg1, (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS);
-
-    /* add arg1, x, arg1 */
-    tcg_out_addi(s, arg1, offsetof(CPUArchState,
-                                   tlb_table[mem_index][0].addr_read));
-
-    /* add env, arg1, arg1 */
-    tcg_out_arith(s, arg1, TCG_AREG0, arg1, ARITH_ADD);
+    memi_idx = addrlo_idx + 1 + (TARGET_LONG_BITS > TCG_TARGET_REG_BITS);
+    memi = args[memi_idx];
+    s_bits = opc & 3;
 
-    /* ld [arg1], arg2 */
-    tcg_out32(s, TARGET_LD_OP | INSN_RD(arg2) | INSN_RS1(arg1) |
-              INSN_RS2(TCG_REG_G0));
+    addr_reg = tcg_out_tlb_load(s, addrlo_idx, memi, s_bits, args,
+                                label_ptr, offsetof(CPUTLBEntry, addr_read));
 
-    /* subcc arg0, arg2, %g0 */
-    tcg_out_arith(s, TCG_REG_G0, arg0, arg2, ARITH_SUBCC);
+    /* TLB Hit.  */
+    tcg_out_qemu_ld_direct(s, addr_reg, datalo, datahi, opc);
 
-    /* will become:
-       be label1
-        or
-       be,pt %xcc label1 */
-    label1_ptr = (uint32_t *)s->code_ptr;
-    tcg_out32(s, 0);
+    /* b,pt,n label1 */
+    label_ptr[1] = (uint32_t *)s->code_ptr;
+    tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_A, 0) | INSN_OP2(0x1)
+                  | (1 << 29) | (1 << 19)));
 
-    /* mov (delay slot) */
-    tcg_out_mov(s, TCG_TYPE_PTR, arg0, addr_reg);
+    /* TLB Miss.  */
 
-    /* mov */
-    tcg_out_movi(s, TCG_TYPE_I32, arg1, mem_index);
+    *label_ptr[0] |= INSN_OFF19((unsigned long)s->code_ptr -
+                                (unsigned long)label_ptr[0]);
+    n = 0;
 #ifdef CONFIG_TCG_PASS_AREG0
-    /* XXX/FIXME: suboptimal */
-    tcg_out_mov(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[3],
-                tcg_target_call_iarg_regs[2]);
-    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[2],
-                tcg_target_call_iarg_regs[1]);
-    tcg_out_mov(s, TCG_TYPE_TL, tcg_target_call_iarg_regs[1],
-                tcg_target_call_iarg_regs[0]);
-    tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0],
-                TCG_AREG0);
+    tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[n++], TCG_AREG0);
 #endif
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++],
+                    args[addrlo_idx + 1]);
+    }
+    tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++],
+                args[addrlo_idx]);
+
+    /* Store AREG0 in stack to avoid ugly glibc bugs that mangle
+       global registers */
+    tcg_out_st(s, TCG_TYPE_REG, TCG_AREG0, TCG_REG_CALL_STACK,
+               TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
+               sizeof(long));
 
-    /* XXX: move that code at the end of the TB */
     /* qemu_ld_helper[s_bits](arg0, arg1) */
     tcg_out32(s, CALL | ((((tcg_target_ulong)qemu_ld_helpers[s_bits]
                            - (tcg_target_ulong)s->code_ptr) >> 2)
                          & 0x3fffffff));
-    /* Store AREG0 in stack to avoid ugly glibc bugs that mangle
-       global registers */
-    // delay slot
-    tcg_out_ldst(s, TCG_AREG0, TCG_REG_CALL_STACK,
-                 TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
-                 sizeof(long), HOST_ST_OP);
-    tcg_out_ldst(s, TCG_AREG0, TCG_REG_CALL_STACK,
-                 TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
-                 sizeof(long), HOST_LD_OP);
-
-    /* data_reg = sign_extend(arg0) */
+    /* delay slot */
+    tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[n], memi);
+
+    /* Reload AREG0.  */
+    tcg_out_ld(s, TCG_TYPE_REG, TCG_AREG0, TCG_REG_CALL_STACK,
+               TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
+               sizeof(long));
+
+    n = tcg_target_call_oarg_regs[0];
+    /* datalo = sign_extend(arg0) */
     switch(opc) {
     case 0 | 4:
-        /* sll arg0, 24/56, data_reg */
-        tcg_out_arithi(s, data_reg, arg0, (int)sizeof(tcg_target_long) * 8 - 8,
-                       HOST_SLL_OP);
-        /* sra data_reg, 24/56, data_reg */
-        tcg_out_arithi(s, data_reg, data_reg,
-                       (int)sizeof(tcg_target_long) * 8 - 8, HOST_SRA_OP);
+        /* Recall that SRA sign extends from bit 31 through bit 63.  */
+        tcg_out_arithi(s, datalo, n, 24, SHIFT_SLL);
+        tcg_out_arithi(s, datalo, datalo, 24, SHIFT_SRA);
         break;
     case 1 | 4:
-        /* sll arg0, 16/48, data_reg */
-        tcg_out_arithi(s, data_reg, arg0,
-                       (int)sizeof(tcg_target_long) * 8 - 16, HOST_SLL_OP);
-        /* sra data_reg, 16/48, data_reg */
-        tcg_out_arithi(s, data_reg, data_reg,
-                       (int)sizeof(tcg_target_long) * 8 - 16, HOST_SRA_OP);
+        tcg_out_arithi(s, datalo, n, 16, SHIFT_SLL);
+        tcg_out_arithi(s, datalo, datalo, 16, SHIFT_SRA);
         break;
     case 2 | 4:
-        /* sll arg0, 32, data_reg */
-        tcg_out_arithi(s, data_reg, arg0, 32, HOST_SLL_OP);
-        /* sra data_reg, 32, data_reg */
-        tcg_out_arithi(s, data_reg, data_reg, 32, HOST_SRA_OP);
+        tcg_out_arithi(s, datalo, n, 0, SHIFT_SRA);
         break;
+    case 3:
+        if (TCG_TARGET_REG_BITS == 32) {
+            tcg_out_mov(s, TCG_TYPE_REG, datahi, n);
+            tcg_out_mov(s, TCG_TYPE_REG, datalo, n + 1);
+            break;
+        }
+        /* FALLTHRU */
     case 0:
     case 1:
     case 2:
-    case 3:
     default:
         /* mov */
-        tcg_out_mov(s, TCG_TYPE_REG, data_reg, arg0);
+        tcg_out_mov(s, TCG_TYPE_REG, datalo, n);
         break;
     }
 
-    /* will become:
-       ba label2 */
-    label2_ptr = (uint32_t *)s->code_ptr;
-    tcg_out32(s, 0);
-
-    /* nop (delay slot */
-    tcg_out_nop(s);
-
-    /* label1: */
-#if TARGET_LONG_BITS == 32
-    /* be label1 */
-    *label1_ptr = (INSN_OP(0) | INSN_COND(COND_E, 0) | INSN_OP2(0x2) |
-                   INSN_OFF22((unsigned long)s->code_ptr -
-                              (unsigned long)label1_ptr));
+    *label_ptr[1] |= INSN_OFF19((unsigned long)s->code_ptr -
+                                (unsigned long)label_ptr[1]);
 #else
-    /* be,pt %xcc label1 */
-    *label1_ptr = (INSN_OP(0) | INSN_COND(COND_E, 0) | INSN_OP2(0x1) |
-                   (0x5 << 19) | INSN_OFF19((unsigned long)s->code_ptr -
-                              (unsigned long)label1_ptr));
-#endif
-
-    /* ld [arg1 + x], arg1 */
-    tcg_out_ldst(s, arg1, arg1, offsetof(CPUTLBEntry, addend) -
-                 offsetof(CPUTLBEntry, addr_read), TARGET_ADDEND_LD_OP);
-
-#if TARGET_LONG_BITS == 32
-    /* and addr_reg, x, arg0 */
-    tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_I5, 0xffffffff);
-    tcg_out_arith(s, arg0, addr_reg, TCG_REG_I5, ARITH_AND);
-    /* add arg0, arg1, arg0 */
-    tcg_out_arith(s, arg0, arg0, arg1, ARITH_ADD);
-#else
-    /* add addr_reg, arg1, arg0 */
-    tcg_out_arith(s, arg0, addr_reg, arg1, ARITH_ADD);
-#endif
+    addr_reg = args[addrlo_idx];
+    if (TCG_TARGET_REG_BITS == 64 && TARGET_LONG_BITS == 32) {
+        tcg_out_arithi(s, TCG_REG_I5, addr_reg, 0, SHIFT_SRL);
+        addr_reg = TCG_REG_I5;
+    }
+    tcg_out_qemu_ld_direct(s, addr_reg, datalo, datahi, opc);
+#endif /* CONFIG_SOFTMMU */
+}
 
+static void tcg_out_qemu_st_direct(TCGContext *s, int addr, int datalo,
+                                   int datahi, int sizeop)
+{
+#ifdef TARGET_WORDS_BIGENDIAN
+    const int bigendian = 1;
 #else
-    arg0 = addr_reg;
+    const int bigendian = 0;
 #endif
-
-    switch(opc) {
+    switch (sizeop) {
     case 0:
-        /* ldub [arg0], data_reg */
-        tcg_out_ldst(s, data_reg, arg0, 0, LDUB);
-        break;
-    case 0 | 4:
-        /* ldsb [arg0], data_reg */
-        tcg_out_ldst(s, data_reg, arg0, 0, LDSB);
+        /* stb datalo, [addr] */
+        tcg_out_ldst(s, datalo, addr, 0, STB);
         break;
     case 1:
-#ifdef TARGET_WORDS_BIGENDIAN
-        /* lduh [arg0], data_reg */
-        tcg_out_ldst(s, data_reg, arg0, 0, LDUH);
-#else
-        /* lduha [arg0] ASI_PRIMARY_LITTLE, data_reg */
-        tcg_out_ldst_asi(s, data_reg, arg0, 0, LDUHA, ASI_PRIMARY_LITTLE);
-#endif
-        break;
-    case 1 | 4:
-#ifdef TARGET_WORDS_BIGENDIAN
-        /* ldsh [arg0], data_reg */
-        tcg_out_ldst(s, data_reg, arg0, 0, LDSH);
-#else
-        /* ldsha [arg0] ASI_PRIMARY_LITTLE, data_reg */
-        tcg_out_ldst_asi(s, data_reg, arg0, 0, LDSHA, ASI_PRIMARY_LITTLE);
-#endif
+        if (bigendian) {
+            /* sth datalo, [addr] */
+            tcg_out_ldst(s, datalo, addr, 0, STH);
+        } else {
+            /* stha datalo, [addr] ASI_PRIMARY_LITTLE */
+            tcg_out_ldst_asi(s, datalo, addr, 0, STHA, ASI_PRIMARY_LITTLE);
+        }
         break;
     case 2:
-#ifdef TARGET_WORDS_BIGENDIAN
-        /* lduw [arg0], data_reg */
-        tcg_out_ldst(s, data_reg, arg0, 0, LDUW);
-#else
-        /* lduwa [arg0] ASI_PRIMARY_LITTLE, data_reg */
-        tcg_out_ldst_asi(s, data_reg, arg0, 0, LDUWA, ASI_PRIMARY_LITTLE);
-#endif
-        break;
-    case 2 | 4:
-#ifdef TARGET_WORDS_BIGENDIAN
-        /* ldsw [arg0], data_reg */
-        tcg_out_ldst(s, data_reg, arg0, 0, LDSW);
-#else
-        /* ldswa [arg0] ASI_PRIMARY_LITTLE, data_reg */
-        tcg_out_ldst_asi(s, data_reg, arg0, 0, LDSWA, ASI_PRIMARY_LITTLE);
-#endif
+        if (bigendian) {
+            /* stw datalo, [addr] */
+            tcg_out_ldst(s, datalo, addr, 0, STW);
+        } else {
+            /* stwa datalo, [addr] ASI_PRIMARY_LITTLE */
+            tcg_out_ldst_asi(s, datalo, addr, 0, STWA, ASI_PRIMARY_LITTLE);
+        }
         break;
     case 3:
-#ifdef TARGET_WORDS_BIGENDIAN
-        /* ldx [arg0], data_reg */
-        tcg_out_ldst(s, data_reg, arg0, 0, LDX);
-#else
-        /* ldxa [arg0] ASI_PRIMARY_LITTLE, data_reg */
-        tcg_out_ldst_asi(s, data_reg, arg0, 0, LDXA, ASI_PRIMARY_LITTLE);
-#endif
+        if (TCG_TARGET_REG_BITS == 64) {
+            if (bigendian) {
+                /* stx datalo, [addr] */
+                tcg_out_ldst(s, datalo, addr, 0, STX);
+            } else {
+                /* stxa datalo, [addr] ASI_PRIMARY_LITTLE */
+                tcg_out_ldst_asi(s, datalo, addr, 0, STXA, ASI_PRIMARY_LITTLE);
+            }
+        } else {
+            if (bigendian) {
+                tcg_out_ldst(s, datahi, addr, 0, STW);
+                tcg_out_ldst(s, datalo, addr, 4, STW);
+            } else {
+                tcg_out_ldst_asi(s, datalo, addr, 0, STWA, ASI_PRIMARY_LITTLE);
+                tcg_out_ldst_asi(s, datahi, addr, 4, STWA, ASI_PRIMARY_LITTLE);
+            }
+        }
         break;
     default:
         tcg_abort();
     }
-
-#if defined(CONFIG_SOFTMMU)
-    /* label2: */
-    *label2_ptr = (INSN_OP(0) | INSN_COND(COND_A, 0) | INSN_OP2(0x2) |
-                   INSN_OFF22((unsigned long)s->code_ptr -
-                              (unsigned long)label2_ptr));
-#endif
 }
 
-static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
-                            int opc)
+static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
 {
-    int addr_reg, data_reg, arg0, arg1, arg2, mem_index, s_bits;
+    int addrlo_idx = 1, datalo, datahi, addr_reg;
 #if defined(CONFIG_SOFTMMU)
-    uint32_t *label1_ptr, *label2_ptr;
+    int memi_idx, memi, n;
+    uint32_t *label_ptr[2];
 #endif
 
-    data_reg = *args++;
-    addr_reg = *args++;
-    mem_index = *args;
-
-    s_bits = opc;
-
-    arg0 = TCG_REG_O0;
-    arg1 = TCG_REG_O1;
-    arg2 = TCG_REG_O2;
+    datahi = datalo = args[0];
+    if (TCG_TARGET_REG_BITS == 32 && opc == 3) {
+        datahi = args[1];
+        addrlo_idx = 2;
+    }
 
 #if defined(CONFIG_SOFTMMU)
-    /* srl addr_reg, x, arg1 */
-    tcg_out_arithi(s, arg1, addr_reg, TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS,
-                   SHIFT_SRL);
+    memi_idx = addrlo_idx + 1 + (TARGET_LONG_BITS > TCG_TARGET_REG_BITS);
+    memi = args[memi_idx];
 
-    /* and addr_reg, x, arg0 */
-    tcg_out_arithi(s, arg0, addr_reg, TARGET_PAGE_MASK | ((1 << s_bits) - 1),
-                   ARITH_AND);
+    addr_reg = tcg_out_tlb_load(s, addrlo_idx, memi, opc, args,
+                                label_ptr, offsetof(CPUTLBEntry, addr_write));
 
-    /* and arg1, x, arg1 */
-    tcg_out_andi(s, arg1, (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS);
+    /* TLB Hit.  */
+    tcg_out_qemu_st_direct(s, addr_reg, datalo, datahi, opc);
 
-    /* add arg1, x, arg1 */
-    tcg_out_addi(s, arg1, offsetof(CPUArchState,
-                                   tlb_table[mem_index][0].addr_write));
+    /* b,pt,n label1 */
+    label_ptr[1] = (uint32_t *)s->code_ptr;
+    tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_A, 0) | INSN_OP2(0x1)
+                  | (1 << 29) | (1 << 19)));
 
-    /* add env, arg1, arg1 */
-    tcg_out_arith(s, arg1, TCG_AREG0, arg1, ARITH_ADD);
+    /* TLB Miss.  */
 
-    /* ld [arg1], arg2 */
-    tcg_out32(s, TARGET_LD_OP | INSN_RD(arg2) | INSN_RS1(arg1) |
-              INSN_RS2(TCG_REG_G0));
-
-    /* subcc arg0, arg2, %g0 */
-    tcg_out_arith(s, TCG_REG_G0, arg0, arg2, ARITH_SUBCC);
-
-    /* will become:
-       be label1
-        or
-       be,pt %xcc label1 */
-    label1_ptr = (uint32_t *)s->code_ptr;
-    tcg_out32(s, 0);
-
-    /* mov (delay slot) */
-    tcg_out_mov(s, TCG_TYPE_PTR, arg0, addr_reg);
-
-    /* mov */
-    tcg_out_mov(s, TCG_TYPE_REG, arg1, data_reg);
-
-    /* mov */
-    tcg_out_movi(s, TCG_TYPE_I32, arg2, mem_index);
+    *label_ptr[0] |= INSN_OFF19((unsigned long)s->code_ptr -
+                                (unsigned long)label_ptr[0]);
 
+    n = 0;
 #ifdef CONFIG_TCG_PASS_AREG0
-    /* XXX/FIXME: suboptimal */
-    tcg_out_mov(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[3],
-                tcg_target_call_iarg_regs[2]);
-    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[2],
-                tcg_target_call_iarg_regs[1]);
-    tcg_out_mov(s, TCG_TYPE_TL, tcg_target_call_iarg_regs[1],
-                tcg_target_call_iarg_regs[0]);
-    tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0],
-                TCG_AREG0);
+    tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[n++], TCG_AREG0);
 #endif
-    /* XXX: move that code at the end of the TB */
-    /* qemu_st_helper[s_bits](arg0, arg1, arg2) */
-    tcg_out32(s, CALL | ((((tcg_target_ulong)qemu_st_helpers[s_bits]
-                           - (tcg_target_ulong)s->code_ptr) >> 2)
-                         & 0x3fffffff));
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++],
+                    args[addrlo_idx + 1]);
+    }
+    tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++],
+                args[addrlo_idx]);
+    if (TCG_TARGET_REG_BITS == 32 && opc == 3) {
+        tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++], datahi);
+    }
+    tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++], datalo);
+
     /* Store AREG0 in stack to avoid ugly glibc bugs that mangle
        global registers */
-    // delay slot
-    tcg_out_ldst(s, TCG_AREG0, TCG_REG_CALL_STACK,
-                 TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
-                 sizeof(long), HOST_ST_OP);
-    tcg_out_ldst(s, TCG_AREG0, TCG_REG_CALL_STACK,
-                 TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
-                 sizeof(long), HOST_LD_OP);
-
-    /* will become:
-       ba label2 */
-    label2_ptr = (uint32_t *)s->code_ptr;
-    tcg_out32(s, 0);
-
-    /* nop (delay slot) */
-    tcg_out_nop(s);
-
-#if TARGET_LONG_BITS == 32
-    /* be label1 */
-    *label1_ptr = (INSN_OP(0) | INSN_COND(COND_E, 0) | INSN_OP2(0x2) |
-                   INSN_OFF22((unsigned long)s->code_ptr -
-                              (unsigned long)label1_ptr));
-#else
-    /* be,pt %xcc label1 */
-    *label1_ptr = (INSN_OP(0) | INSN_COND(COND_E, 0) | INSN_OP2(0x1) |
-                   (0x5 << 19) | INSN_OFF19((unsigned long)s->code_ptr -
-                              (unsigned long)label1_ptr));
-#endif
+    tcg_out_st(s, TCG_TYPE_REG, TCG_AREG0, TCG_REG_CALL_STACK,
+               TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
+               sizeof(long));
 
-    /* ld [arg1 + x], arg1 */
-    tcg_out_ldst(s, arg1, arg1, offsetof(CPUTLBEntry, addend) -
-                 offsetof(CPUTLBEntry, addr_write), TARGET_ADDEND_LD_OP);
+    /* qemu_st_helper[s_bits](arg0, arg1, arg2) */
+    tcg_out32(s, CALL | ((((tcg_target_ulong)qemu_st_helpers[opc]
+                           - (tcg_target_ulong)s->code_ptr) >> 2)
+                         & 0x3fffffff));
+    /* delay slot */
+    tcg_out_movi(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n], memi);
 
-#if TARGET_LONG_BITS == 32
-    /* and addr_reg, x, arg0 */
-    tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_I5, 0xffffffff);
-    tcg_out_arith(s, arg0, addr_reg, TCG_REG_I5, ARITH_AND);
-    /* add arg0, arg1, arg0 */
-    tcg_out_arith(s, arg0, arg0, arg1, ARITH_ADD);
-#else
-    /* add addr_reg, arg1, arg0 */
-    tcg_out_arith(s, arg0, addr_reg, arg1, ARITH_ADD);
-#endif
+    /* Reload AREG0.  */
+    tcg_out_ld(s, TCG_TYPE_REG, TCG_AREG0, TCG_REG_CALL_STACK,
+               TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
+               sizeof(long));
 
+    *label_ptr[1] |= INSN_OFF19((unsigned long)s->code_ptr -
+                                (unsigned long)label_ptr[1]);
 #else
-    arg0 = addr_reg;
-#endif
-
-    switch(opc) {
-    case 0:
-        /* stb data_reg, [arg0] */
-        tcg_out_ldst(s, data_reg, arg0, 0, STB);
-        break;
-    case 1:
-#ifdef TARGET_WORDS_BIGENDIAN
-        /* sth data_reg, [arg0] */
-        tcg_out_ldst(s, data_reg, arg0, 0, STH);
-#else
-        /* stha data_reg, [arg0] ASI_PRIMARY_LITTLE */
-        tcg_out_ldst_asi(s, data_reg, arg0, 0, STHA, ASI_PRIMARY_LITTLE);
-#endif
-        break;
-    case 2:
-#ifdef TARGET_WORDS_BIGENDIAN
-        /* stw data_reg, [arg0] */
-        tcg_out_ldst(s, data_reg, arg0, 0, STW);
-#else
-        /* stwa data_reg, [arg0] ASI_PRIMARY_LITTLE */
-        tcg_out_ldst_asi(s, data_reg, arg0, 0, STWA, ASI_PRIMARY_LITTLE);
-#endif
-        break;
-    case 3:
-#ifdef TARGET_WORDS_BIGENDIAN
-        /* stx data_reg, [arg0] */
-        tcg_out_ldst(s, data_reg, arg0, 0, STX);
-#else
-        /* stxa data_reg, [arg0] ASI_PRIMARY_LITTLE */
-        tcg_out_ldst_asi(s, data_reg, arg0, 0, STXA, ASI_PRIMARY_LITTLE);
-#endif
-        break;
-    default:
-        tcg_abort();
+    addr_reg = args[addrlo_idx];
+    if (TCG_TARGET_REG_BITS == 64 && TARGET_LONG_BITS == 32) {
+        tcg_out_arithi(s, TCG_REG_I5, addr_reg, 0, SHIFT_SRL);
+        addr_reg = TCG_REG_I5;
     }
-
-#if defined(CONFIG_SOFTMMU)
-    /* label2: */
-    *label2_ptr = (INSN_OP(0) | INSN_COND(COND_A, 0) | INSN_OP2(0x2) |
-                   INSN_OFF22((unsigned long)s->code_ptr -
-                              (unsigned long)label2_ptr));
-#endif
+    tcg_out_qemu_st_direct(s, addr_reg, datalo, datahi, opc);
+#endif /* CONFIG_SOFTMMU */
 }
 
 static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
@@ -1205,12 +1189,12 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         /* Store AREG0 in stack to avoid ugly glibc bugs that mangle
            global registers */
         // delay slot
-        tcg_out_ldst(s, TCG_AREG0, TCG_REG_CALL_STACK,
-                     TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
-                     sizeof(long), HOST_ST_OP);
-        tcg_out_ldst(s, TCG_AREG0, TCG_REG_CALL_STACK,
-                     TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
-                     sizeof(long), HOST_LD_OP);
+        tcg_out_st(s, TCG_TYPE_REG, TCG_AREG0, TCG_REG_CALL_STACK,
+                   TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
+                   sizeof(long));
+        tcg_out_ld(s, TCG_TYPE_REG, TCG_AREG0, TCG_REG_CALL_STACK,
+                   TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
+                   sizeof(long));
         break;
     case INDEX_op_jmp:
     case INDEX_op_br:
@@ -1378,6 +1362,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out_qemu_ld(s, args, 2 | 4);
         break;
 #endif
+    case INDEX_op_qemu_ld64:
+        tcg_out_qemu_ld(s, args, 3);
+        break;
     case INDEX_op_qemu_st8:
         tcg_out_qemu_st(s, args, 0);
         break;
@@ -1387,6 +1374,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     case INDEX_op_qemu_st32:
         tcg_out_qemu_st(s, args, 2);
         break;
+    case INDEX_op_qemu_st64:
+        tcg_out_qemu_st(s, args, 3);
+        break;
 
 #if TCG_TARGET_REG_BITS == 64
     case INDEX_op_movi_i64:
@@ -1451,13 +1441,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
                             args[2], const_args[2]);
         break;
 
-    case INDEX_op_qemu_ld64:
-        tcg_out_qemu_ld(s, args, 3);
-        break;
-    case INDEX_op_qemu_st64:
-        tcg_out_qemu_st(s, args, 3);
-        break;
-
 #endif
     gen_arith:
         tcg_out_arithc(s, args[0], args[1], args[2], const_args[2], c);
@@ -1522,20 +1505,6 @@ static const TCGTargetOpDef sparc_op_defs[] = {
     { INDEX_op_mulu2_i32, { "r", "r", "r", "rJ" } },
 #endif
 
-    { INDEX_op_qemu_ld8u, { "r", "L" } },
-    { INDEX_op_qemu_ld8s, { "r", "L" } },
-    { INDEX_op_qemu_ld16u, { "r", "L" } },
-    { INDEX_op_qemu_ld16s, { "r", "L" } },
-    { INDEX_op_qemu_ld32, { "r", "L" } },
-#if TCG_TARGET_REG_BITS == 64
-    { INDEX_op_qemu_ld32u, { "r", "L" } },
-    { INDEX_op_qemu_ld32s, { "r", "L" } },
-#endif
-
-    { INDEX_op_qemu_st8, { "L", "L" } },
-    { INDEX_op_qemu_st16, { "L", "L" } },
-    { INDEX_op_qemu_st32, { "L", "L" } },
-
 #if TCG_TARGET_REG_BITS == 64
     { INDEX_op_mov_i64, { "r", "r" } },
     { INDEX_op_movi_i64, { "r" } },
@@ -1550,8 +1519,6 @@ static const TCGTargetOpDef sparc_op_defs[] = {
     { INDEX_op_st16_i64, { "r", "r" } },
     { INDEX_op_st32_i64, { "r", "r" } },
     { INDEX_op_st_i64, { "r", "r" } },
-    { INDEX_op_qemu_ld64, { "L", "L" } },
-    { INDEX_op_qemu_st64, { "L", "L" } },
 
     { INDEX_op_add_i64, { "r", "r", "rJ" } },
     { INDEX_op_mul_i64, { "r", "r", "rJ" } },
@@ -1578,10 +1545,48 @@ static const TCGTargetOpDef sparc_op_defs[] = {
 
     { INDEX_op_brcond_i64, { "r", "rJ" } },
     { INDEX_op_setcond_i64, { "r", "r", "rJ" } },
-#else
-    { INDEX_op_qemu_ld64, { "L", "L", "L" } },
+#endif
+
+#if TCG_TARGET_REG_BITS == 64
+    { INDEX_op_qemu_ld8u, { "r", "L" } },
+    { INDEX_op_qemu_ld8s, { "r", "L" } },
+    { INDEX_op_qemu_ld16u, { "r", "L" } },
+    { INDEX_op_qemu_ld16s, { "r", "L" } },
+    { INDEX_op_qemu_ld32, { "r", "L" } },
+    { INDEX_op_qemu_ld32u, { "r", "L" } },
+    { INDEX_op_qemu_ld32s, { "r", "L" } },
+    { INDEX_op_qemu_ld64, { "r", "L" } },
+
+    { INDEX_op_qemu_st8, { "L", "L" } },
+    { INDEX_op_qemu_st16, { "L", "L" } },
+    { INDEX_op_qemu_st32, { "L", "L" } },
+    { INDEX_op_qemu_st64, { "L", "L" } },
+#elif TARGET_LONG_BITS <= TCG_TARGET_REG_BITS
+    { INDEX_op_qemu_ld8u, { "r", "L" } },
+    { INDEX_op_qemu_ld8s, { "r", "L" } },
+    { INDEX_op_qemu_ld16u, { "r", "L" } },
+    { INDEX_op_qemu_ld16s, { "r", "L" } },
+    { INDEX_op_qemu_ld32, { "r", "L" } },
+    { INDEX_op_qemu_ld64, { "r", "r", "L" } },
+
+    { INDEX_op_qemu_st8, { "L", "L" } },
+    { INDEX_op_qemu_st16, { "L", "L" } },
+    { INDEX_op_qemu_st32, { "L", "L" } },
     { INDEX_op_qemu_st64, { "L", "L", "L" } },
+#else
+    { INDEX_op_qemu_ld8u, { "r", "L", "L" } },
+    { INDEX_op_qemu_ld8s, { "r", "L", "L" } },
+    { INDEX_op_qemu_ld16u, { "r", "L", "L" } },
+    { INDEX_op_qemu_ld16s, { "r", "L", "L" } },
+    { INDEX_op_qemu_ld32, { "r", "L", "L" } },
+    { INDEX_op_qemu_ld64, { "L", "L", "L", "L" } },
+
+    { INDEX_op_qemu_st8, { "L", "L", "L" } },
+    { INDEX_op_qemu_st16, { "L", "L", "L" } },
+    { INDEX_op_qemu_st32, { "L", "L", "L" } },
+    { INDEX_op_qemu_st64, { "L", "L", "L", "L" } },
 #endif
+
     { -1 },
 };
 
-- 
1.7.7.6
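Patch 04 finds the data, address and MMU-index operands in `args[]` using two facts. First, does a 64-bit value need a register pair on a 32-bit host? Second, is the guest address wider than a host register? Below is a minimal sketch of that index arithmetic, lifted from `tcg_out_qemu_st` into a standalone C function. The struct and function names are hypothetical, and the function only mirrors the arithmetic, not the code emission. Its results agree with the operand counts in the new `sparc_op_defs` entries.

```c
#include <assert.h>

/* Hypothetical model of where tcg_out_qemu_ld/st find their operands:
   args[] = data register(s), address register(s), then the MMU index. */
struct qemu_ldst_layout {
    int datalo_idx;  /* low (or only) data register */
    int datahi_idx;  /* high data register; equals datalo_idx if unused */
    int addrlo_idx;  /* low (or only) guest address register */
    int memi_idx;    /* constant MMU index */
};

static struct qemu_ldst_layout
qemu_ldst_layout(int host_reg_bits, int target_long_bits, int opc)
{
    struct qemu_ldst_layout l = { 0, 0, 1, 0 };

    assert(opc >= 0 && opc <= 7);
    /* A 64-bit access on a 32-bit host uses a data register pair. */
    if (host_reg_bits == 32 && opc == 3) {
        l.datahi_idx = 1;
        l.addrlo_idx = 2;
    }
    /* A guest address wider than a host register also uses a pair. */
    l.memi_idx = l.addrlo_idx + 1 + (target_long_bits > host_reg_bits);
    return l;
}
```

For example, a 32-bit host running a 64-bit guest sees `qemu_st64` as `{ datalo, datahi, addrlo, addrhi, mem_index }`, so the MMU index sits at `args[4]`.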

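The softmmu fast path reworked by these patches is the code removed above, now factored into `tcg_out_tlb_load`. It probes the TLB with one shift, two masks and one compare. The sketch below expresses that arithmetic in plain C. It assumes illustrative values for the page size, TLB entry size and TLB size, and its macro names are hypothetical stand-ins for `TARGET_PAGE_BITS`, `CPU_TLB_ENTRY_BITS` and `CPU_TLB_SIZE`.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative parameters only; QEMU takes the real ones from its headers. */
#define PAGE_BITS   12
#define ENTRY_BITS  5                    /* log2(sizeof(TLB entry)) */
#define TLB_SIZE    256
#define PAGE_MASK   (~(((uint64_t)1 << PAGE_BITS) - 1))

/* Byte offset of the TLB entry for ADDR, computed as the generated code
   does (srl then and): the entry index arrives already scaled by the
   entry size, so no separate multiply is needed. */
static uint64_t tlb_byte_offset(uint64_t addr)
{
    return (addr >> (PAGE_BITS - ENTRY_BITS))
           & ((uint64_t)(TLB_SIZE - 1) << ENTRY_BITS);
}

/* Comparator value: the page address plus the low alignment bits of an
   access of (1 << s_bits) bytes.  TLB tags have those bits clear, so a
   misaligned access fails the compare and takes the helper call. */
static uint64_t tlb_comparator(uint64_t addr, int s_bits)
{
    assert(s_bits >= 0 && s_bits <= 3);
    return addr & (PAGE_MASK | (((uint64_t)1 << s_bits) - 1));
}
```

The generated code adds this byte offset to the env pointer plus the offset of `tlb_table[mem_index]`. It then loads the tag at that address and compares it against the comparator.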

* [Qemu-devel] [PATCH 05/15] tcg-sparc: Simplify qemu_ld/st direct memory paths.
  2012-03-25 22:27 [Qemu-devel] [PATCH 00/15] tcg-sparc improvments Richard Henderson
                   ` (3 preceding siblings ...)
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 04/15] tcg-sparc: Fix qemu_ld/st to handle 32-bit host Richard Henderson
@ 2012-03-25 22:27 ` Richard Henderson
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 06/15] tcg-sparc: Support GUEST_BASE Richard Henderson
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2012-03-25 22:27 UTC
  To: qemu-devel; +Cc: Blue Swirl

Since every combination of access size and endianness maps to a single
opcode (little-endian accesses use ASI_PRIMARY_LITTLE), turn the
functions into a simple table lookup.
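One table per endianness suffices for the following reason. Each little-endian entry is the alternate-space form of the same load (op3 with bit 0x10 set), with ASI_PRIMARY_LITTLE encoded in the instruction. The index uses the same `sizeop` encoding as the patch: bits 0-1 hold log2 of the size, and bit 2 requests sign extension. The C sketch below follows the SPARC V9 format-3 field positions. Its macro and enum names are local stand-ins, not the patch's `INSN_*` macros.

```c
#include <assert.h>
#include <stdint.h>

/* SPARC V9 format-3 fields (stand-ins for INSN_OP/INSN_OP3/INSN_ASI). */
#define OP(x)   ((uint32_t)(x) << 30)
#define OP3(x)  ((uint32_t)(x) << 19)
#define ASI(x)  ((uint32_t)(x) << 5)
#define ASI_PRIMARY_LITTLE 0x88

/* Alternate-space loads use op3 | 0x10 plus an ASI in the insn. */
#define LD(op3)     (OP(3) | OP3(op3))
#define LD_LE(op3)  (OP(3) | OP3((op3) | 0x10) | ASI(ASI_PRIMARY_LITTLE))

/* Base op3 values of the integer loads. */
enum { LDUW_ = 0x00, LDUB_ = 0x01, LDUH_ = 0x02, LDSW_ = 0x08,
       LDSB_ = 0x09, LDSH_ = 0x0a, LDX_ = 0x0b };

/* Index: bits 0-1 = log2(size), bit 2 = sign-extend. */
static const uint32_t ld_opc_be[8] = {
    LD(LDUB_), LD(LDUH_), LD(LDUW_), LD(LDX_),
    LD(LDSB_), LD(LDSH_), LD(LDSW_), LD(LDX_)
};
static const uint32_t ld_opc_le[8] = {
    LD(LDUB_), LD_LE(LDUH_), LD_LE(LDUW_), LD_LE(LDX_),
    LD(LDSB_), LD_LE(LDSH_), LD_LE(LDSW_), LD_LE(LDX_)
};

/* One lookup replaces the per-size switch: "ld [rs1 + rs2], rd". */
static uint32_t ld_insn(int sizeop, int big_endian, int rd, int rs1, int rs2)
{
    assert(sizeop >= 0 && sizeop < 8);
    uint32_t op = big_endian ? ld_opc_be[sizeop] : ld_opc_le[sizeop];
    return op | ((uint32_t)rd << 25) | ((uint32_t)rs1 << 14) | (uint32_t)rs2;
}
```

Byte accesses have no endianness, so both tables use the plain `ldub`/`ldsb` encodings there. A 64-bit load has nothing to sign-extend, so entries 3 and 7 are identical.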

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c |  209 +++++++++++++-----------------------------------
 1 files changed, 56 insertions(+), 153 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 8763b03..1b27626 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -294,6 +294,16 @@ static inline int tcg_target_const_match(tcg_target_long val,
 #define ASI_PRIMARY_LITTLE 0x88
 #endif
 
+#define LDUH_LE    (LDUHA | INSN_ASI(ASI_PRIMARY_LITTLE))
+#define LDSH_LE    (LDSHA | INSN_ASI(ASI_PRIMARY_LITTLE))
+#define LDUW_LE    (LDUWA | INSN_ASI(ASI_PRIMARY_LITTLE))
+#define LDSW_LE    (LDSWA | INSN_ASI(ASI_PRIMARY_LITTLE))
+#define LDX_LE     (LDXA  | INSN_ASI(ASI_PRIMARY_LITTLE))
+
+#define STH_LE     (STHA  | INSN_ASI(ASI_PRIMARY_LITTLE))
+#define STW_LE     (STWA  | INSN_ASI(ASI_PRIMARY_LITTLE))
+#define STX_LE     (STXA  | INSN_ASI(ASI_PRIMARY_LITTLE))
+
 static inline void tcg_out_arith(TCGContext *s, int rd, int rs1, int rs2,
                                  int op)
 {
@@ -366,66 +376,46 @@ static inline void tcg_out_movi(TCGContext *s, TCGType type,
     }
 }
 
-static inline void tcg_out_ld_raw(TCGContext *s, int ret,
-                                  tcg_target_long arg)
+static inline void tcg_out_ldst_rr(TCGContext *s, int data, int a1,
+                                   int a2, int op)
 {
-    tcg_out_sethi(s, ret, arg);
-    tcg_out32(s, LDUW | INSN_RD(ret) | INSN_RS1(ret) |
-              INSN_IMM13(arg & 0x3ff));
+    tcg_out32(s, op | INSN_RD(data) | INSN_RS1(a1) | INSN_RS2(a2));
 }
 
-static inline void tcg_out_ld_ptr(TCGContext *s, int ret,
-                                  tcg_target_long arg)
+static inline void tcg_out_ldst(TCGContext *s, int ret, int addr,
+                                int offset, int op)
 {
-    if (!check_fit_tl(arg, 10))
-        tcg_out_movi(s, TCG_TYPE_PTR, ret, arg & ~0x3ffULL);
-    if (TCG_TARGET_REG_BITS == 64) {
-        tcg_out32(s, LDX | INSN_RD(ret) | INSN_RS1(ret) |
-                  INSN_IMM13(arg & 0x3ff));
-    } else {
-        tcg_out32(s, LDUW | INSN_RD(ret) | INSN_RS1(ret) |
-                  INSN_IMM13(arg & 0x3ff));
-    }
-}
-
-static inline void tcg_out_ldst(TCGContext *s, int ret, int addr, int offset, int op)
-{
-    if (check_fit_tl(offset, 13))
+    if (check_fit_tl(offset, 13)) {
         tcg_out32(s, op | INSN_RD(ret) | INSN_RS1(addr) |
                   INSN_IMM13(offset));
-    else {
+    } else {
         tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_I5, offset);
-        tcg_out32(s, op | INSN_RD(ret) | INSN_RS1(TCG_REG_I5) |
-                  INSN_RS2(addr));
+        tcg_out_ldst_rr(s, ret, addr, TCG_REG_I5, op);
     }
 }
 
-static inline void tcg_out_ldst_asi(TCGContext *s, int ret, int addr,
-                                    int offset, int op, int asi)
-{
-    tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_I5, offset);
-    tcg_out32(s, op | INSN_RD(ret) | INSN_RS1(TCG_REG_I5) |
-              INSN_ASI(asi) | INSN_RS2(addr));
-}
-
 static inline void tcg_out_ld(TCGContext *s, TCGType type, TCGReg ret,
                               TCGReg arg1, tcg_target_long arg2)
 {
-    if (type == TCG_TYPE_I32)
-        tcg_out_ldst(s, ret, arg1, arg2, LDUW);
-    else
-        tcg_out_ldst(s, ret, arg1, arg2, LDX);
+    tcg_out_ldst(s, ret, arg1, arg2, (type == TCG_TYPE_I32 ? LDUW : LDX));
 }
 
 static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
                               TCGReg arg1, tcg_target_long arg2)
 {
-    if (type == TCG_TYPE_I32)
-        tcg_out_ldst(s, arg, arg1, arg2, STW);
-    else
-        tcg_out_ldst(s, arg, arg1, arg2, STX);
+    tcg_out_ldst(s, arg, arg1, arg2, (type == TCG_TYPE_I32 ? STW : STX));
 }
 
+static inline void tcg_out_ld_ptr(TCGContext *s, int ret,
+                                  tcg_target_long arg)
+{
+    if (!check_fit_tl(arg, 10)) {
+        tcg_out_movi(s, TCG_TYPE_PTR, ret, arg & ~0x3ff);
+    }
+    tcg_out_ld(s, TCG_TYPE_PTR, ret, ret, arg & 0x3ff);
+}
+
+
 static inline void tcg_out_sety(TCGContext *s, int rs)
 {
     tcg_out32(s, WRY | INSN_RS1(TCG_REG_G0) | INSN_RS2(rs));
@@ -833,76 +823,26 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, int addr, int datalo,
                                    int datahi, int sizeop)
 {
 #ifdef TARGET_WORDS_BIGENDIAN
-    const int bigendian = 1;
+    static const int ld_opc[8] = {
+        LDUB, LDUH, LDUW, LDX, LDSB, LDSH, LDSW, LDX
+    };
 #else
-    const int bigendian = 0;
+    static const int ld_opc[8] = {
+        LDUB, LDUH_LE, LDUW_LE, LDX_LE, LDSB, LDSH_LE, LDSW_LE, LDX_LE
+    };
 #endif
-    switch (sizeop) {
-    case 0:
-        /* ldub [addr], datalo */
-        tcg_out_ldst(s, datalo, addr, 0, LDUB);
-        break;
-    case 0 | 4:
-        /* ldsb [addr], datalo */
-        tcg_out_ldst(s, datalo, addr, 0, LDSB);
-        break;
-    case 1:
-        if (bigendian) {
-            /* lduh [addr], datalo */
-            tcg_out_ldst(s, datalo, addr, 0, LDUH);
-        } else {
-            /* lduha [addr] ASI_PRIMARY_LITTLE, datalo */
-            tcg_out_ldst_asi(s, datalo, addr, 0, LDUHA, ASI_PRIMARY_LITTLE);
-        }
-        break;
-    case 1 | 4:
-        if (bigendian) {
-            /* ldsh [addr], datalo */
-            tcg_out_ldst(s, datalo, addr, 0, LDSH);
-        } else {
-            /* ldsha [addr] ASI_PRIMARY_LITTLE, datalo */
-            tcg_out_ldst_asi(s, datalo, addr, 0, LDSHA, ASI_PRIMARY_LITTLE);
-        }
-        break;
-    case 2:
-        if (bigendian) {
-            /* lduw [addr], datalo */
-            tcg_out_ldst(s, datalo, addr, 0, LDUW);
-        } else {
-            /* lduwa [addr] ASI_PRIMARY_LITTLE, datalo */
-            tcg_out_ldst_asi(s, datalo, addr, 0, LDUWA, ASI_PRIMARY_LITTLE);
-        }
-        break;
-    case 2 | 4:
-        if (bigendian) {
-            /* ldsw [addr], datalo */
-            tcg_out_ldst(s, datalo, addr, 0, LDSW);
-        } else {
-            /* ldswa [addr] ASI_PRIMARY_LITTLE, datalo */
-            tcg_out_ldst_asi(s, datalo, addr, 0, LDSWA, ASI_PRIMARY_LITTLE);
-        }
-        break;
-    case 3:
-        if (TCG_TARGET_REG_BITS == 64) {
-            if (bigendian) {
-                /* ldx [addr], datalo */
-                tcg_out_ldst(s, datalo, addr, 0, LDX);
-            } else {
-                /* ldxa [addr] ASI_PRIMARY_LITTLE, datalo */
-                tcg_out_ldst_asi(s, datalo, addr, 0, LDXA, ASI_PRIMARY_LITTLE);
-            }
-        } else {
-            if (bigendian) {
-                tcg_out_ldst(s, datahi, addr, 0, LDUW);
-                tcg_out_ldst(s, datalo, addr, 4, LDUW);
-            } else {
-                tcg_out_ldst_asi(s, datalo, addr, 0, LDUWA, ASI_PRIMARY_LITTLE);
-                tcg_out_ldst_asi(s, datahi, addr, 4, LDUWA, ASI_PRIMARY_LITTLE);
-            }
+
+    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
+        /* Load all 64-bits into an O/G register.  */
+        int reg64 = (datalo < 16 ? datalo : TCG_REG_O0);
+        tcg_out_ldst_rr(s, reg64, addr, TCG_REG_G0, ld_opc[sizeop]);
+        /* Move the two 32-bit pieces into the destination registers.  */
+        tcg_out_arithi(s, datahi, reg64, 32, SHIFT_SRLX);
+        if (reg64 != datalo) {
+            tcg_out_mov(s, TCG_TYPE_I32, datalo, reg64);
         }
-        break;
-    default:
-        tcg_abort();
+    } else {
+        tcg_out_ldst_rr(s, datalo, addr, TCG_REG_G0, ld_opc[sizeop]);
     }
 }
 
@@ -1016,55 +956,18 @@ static void tcg_out_qemu_st_direct(TCGContext *s, int addr, int datalo,
                                    int datahi, int sizeop)
 {
 #ifdef TARGET_WORDS_BIGENDIAN
-    const int bigendian = 1;
+    static const int st_opc[4] = { STB, STH, STW, STX };
 #else
-    const int bigendian = 0;
+    static const int st_opc[4] = { STB, STH_LE, STW_LE, STX_LE };
 #endif
-    switch (sizeop) {
-    case 0:
-        /* stb datalo, [addr] */
-        tcg_out_ldst(s, datalo, addr, 0, STB);
-        break;
-    case 1:
-        if (bigendian) {
-            /* sth datalo, [addr] */
-            tcg_out_ldst(s, datalo, addr, 0, STH);
-        } else {
-            /* stha datalo, [addr] ASI_PRIMARY_LITTLE */
-            tcg_out_ldst_asi(s, datalo, addr, 0, STHA, ASI_PRIMARY_LITTLE);
-        }
-        break;
-    case 2:
-        if (bigendian) {
-            /* stw datalo, [addr] */
-            tcg_out_ldst(s, datalo, addr, 0, STW);
-        } else {
-            /* stwa datalo, [addr] ASI_PRIMARY_LITTLE */
-            tcg_out_ldst_asi(s, datalo, addr, 0, STWA, ASI_PRIMARY_LITTLE);
-        }
-        break;
-    case 3:
-        if (TCG_TARGET_REG_BITS == 64) {
-            if (bigendian) {
-                /* stx datalo, [addr] */
-                tcg_out_ldst(s, datalo, addr, 0, STX);
-            } else {
-                /* stxa datalo, [addr] ASI_PRIMARY_LITTLE */
-                tcg_out_ldst_asi(s, datalo, addr, 0, STXA, ASI_PRIMARY_LITTLE);
-            }
-        } else {
-            if (bigendian) {
-                tcg_out_ldst(s, datahi, addr, 0, STW);
-                tcg_out_ldst(s, datalo, addr, 4, STW);
-            } else {
-                tcg_out_ldst_asi(s, datalo, addr, 0, STWA, ASI_PRIMARY_LITTLE);
-                tcg_out_ldst_asi(s, datahi, addr, 4, STWA, ASI_PRIMARY_LITTLE);
-            }
-        }
-        break;
-    default:
-        tcg_abort();
+
+    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
+        tcg_out_arithi(s, TCG_REG_O0, datalo, 0, SHIFT_SRL);
+        tcg_out_arithi(s, TCG_REG_O2, datahi, 32, SHIFT_SLLX);
+        tcg_out_arith(s, TCG_REG_O0, TCG_REG_O0, TCG_REG_O2, ARITH_OR);
+        datalo = TCG_REG_O0;
     }
+    tcg_out_ldst_rr(s, datalo, addr, TCG_REG_G0, st_opc[sizeop]);
 }
 
 static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
-- 
1.7.7.6
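On a 32-bit (v8plus) host, patch 05 moves a 64-bit guest value with a single `ldx`/`stx` and splits or joins the halves in registers. After a load, `srlx` by 32 extracts the high half. Before a store, `srl` by 0, `sllx` by 32 and `or` build the 64-bit value. The `datalo < 16` test keeps the full value in a %g or %o register, presumably because v8plus only guarantees the upper 32 bits of those registers. The hedged C model below covers just the data movement; the function names are hypothetical, and the comments name the instruction each line stands for.

```c
#include <assert.h>
#include <stdint.h>

/* Store path: build the 64-bit value that a single stx writes. */
static uint64_t combine_halves(uint32_t lo, uint32_t hi)
{
    uint64_t zlo = (uint64_t)lo;          /* srl  datalo, 0, %o0      */
    uint64_t shi = (uint64_t)hi << 32;    /* sllx datahi, 32, %o2     */
    return zlo | shi;                     /* or   %o0, %o2, %o0       */
}

/* Load path: split the value produced by a single ldx. */
static void split_value(uint64_t v, uint32_t *lo, uint32_t *hi)
{
    *hi = (uint32_t)(v >> 32);            /* srlx reg64, 32, datahi   */
    *lo = (uint32_t)v;                    /* 32-bit mov keeps low word */
}

/* Splitting a combined value must give back the original halves. */
static void check_roundtrip(uint32_t lo, uint32_t hi)
{
    uint32_t lo2, hi2;
    split_value(combine_halves(lo, hi), &lo2, &hi2);
    assert(lo2 == lo && hi2 == hi);
}
```

A big-endian `stx` places the high word at offset 0 and the low word at offset 4. That matches the STW pair it replaces from patch 04.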


* [Qemu-devel] [PATCH 06/15] tcg-sparc: Support GUEST_BASE.
  2012-03-25 22:27 [Qemu-devel] [PATCH 00/15] tcg-sparc improvments Richard Henderson
                   ` (4 preceding siblings ...)
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 05/15] tcg-sparc: Simplify qemu_ld/st direct memory paths Richard Henderson
@ 2012-03-25 22:27 ` Richard Henderson
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 07/15] tcg-sparc: Steamline qemu_ld/st more Richard Henderson
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2012-03-25 22:27 UTC
  To: qemu-devel; +Cc: Blue Swirl


Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 configure              |    2 ++
 tcg/sparc/tcg-target.c |   40 +++++++++++++++++++++++++++++-----------
 tcg/sparc/tcg-target.h |    2 ++
 3 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/configure b/configure
index 7741ba9..a79a090 100755
--- a/configure
+++ b/configure
@@ -819,6 +819,7 @@ case "$cpu" in
            if test "$solaris" = "no" ; then
              QEMU_CFLAGS="-ffixed-g1 -ffixed-g6 $QEMU_CFLAGS"
            fi
+           host_guest_base="yes"
            ;;
     sparc64)
            LDFLAGS="-m64 $LDFLAGS"
@@ -827,6 +828,7 @@ case "$cpu" in
            if test "$solaris" != "no" ; then
              QEMU_CFLAGS="-ffixed-g1 $QEMU_CFLAGS"
            fi
+           host_guest_base="yes"
            ;;
     s390)
            QEMU_CFLAGS="-m31 -march=z990 $QEMU_CFLAGS"
diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 1b27626..9891648 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -59,6 +59,12 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 };
 #endif
 
+#ifdef CONFIG_USE_GUEST_BASE
+# define TCG_GUEST_BASE_REG TCG_REG_I3
+#else
+# define TCG_GUEST_BASE_REG TCG_REG_G0
+#endif
+
 #ifdef CONFIG_TCG_PASS_AREG0
 #define ARG_OFFSET 1
 #else
@@ -689,6 +695,14 @@ static void tcg_target_qemu_prologue(TCGContext *s)
     tcg_out32(s, SAVE | INSN_RD(TCG_REG_O6) | INSN_RS1(TCG_REG_O6) |
               INSN_IMM13(-(TCG_TARGET_STACK_MINFRAME +
                            CPU_TEMP_BUF_NLONGS * (int)sizeof(long))));
+
+#ifdef CONFIG_USE_GUEST_BASE
+    if (GUEST_BASE != 0) {
+        tcg_out_movi(s, TCG_TYPE_PTR, TCG_GUEST_BASE_REG, GUEST_BASE);
+        tcg_regset_set_reg(s->reserved_regs, TCG_GUEST_BASE_REG);
+    }
+#endif
+
     tcg_out32(s, JMPL | INSN_RD(TCG_REG_G0) | INSN_RS1(TCG_REG_I1) |
               INSN_RS2(TCG_REG_G0));
     tcg_out_mov(s, TCG_TYPE_PTR, TCG_AREG0, TCG_REG_I0);
@@ -819,8 +833,8 @@ static int tcg_out_tlb_load(TCGContext *s, int addrlo_idx, int mem_index,
 }
 #endif /* CONFIG_SOFTMMU */
 
-static void tcg_out_qemu_ld_direct(TCGContext *s, int addr, int datalo,
-                                   int datahi, int sizeop)
+static void tcg_out_qemu_ld_direct(TCGContext *s, int addr, int addend,
+                                   int datalo, int datahi, int sizeop)
 {
 #ifdef TARGET_WORDS_BIGENDIAN
     static const int ld_opc[8] = {
@@ -835,14 +849,14 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, int addr, int datalo,
     if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
         /* Load all 64-bits into an O/G register.  */
         int reg64 = (datalo < 16 ? datalo : TCG_REG_O0);
-        tcg_out_ldst_rr(s, reg64, addr, TCG_REG_G0, ld_opc[sizeop]);
+        tcg_out_ldst_rr(s, reg64, addr, addend, ld_opc[sizeop]);
         /* Move the two 32-bit pieces into the destination registers.  */
         tcg_out_arithi(s, datahi, reg64, 32, SHIFT_SRLX);
         if (reg64 != datalo) {
             tcg_out_mov(s, TCG_TYPE_I32, datalo, reg64);
         }
     } else {
-        tcg_out_ldst_rr(s, datalo, addr, TCG_REG_G0, ld_opc[sizeop]);
+        tcg_out_ldst_rr(s, datalo, addr, addend, ld_opc[sizeop]);
     }
 }
 
@@ -869,7 +883,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
                                 label_ptr, offsetof(CPUTLBEntry, addr_read));
 
     /* TLB Hit.  */
-    tcg_out_qemu_ld_direct(s, addr_reg, datalo, datahi, opc);
+    tcg_out_qemu_ld_direct(s, addr_reg, TCG_REG_G0, datalo, datahi, opc);
 
     /* b,pt,n label1 */
     label_ptr[1] = (uint32_t *)s->code_ptr;
@@ -948,12 +962,14 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
         tcg_out_arithi(s, TCG_REG_I5, addr_reg, 0, SHIFT_SRL);
         addr_reg = TCG_REG_I5;
     }
-    tcg_out_qemu_ld_direct(s, addr_reg, datalo, datahi, opc);
+    tcg_out_qemu_ld_direct(s, addr_reg,
+                           (GUEST_BASE ? TCG_GUEST_BASE_REG : TCG_REG_G0),
+                           datalo, datahi, opc);
 #endif /* CONFIG_SOFTMMU */
 }
 
-static void tcg_out_qemu_st_direct(TCGContext *s, int addr, int datalo,
-                                   int datahi, int sizeop)
+static void tcg_out_qemu_st_direct(TCGContext *s, int addr, int addend,
+                                   int datalo, int datahi, int sizeop)
 {
 #ifdef TARGET_WORDS_BIGENDIAN
     static const int st_opc[4] = { STB, STH, STW, STX };
@@ -967,7 +983,7 @@ static void tcg_out_qemu_st_direct(TCGContext *s, int addr, int datalo,
         tcg_out_arith(s, TCG_REG_O0, TCG_REG_O0, TCG_REG_O2, ARITH_OR);
         datalo = TCG_REG_O0;
     }
-    tcg_out_ldst_rr(s, datalo, addr, TCG_REG_G0, st_opc[sizeop]);
+    tcg_out_ldst_rr(s, datalo, addr, addend, st_opc[sizeop]);
 }
 
 static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
@@ -992,7 +1008,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
                                 label_ptr, offsetof(CPUTLBEntry, addr_write));
 
     /* TLB Hit.  */
-    tcg_out_qemu_st_direct(s, addr_reg, datalo, datahi, opc);
+    tcg_out_qemu_st_direct(s, addr_reg, TCG_REG_G0, datalo, datahi, opc);
 
     /* b,pt,n label1 */
     label_ptr[1] = (uint32_t *)s->code_ptr;
@@ -1045,7 +1061,9 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
         tcg_out_arithi(s, TCG_REG_I5, addr_reg, 0, SHIFT_SRL);
         addr_reg = TCG_REG_I5;
     }
-    tcg_out_qemu_st_direct(s, addr_reg, datalo, datahi, opc);
+    tcg_out_qemu_st_direct(s, addr_reg,
+                           (GUEST_BASE ? TCG_GUEST_BASE_REG : TCG_REG_G0),
+                           datalo, datahi, opc);
 #endif /* CONFIG_SOFTMMU */
 }
 
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index 56742bf..e69dfc8 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -126,6 +126,8 @@ typedef enum {
 #define TCG_TARGET_HAS_deposit_i64      0
 #endif
 
+#define TCG_TARGET_HAS_GUEST_BASE
+
 /* Note: must be synced with dyngen-exec.h */
 #ifdef CONFIG_SOLARIS
 #define TCG_AREG0 TCG_REG_G2
-- 
1.7.7.6
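Patch 06 folds GUEST_BASE into the reg+reg addressing mode already used for every direct access, so no extra add instruction is needed. The addend operand is either `TCG_GUEST_BASE_REG` (%i3, loaded once in the prologue) or %g0, which always reads as zero. The small C model below works under those assumptions. The register-file array and helper names are hypothetical. Register numbers follow SPARC's %g0-%g7, %o0-%o7, %l0-%l7, %i0-%i7 order, which makes %i3 register 27.

```c
#include <assert.h>
#include <stdint.h>

#define REG_G0 0    /* %g0: always reads as zero */
#define REG_I3 27   /* stand-in for TCG_GUEST_BASE_REG (%i3) */

/* Hypothetical model of the 32 integer registers. */
static uint64_t regs[32];

static uint64_t read_reg(int r)
{
    assert(r >= 0 && r < 32);
    return r == REG_G0 ? 0 : regs[r];
}

/* Effective address of "ld [rs1 + rs2], rd", the form tcg_out_ldst_rr emits. */
static uint64_t effective_addr(int rs1, int rs2)
{
    return read_reg(rs1) + read_reg(rs2);
}

/* Mirrors "(GUEST_BASE ? TCG_GUEST_BASE_REG : TCG_REG_G0)" in the patch. */
static int guest_base_addend(uint64_t guest_base)
{
    return guest_base ? REG_I3 : REG_G0;
}
```

With a zero GUEST_BASE, the prologue never reserves %i3, and the `[addr + %g0]` form makes the host address equal to the guest address.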


* [Qemu-devel] [PATCH 07/15] tcg-sparc: Steamline qemu_ld/st more.
  2012-03-25 22:27 [Qemu-devel] [PATCH 00/15] tcg-sparc improvments Richard Henderson
                   ` (5 preceding siblings ...)
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 06/15] tcg-sparc: Support GUEST_BASE Richard Henderson
@ 2012-03-25 22:27 ` Richard Henderson
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 08/15] Avoid declaring the env variable at all if CONFIG_TCG_PASS_AREG0 Richard Henderson
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2012-03-25 22:27 UTC
  To: qemu-devel; +Cc: Blue Swirl


Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c |  235 +++++++++++++++++++++++++----------------------
 1 files changed, 125 insertions(+), 110 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 9891648..d45114f 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -761,22 +761,16 @@ static const void * const qemu_st_helpers[4] = {
    WHICH is the offset into the CPUTLBEntry structure of the slot to read.
    This should be offsetof addr_read or addr_write.
 
-   Outputs:
-   LABEL_PTRS is filled with the position of the forward jumps to the
-   TLB miss case.  This will always be a ,PN insn, so a 19-bit offset.
-
-   Returns a register loaded with the low part of the address, adjusted
-   as indicated by the TLB and so is a host address.  Undefined in the
-   TLB miss case.  */
+   The result of the TLB comparison is in %[ix]cc.  The sanitized address
+   is in the returned register (%o0 or ADDRLO).  The TLB addend is in %o1.  */
 
 static int tcg_out_tlb_load(TCGContext *s, int addrlo_idx, int mem_index,
-                            int s_bits, const TCGArg *args,
-                            uint32_t **label_ptr, int which)
+                            int s_bits, const TCGArg *args, int which)
 {
     const int addrlo = args[addrlo_idx];
-    const int r0 = tcg_target_call_iarg_regs[0];
-    const int r1 = tcg_target_call_iarg_regs[1];
-    const int r2 = tcg_target_call_iarg_regs[2];
+    const int r0 = TCG_REG_O0;
+    const int r1 = TCG_REG_O1;
+    const int r2 = TCG_REG_O2;
     int addr = addrlo;
     int tlb_ofs;
 
@@ -807,60 +801,39 @@ static int tcg_out_tlb_load(TCGContext *s, int addrlo_idx, int mem_index,
         tlb_ofs = 0;
     }
 
-    /* ld [arg1 + which], arg2 */
+    /* Load the tlb comparator and the addend.  */
     tcg_out_ld(s, TCG_TYPE_TL, r2, r1, tlb_ofs + which);
+    tcg_out_ld(s, TCG_TYPE_PTR, r1, r1, tlb_ofs+offsetof(CPUTLBEntry, addend));
 
     /* subcc arg0, arg2, %g0 */
     tcg_out_cmp(s, r0, r2, 0);
 
-    /* bne,pn %[ix]cc, label0 */
-    *label_ptr = (uint32_t *)s->code_ptr;
-    tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_NE, 0) | INSN_OP2(0x1) |
-                  ((TARGET_LONG_BITS == 64) << 21)));
-
-    /* TLB Hit.  Compute the host address into r1.  The ld is in the
-       branch delay slot; harmless for the TLB miss case.  */
-    tcg_out_ld(s, TCG_TYPE_PTR, r1, r1, tlb_ofs+offsetof(CPUTLBEntry, addend));
-
+    /* If the guest address must be zero-extended, do so now.  */
     if (TCG_TARGET_REG_BITS == 64 && TARGET_LONG_BITS == 32) {
         tcg_out_arithi(s, r0, addrlo, 0, SHIFT_SRL);
-        tcg_out_arith(s, r1, r0, r1, ARITH_ADD);
-    } else {
-        tcg_out_arith(s, r1, addrlo, r1, ARITH_ADD);
+        return r0;
     }
-
-    return r1;
+    return addrlo;
 }
 #endif /* CONFIG_SOFTMMU */
 
-static void tcg_out_qemu_ld_direct(TCGContext *s, int addr, int addend,
-                                   int datalo, int datahi, int sizeop)
-{
+static const int qemu_ld_opc[8] = {
 #ifdef TARGET_WORDS_BIGENDIAN
-    static const int ld_opc[8] = {
-        LDUB, LDUH, LDUW, LDX, LDSB, LDSH, LDSW, LDX
-    };
+    LDUB, LDUH, LDUW, LDX, LDSB, LDSH, LDSW, LDX
 #else
-    static const int ld_opc[8] = {
-        LDUB, LDUH_LE, LDUW_LE, LDX_LE, LDSB, LDSH_LE, LDSW_LE, LDX_LE
-    };
+    LDUB, LDUH_LE, LDUW_LE, LDX_LE, LDSB, LDSH_LE, LDSW_LE, LDX_LE
 #endif
+};
 
-    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
-        /* Load all 64-bits into an O/G register.  */
-        int reg64 = (datalo < 16 ? datalo : TCG_REG_O0);
-        tcg_out_ldst_rr(s, reg64, addr, addend, ld_opc[sizeop]);
-        /* Move the two 32-bit pieces into the destination registers.  */
-        tcg_out_arithi(s, datahi, reg64, 32, SHIFT_SRLX);
-        if (reg64 != datalo) {
-            tcg_out_mov(s, TCG_TYPE_I32, datalo, reg64);
-        }
-    } else {
-        tcg_out_ldst_rr(s, datalo, addr, addend, ld_opc[sizeop]);
-    }
-}
+static const int qemu_st_opc[4] = {
+#ifdef TARGET_WORDS_BIGENDIAN
+    STB, STH, STW, STX
+#else
+    STB, STH_LE, STW_LE, STX_LE
+#endif
+};
 
-static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
+static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int sizeop)
 {
     int addrlo_idx = 1, datalo, datahi, addr_reg;
 #if defined(CONFIG_SOFTMMU)
@@ -869,7 +842,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
 #endif
 
     datahi = datalo = args[0];
-    if (TCG_TARGET_REG_BITS == 32 && opc == 3) {
+    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
         datahi = args[1];
         addrlo_idx = 2;
     }
@@ -877,27 +850,59 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
 #if defined(CONFIG_SOFTMMU)
     memi_idx = addrlo_idx + 1 + (TARGET_LONG_BITS > TCG_TARGET_REG_BITS);
     memi = args[memi_idx];
-    s_bits = opc & 3;
+    s_bits = sizeop & 3;
 
     addr_reg = tcg_out_tlb_load(s, addrlo_idx, memi, s_bits, args,
-                                label_ptr, offsetof(CPUTLBEntry, addr_read));
+                                offsetof(CPUTLBEntry, addr_read));
 
-    /* TLB Hit.  */
-    tcg_out_qemu_ld_direct(s, addr_reg, TCG_REG_G0, datalo, datahi, opc);
+    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
+        int reg64;
 
-    /* b,pt,n label1 */
-    label_ptr[1] = (uint32_t *)s->code_ptr;
-    tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_A, 0) | INSN_OP2(0x1)
-                  | (1 << 29) | (1 << 19)));
+        /* bne,pn %[xi]cc, label0 */
+        label_ptr[0] = (uint32_t *)s->code_ptr;
+        tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_NE, 0) | INSN_OP2(0x1)
+                      | ((TARGET_LONG_BITS == 64) << 21)));
+
+        /* TLB Hit.  */
+        /* Load all 64-bits into an O/G register.  */
+        reg64 = (datalo < 16 ? datalo : TCG_REG_O0);
+        tcg_out_ldst_rr(s, reg64, addr_reg, TCG_REG_O1, qemu_ld_opc[sizeop]);
+
+        /* Move the two 32-bit pieces into the destination registers.  */
+        tcg_out_arithi(s, datahi, reg64, 32, SHIFT_SRLX);
+        if (reg64 != datalo) {
+            tcg_out_mov(s, TCG_TYPE_I32, datalo, reg64);
+        }
+
+        /* b,pt,n label1 */
+        label_ptr[1] = (uint32_t *)s->code_ptr;
+        tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_A, 0) | INSN_OP2(0x1)
+                      | (1 << 29) | (1 << 19)));
+    } else {
+        /* The fast path is exactly one insn.  Thus we can perform the
+           entire TLB Hit in the (annulled) delay slot of the branch
+           over the TLB Miss case.  */
+
+        /* beq,a,pt %[xi]cc, label1 */
+        label_ptr[0] = NULL;
+        label_ptr[1] = (uint32_t *)s->code_ptr;
+        tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_E, 0) | INSN_OP2(0x1)
+                      | ((TARGET_LONG_BITS == 64) << 21)
+                      | (1 << 29) | (1 << 19)));
+        /* delay slot */
+        tcg_out_ldst_rr(s, datalo, addr_reg, TCG_REG_O1, qemu_ld_opc[sizeop]);
+    }
 
     /* TLB Miss.  */
 
-    *label_ptr[0] |= INSN_OFF19((unsigned long)s->code_ptr -
-                                (unsigned long)label_ptr[0]);
-    n = 0;
-#ifdef CONFIG_TCG_PASS_AREG0
-    tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[n++], TCG_AREG0);
-#endif
+    if (label_ptr[0]) {
+        *label_ptr[0] |= INSN_OFF19((unsigned long)s->code_ptr -
+                                    (unsigned long)label_ptr[0]);
+    }
+    n = ARG_OFFSET;
+    if (ARG_OFFSET) {
+        tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
+    }
     if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
         tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++],
                     args[addrlo_idx + 1]);
@@ -925,7 +930,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
 
     n = tcg_target_call_oarg_regs[0];
     /* datalo = sign_extend(arg0) */
-    switch(opc) {
+    switch (sizeop) {
     case 0 | 4:
         /* Recall that SRA sign extends from bit 31 through bit 63.  */
         tcg_out_arithi(s, datalo, n, 24, SHIFT_SLL);
@@ -962,40 +967,35 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
         tcg_out_arithi(s, TCG_REG_I5, addr_reg, 0, SHIFT_SRL);
         addr_reg = TCG_REG_I5;
     }
-    tcg_out_qemu_ld_direct(s, addr_reg,
-                           (GUEST_BASE ? TCG_GUEST_BASE_REG : TCG_REG_G0),
-                           datalo, datahi, opc);
-#endif /* CONFIG_SOFTMMU */
-}
+    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
+        int reg64 = (datalo < 16 ? datalo : TCG_REG_O0);
 
-static void tcg_out_qemu_st_direct(TCGContext *s, int addr, int addend,
-                                   int datalo, int datahi, int sizeop)
-{
-#ifdef TARGET_WORDS_BIGENDIAN
-    static const int st_opc[4] = { STB, STH, STW, STX };
-#else
-    static const int st_opc[4] = { STB, STH_LE, STW_LE, STX_LE };
-#endif
+        tcg_out_ldst_rr(s, reg64, addr_reg,
+                        (GUEST_BASE ? TCG_GUEST_BASE_REG : TCG_REG_G0),
+                        qemu_ld_opc[sizeop]);
 
-    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
-        tcg_out_arithi(s, TCG_REG_O0, datalo, 0, SHIFT_SRL);
-        tcg_out_arithi(s, TCG_REG_O2, datahi, 32, SHIFT_SLLX);
-        tcg_out_arith(s, TCG_REG_O0, TCG_REG_O0, TCG_REG_O2, ARITH_OR);
-        datalo = TCG_REG_O0;
+        tcg_out_arithi(s, datahi, reg64, 32, SHIFT_SRLX);
+        if (reg64 != datalo) {
+            tcg_out_mov(s, TCG_TYPE_I32, datalo, reg64);
+        }
+    } else {
+        tcg_out_ldst_rr(s, datalo, addr_reg,
+                        (GUEST_BASE ? TCG_GUEST_BASE_REG : TCG_REG_G0),
+                        qemu_ld_opc[sizeop]);
     }
-    tcg_out_ldst_rr(s, datalo, addr, addend, st_opc[sizeop]);
+#endif /* CONFIG_SOFTMMU */
 }
 
-static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
+static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int sizeop)
 {
     int addrlo_idx = 1, datalo, datahi, addr_reg;
 #if defined(CONFIG_SOFTMMU)
     int memi_idx, memi, n;
-    uint32_t *label_ptr[2];
+    uint32_t *label_ptr;
 #endif
 
     datahi = datalo = args[0];
-    if (TCG_TARGET_REG_BITS == 32 && opc == 3) {
+    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
         datahi = args[1];
         addrlo_idx = 2;
     }
@@ -1004,33 +1004,40 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
     memi_idx = addrlo_idx + 1 + (TARGET_LONG_BITS > TCG_TARGET_REG_BITS);
     memi = args[memi_idx];
 
-    addr_reg = tcg_out_tlb_load(s, addrlo_idx, memi, opc, args,
-                                label_ptr, offsetof(CPUTLBEntry, addr_write));
+    addr_reg = tcg_out_tlb_load(s, addrlo_idx, memi, sizeop, args,
+                                offsetof(CPUTLBEntry, addr_write));
 
-    /* TLB Hit.  */
-    tcg_out_qemu_st_direct(s, addr_reg, TCG_REG_G0, datalo, datahi, opc);
+    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
+        /* Reconstruct the full 64-bit value in %g1, using %o2 as temp.  */
+        /* ??? Redefine the temps from %i4/%i5 so that we have an o/g temp. */
+        tcg_out_arithi(s, TCG_REG_G1, datalo, 0, SHIFT_SRL);
+        tcg_out_arithi(s, TCG_REG_O2, datahi, 32, SHIFT_SLLX);
+        tcg_out_arith(s, TCG_REG_G1, TCG_REG_G1, TCG_REG_O2, ARITH_OR);
+        datalo = TCG_REG_G1;
+    }
 
-    /* b,pt,n label1 */
-    label_ptr[1] = (uint32_t *)s->code_ptr;
-    tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_A, 0) | INSN_OP2(0x1)
+    /* The fast path is exactly one insn.  Thus we can perform the entire
+       TLB Hit in the (annulled) delay slot of the branch over TLB Miss.  */
+    /* beq,a,pt %[xi]cc, label0 */
+    label_ptr = (uint32_t *)s->code_ptr;
+    tcg_out32(s, (INSN_OP(0) | INSN_COND(COND_E, 0) | INSN_OP2(0x1)
+                  | ((TARGET_LONG_BITS == 64) << 21)
                   | (1 << 29) | (1 << 19)));
+    /* delay slot */
+    tcg_out_ldst_rr(s, datalo, addr_reg, TCG_REG_O1, qemu_st_opc[sizeop]);
 
     /* TLB Miss.  */
-
-    *label_ptr[0] |= INSN_OFF19((unsigned long)s->code_ptr -
-                                (unsigned long)label_ptr[0]);
-
-    n = 0;
-#ifdef CONFIG_TCG_PASS_AREG0
-    tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[n++], TCG_AREG0);
-#endif
+    n = ARG_OFFSET;
+    if (ARG_OFFSET) {
+        tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
+    }
     if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
         tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++],
                     args[addrlo_idx + 1]);
     }
     tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++],
                 args[addrlo_idx]);
-    if (TCG_TARGET_REG_BITS == 32 && opc == 3) {
+    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
         tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++], datahi);
     }
     tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++], datalo);
@@ -1042,7 +1049,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
                sizeof(long));
 
     /* qemu_st_helper[s_bits](arg0, arg1, arg2) */
-    tcg_out32(s, CALL | ((((tcg_target_ulong)qemu_st_helpers[opc]
+    tcg_out32(s, CALL | ((((tcg_target_ulong)qemu_st_helpers[sizeop]
                            - (tcg_target_ulong)s->code_ptr) >> 2)
                          & 0x3fffffff));
     /* delay slot */
@@ -1053,17 +1060,25 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
                TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
                sizeof(long));
 
-    *label_ptr[1] |= INSN_OFF19((unsigned long)s->code_ptr -
-                                (unsigned long)label_ptr[1]);
+    *label_ptr |= INSN_OFF19((unsigned long)s->code_ptr -
+                             (unsigned long)label_ptr);
 #else
     addr_reg = args[addrlo_idx];
     if (TCG_TARGET_REG_BITS == 64 && TARGET_LONG_BITS == 32) {
         tcg_out_arithi(s, TCG_REG_I5, addr_reg, 0, SHIFT_SRL);
         addr_reg = TCG_REG_I5;
     }
-    tcg_out_qemu_st_direct(s, addr_reg,
-                           (GUEST_BASE ? TCG_GUEST_BASE_REG : TCG_REG_G0),
-                           datalo, datahi, opc);
+    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
+        /* Reconstruct the full 64-bit value in %g1, using %o2 as temp.  */
+        /* ??? Redefine the temps from %i4/%i5 so that we have an o/g temp. */
+        tcg_out_arithi(s, TCG_REG_G1, datalo, 0, SHIFT_SRL);
+        tcg_out_arithi(s, TCG_REG_O2, datahi, 32, SHIFT_SLLX);
+        tcg_out_arith(s, TCG_REG_G1, TCG_REG_G1, TCG_REG_O2, ARITH_OR);
+        datalo = TCG_REG_G1;
+    }
+    tcg_out_ldst_rr(s, datalo, addr_reg,
+                    (GUEST_BASE ? TCG_GUEST_BASE_REG : TCG_REG_G0),
+                    qemu_st_opc[sizeop]);
 #endif /* CONFIG_SOFTMMU */
 }
 
-- 
1.7.7.6


* [Qemu-devel] [PATCH 08/15] Avoid declaring the env variable at all if CONFIG_TCG_PASS_AREG0.
  2012-03-25 22:27 [Qemu-devel] [PATCH 00/15] tcg-sparc improvments Richard Henderson
                   ` (6 preceding siblings ...)
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 07/15] tcg-sparc: Steamline qemu_ld/st more Richard Henderson
@ 2012-03-25 22:27 ` Richard Henderson
  2012-03-26 16:26   ` Blue Swirl
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 09/15] tcg-sparc: Do not use a global register for AREG0 Richard Henderson
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 22+ messages in thread
From: Richard Henderson @ 2012-03-25 22:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Blue Swirl


Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 dyngen-exec.h |    5 +++++
 user-exec.c   |   17 ++++++++++++++---
 2 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/dyngen-exec.h b/dyngen-exec.h
index cfeef99..65fcb43 100644
--- a/dyngen-exec.h
+++ b/dyngen-exec.h
@@ -19,6 +19,10 @@
 #if !defined(__DYNGEN_EXEC_H__)
 #define __DYNGEN_EXEC_H__
 
+/* If the target has indicated that it does not need an AREG0,
+   don't declare the env variable at all, much less as a register.  */
+#if !defined(CONFIG_TCG_PASS_AREG0)
+
 #if defined(CONFIG_TCG_INTERPRETER)
 /* The TCG interpreter does not need a special register AREG0,
  * but it is possible to use one by defining AREG0.
@@ -65,4 +69,5 @@ register CPUArchState *env asm(AREG0);
 extern CPUArchState *env;
 #endif
 
+#endif /* !CONFIG_TCG_PASS_AREG0 */
 #endif /* !defined(__DYNGEN_EXEC_H__) */
diff --git a/user-exec.c b/user-exec.c
index cd905ff..e326104 100644
--- a/user-exec.c
+++ b/user-exec.c
@@ -58,7 +58,9 @@ void cpu_resume_from_signal(CPUArchState *env1, void *puc)
     struct sigcontext *uc = puc;
 #endif
 
+#ifndef CONFIG_TCG_PASS_AREG0
     env = env1;
+#endif
 
     /* XXX: restore cpu registers saved in host registers */
 
@@ -74,8 +76,8 @@ void cpu_resume_from_signal(CPUArchState *env1, void *puc)
         sigprocmask(SIG_SETMASK, &uc->sc_mask, NULL);
 #endif
     }
-    env->exception_index = -1;
-    longjmp(env->jmp_env, 1);
+    env1->exception_index = -1;
+    longjmp(env1->jmp_env, 1);
 }
 
 /* 'pc' is the host PC at which the exception was raised. 'address' is
@@ -89,9 +91,18 @@ static inline int handle_cpu_signal(unsigned long pc, unsigned long address,
     TranslationBlock *tb;
     int ret;
 
+    /* XXX: find a correct solution for multithread */
+#ifdef CONFIG_TCG_PASS_AREG0
+    /* ??? While we no longer have a global env register, if PC is within
+       the code_gen_buffer then we know that env is within a known register
+       there, and we could have the signal handler extract that value.  */
+    CPUArchState *env = cpu_single_env;
+#else
     if (cpu_single_env) {
-        env = cpu_single_env; /* XXX: find a correct solution for multithread */
+        env = cpu_single_env;
     }
+#endif
+
 #if defined(DEBUG_SIGNAL)
     qemu_printf("qemu: SIGSEGV pc=0x%08lx address=%08lx w=%d oldset=0x%08lx\n",
                 pc, address, is_write, *(unsigned long *)old_set);
-- 
1.7.7.6


* [Qemu-devel] [PATCH 09/15] tcg-sparc: Do not use a global register for AREG0.
  2012-03-25 22:27 [Qemu-devel] [PATCH 00/15] tcg-sparc improvments Richard Henderson
                   ` (7 preceding siblings ...)
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 08/15] Avoid declaring the env variable at all if CONFIG_TCG_PASS_AREG0 Richard Henderson
@ 2012-03-25 22:27 ` Richard Henderson
  2012-03-26 16:31   ` Blue Swirl
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 10/15] tcg-sparc: Change AREG0 in generated code to %i0 Richard Henderson
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 22+ messages in thread
From: Richard Henderson @ 2012-03-25 22:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Blue Swirl


Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 dyngen-exec.h |   20 +++++++++++---------
 exec.c        |   16 ++++++++++++++--
 2 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/dyngen-exec.h b/dyngen-exec.h
index 65fcb43..d673f9f 100644
--- a/dyngen-exec.h
+++ b/dyngen-exec.h
@@ -41,13 +41,8 @@
 #elif defined(__mips__)
 #define AREG0 "s0"
 #elif defined(__sparc__)
-#ifdef CONFIG_SOLARIS
-#define AREG0 "g2"
-#elif HOST_LONG_BITS == 64
-#define AREG0 "g5"
-#else
-#define AREG0 "g6"
-#endif
+/* Don't use a global register.  Working around glibc clobbering these
+   global registers is more trouble than just using TLS.  */
 #elif defined(__s390__)
 #define AREG0 "r10"
 #elif defined(__alpha__)
@@ -62,12 +57,19 @@
 #error unsupported CPU
 #endif
 
-#if defined(AREG0)
+#ifdef AREG0
 register CPUArchState *env asm(AREG0);
 #else
-/* TODO: Try env = cpu_single_env. */
+/* It's tempting to #define env cpu_single_env, but that runs afoul of
+   the other macro usage in target-foo/helper.h.  Instead use an alias.
+   That has to happen where cpu_single_env is defined, so just a
+   declaration here.  */
+#ifdef __linux__
+extern __thread CPUArchState *env;
+#else
 extern CPUArchState *env;
 #endif
+#endif /* AREG0 */
 
 #endif /* !CONFIG_TCG_PASS_AREG0 */
 #endif /* !defined(__DYNGEN_EXEC_H__) */
diff --git a/exec.c b/exec.c
index 6731ab8..d84caa5 100644
--- a/exec.c
+++ b/exec.c
@@ -124,9 +124,21 @@ static MemoryRegion io_mem_subpage_ram;
 #endif
 
 CPUArchState *first_cpu;
-/* current CPU in the current thread. It is only valid inside
-   cpu_exec() */
+
+/* Current CPU in the current thread. It is only valid inside cpu_exec().  */
 DEFINE_TLS(CPUArchState *,cpu_single_env);
+
+/* In dyngen-exec.h, without AREG0, we fall back to an alias to cpu_single_env.
+   We can't actually tell from here whether that's needed or not, but it does
+   not hurt to go ahead and make the declaration.  */
+#ifndef CONFIG_TCG_PASS_AREG0
+extern
+#ifdef __linux__
+  __thread
+#endif
+  CPUArchState *env __attribute__((alias("tls__cpu_single_env")));
+#endif /* CONFIG_TCG_PASS_AREG0 */
+
 /* 0 = Do not count executed instructions.
    1 = Precise instruction counting.
    2 = Adaptive rate instruction counting.  */
-- 
1.7.7.6


* [Qemu-devel] [PATCH 10/15] tcg-sparc: Change AREG0 in generated code to %i0.
  2012-03-25 22:27 [Qemu-devel] [PATCH 00/15] tcg-sparc improvments Richard Henderson
                   ` (8 preceding siblings ...)
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 09/15] tcg-sparc: Do not use a global register for AREG0 Richard Henderson
@ 2012-03-25 22:27 ` Richard Henderson
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 11/15] tcg-sparc: Clean up cruft stemming from attempts to use global registers Richard Henderson
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2012-03-25 22:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Blue Swirl


Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c |    3 ++-
 tcg/sparc/tcg-target.h |    9 +--------
 2 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index d45114f..dc36840 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -705,7 +705,8 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 
     tcg_out32(s, JMPL | INSN_RD(TCG_REG_G0) | INSN_RS1(TCG_REG_I1) |
               INSN_RS2(TCG_REG_G0));
-    tcg_out_mov(s, TCG_TYPE_PTR, TCG_AREG0, TCG_REG_I0);
+    /* delay slot */
+    tcg_out_nop(s);
 }
 
 #if defined(CONFIG_SOFTMMU)
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index e69dfc8..31b98e2 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -128,14 +128,7 @@ typedef enum {
 
 #define TCG_TARGET_HAS_GUEST_BASE
 
-/* Note: must be synced with dyngen-exec.h */
-#ifdef CONFIG_SOLARIS
-#define TCG_AREG0 TCG_REG_G2
-#elif HOST_LONG_BITS == 64
-#define TCG_AREG0 TCG_REG_G5
-#else
-#define TCG_AREG0 TCG_REG_G6
-#endif
+#define TCG_AREG0 TCG_REG_I0
 
 static inline void flush_icache_range(tcg_target_ulong start,
                                       tcg_target_ulong stop)
-- 
1.7.7.6


* [Qemu-devel] [PATCH 11/15] tcg-sparc: Clean up cruft stemming from attempts to use global registers.
  2012-03-25 22:27 [Qemu-devel] [PATCH 00/15] tcg-sparc improvments Richard Henderson
                   ` (9 preceding siblings ...)
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 10/15] tcg-sparc: Change AREG0 in generated code to %i0 Richard Henderson
@ 2012-03-25 22:27 ` Richard Henderson
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 12/15] tcg-sparc: Mask shift immediates to avoid illegal insns Richard Henderson
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2012-03-25 22:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Blue Swirl

Don't use -ffixed-gN.  Don't link statically.  Don't save/restore
AREG0 around calls.  Don't allocate space on the stack for AREG0 save.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 configure              |   12 ----------
 tcg/sparc/tcg-target.c |   57 ++++++++++++++++--------------------------------
 tcg/sparc/tcg-target.h |   18 ++++++---------
 3 files changed, 26 insertions(+), 61 deletions(-)

diff --git a/configure b/configure
index a79a090..4ae70c0 100755
--- a/configure
+++ b/configure
@@ -815,19 +815,11 @@ case "$cpu" in
     sparc)
            LDFLAGS="-m32 $LDFLAGS"
            QEMU_CFLAGS="-m32 -mcpu=ultrasparc $QEMU_CFLAGS"
-           QEMU_CFLAGS="-ffixed-g2 -ffixed-g3 $QEMU_CFLAGS"
-           if test "$solaris" = "no" ; then
-             QEMU_CFLAGS="-ffixed-g1 -ffixed-g6 $QEMU_CFLAGS"
-           fi
            host_guest_base="yes"
            ;;
     sparc64)
            LDFLAGS="-m64 $LDFLAGS"
            QEMU_CFLAGS="-m64 -mcpu=ultrasparc $QEMU_CFLAGS"
-           QEMU_CFLAGS="-ffixed-g5 -ffixed-g6 -ffixed-g7 $QEMU_CFLAGS"
-           if test "$solaris" != "no" ; then
-             QEMU_CFLAGS="-ffixed-g1 $QEMU_CFLAGS"
-           fi
            host_guest_base="yes"
            ;;
     s390)
@@ -3817,10 +3809,6 @@ fi
 
 if test "$target_linux_user" = "yes" -o "$target_bsd_user" = "yes" ; then
   case "$ARCH" in
-  sparc)
-    # -static is used to avoid g1/g3 usage by the dynamic linker
-    ldflags="$linker_script -static $ldflags"
-    ;;
   alpha | s390x)
     # The default placement of the application is fine.
     ;;
diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index dc36840..c1d5ab1 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -167,9 +167,6 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
         tcg_regset_reset_reg(ct->u.regs, TCG_REG_O0);
         tcg_regset_reset_reg(ct->u.regs, TCG_REG_O1);
         tcg_regset_reset_reg(ct->u.regs, TCG_REG_O2);
-#ifdef CONFIG_TCG_PASS_AREG0
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_O3);
-#endif
         break;
     case 'I':
         ct->ct |= TCG_CT_CONST_S11;
@@ -690,11 +687,22 @@ static void tcg_out_setcond2_i32(TCGContext *s, TCGCond cond, TCGArg ret,
 /* Generate global QEMU prologue and epilogue code */
 static void tcg_target_qemu_prologue(TCGContext *s)
 {
-    tcg_set_frame(s, TCG_REG_I6, TCG_TARGET_CALL_STACK_OFFSET,
-                  CPU_TEMP_BUF_NLONGS * (int)sizeof(long));
+    int tmp_buf_size, frame_size;
+
+    /* The TCG temp buffer is at the top of the frame, immediately
+       below the frame pointer.  */
+    tmp_buf_size = CPU_TEMP_BUF_NLONGS * (int)sizeof(long);
+    tcg_set_frame(s, TCG_REG_I6, TCG_TARGET_STACK_BIAS - tmp_buf_size,
+                  tmp_buf_size);
+
+    /* TCG_TARGET_CALL_STACK_OFFSET includes the stack bias, but is
+       otherwise the minimal frame usable by callees.  */
+    frame_size = TCG_TARGET_CALL_STACK_OFFSET - TCG_TARGET_STACK_BIAS;
+    frame_size += TCG_STATIC_CALL_ARGS_SIZE + tmp_buf_size;
+    frame_size += TCG_TARGET_STACK_ALIGN - 1;
+    frame_size &= -TCG_TARGET_STACK_ALIGN;
     tcg_out32(s, SAVE | INSN_RD(TCG_REG_O6) | INSN_RS1(TCG_REG_O6) |
-              INSN_IMM13(-(TCG_TARGET_STACK_MINFRAME +
-                           CPU_TEMP_BUF_NLONGS * (int)sizeof(long))));
+              INSN_IMM13(-frame_size));
 
 #ifdef CONFIG_USE_GUEST_BASE
     if (GUEST_BASE != 0) {
@@ -707,6 +715,8 @@ static void tcg_target_qemu_prologue(TCGContext *s)
               INSN_RS2(TCG_REG_G0));
     /* delay slot */
     tcg_out_nop(s);
+
+    /* No epilogue required.  We issue ret + restore directly in the TB.  */
 }
 
 #if defined(CONFIG_SOFTMMU)
@@ -911,12 +921,6 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int sizeop)
     tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++],
                 args[addrlo_idx]);
 
-    /* Store AREG0 in stack to avoid ugly glibc bugs that mangle
-       global registers */
-    tcg_out_st(s, TCG_TYPE_REG, TCG_AREG0, TCG_REG_CALL_STACK,
-               TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
-               sizeof(long));
-
     /* qemu_ld_helper[s_bits](arg0, arg1) */
     tcg_out32(s, CALL | ((((tcg_target_ulong)qemu_ld_helpers[s_bits]
                            - (tcg_target_ulong)s->code_ptr) >> 2)
@@ -924,11 +928,6 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int sizeop)
     /* delay slot */
     tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[n], memi);
 
-    /* Reload AREG0.  */
-    tcg_out_ld(s, TCG_TYPE_REG, TCG_AREG0, TCG_REG_CALL_STACK,
-               TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
-               sizeof(long));
-
     n = tcg_target_call_oarg_regs[0];
     /* datalo = sign_extend(arg0) */
     switch (sizeop) {
@@ -1043,12 +1042,6 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int sizeop)
     }
     tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++], datalo);
 
-    /* Store AREG0 in stack to avoid ugly glibc bugs that mangle
-       global registers */
-    tcg_out_st(s, TCG_TYPE_REG, TCG_AREG0, TCG_REG_CALL_STACK,
-               TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
-               sizeof(long));
-
     /* qemu_st_helper[s_bits](arg0, arg1, arg2) */
     tcg_out32(s, CALL | ((((tcg_target_ulong)qemu_st_helpers[sizeop]
                            - (tcg_target_ulong)s->code_ptr) >> 2)
@@ -1056,11 +1049,6 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int sizeop)
     /* delay slot */
     tcg_out_movi(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n], memi);
 
-    /* Reload AREG0.  */
-    tcg_out_ld(s, TCG_TYPE_REG, TCG_AREG0, TCG_REG_CALL_STACK,
-               TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
-               sizeof(long));
-
     *label_ptr |= INSN_OFF19((unsigned long)s->code_ptr -
                              (unsigned long)label_ptr);
 #else
@@ -1123,15 +1111,8 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
             tcg_out32(s, JMPL | INSN_RD(TCG_REG_O7) | INSN_RS1(TCG_REG_I5) |
                       INSN_RS2(TCG_REG_G0));
         }
-        /* Store AREG0 in stack to avoid ugly glibc bugs that mangle
-           global registers */
-        // delay slot
-        tcg_out_st(s, TCG_TYPE_REG, TCG_AREG0, TCG_REG_CALL_STACK,
-                   TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
-                   sizeof(long));
-        tcg_out_ld(s, TCG_TYPE_REG, TCG_AREG0, TCG_REG_CALL_STACK,
-                   TCG_TARGET_CALL_STACK_OFFSET - TCG_STATIC_CALL_ARGS_SIZE -
-                   sizeof(long));
+        /* delay slot */
+        tcg_out_nop(s);
         break;
     case INDEX_op_jmp:
     case INDEX_op_br:
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index 31b98e2..b7afa7b 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -66,20 +66,16 @@ typedef enum {
 #define TCG_CT_CONST_S13 0x200
 
 /* used for function call generation */
-#define TCG_REG_CALL_STACK TCG_REG_I6
+#define TCG_REG_CALL_STACK TCG_REG_O6
 
 #if TCG_TARGET_REG_BITS == 64
-// Reserve space for AREG0
-#define TCG_TARGET_STACK_MINFRAME (176 + 4 * (int)sizeof(long) + \
-                                   TCG_STATIC_CALL_ARGS_SIZE)
-#define TCG_TARGET_CALL_STACK_OFFSET (2047 - 16)
-#define TCG_TARGET_STACK_ALIGN 16
+#define TCG_TARGET_STACK_BIAS           2047
+#define TCG_TARGET_STACK_ALIGN          16
+#define TCG_TARGET_CALL_STACK_OFFSET    (128 + 6*8 + TCG_TARGET_STACK_BIAS)
 #else
-// AREG0 + one word for alignment
-#define TCG_TARGET_STACK_MINFRAME (92 + (2 + 1) * (int)sizeof(long) + \
-                                   TCG_STATIC_CALL_ARGS_SIZE)
-#define TCG_TARGET_CALL_STACK_OFFSET TCG_TARGET_STACK_MINFRAME
-#define TCG_TARGET_STACK_ALIGN 8
+#define TCG_TARGET_STACK_BIAS           0
+#define TCG_TARGET_STACK_ALIGN          8
+#define TCG_TARGET_CALL_STACK_OFFSET    (64 + 4 + 6*4)
 #endif
 
 #if TCG_TARGET_REG_BITS == 64
-- 
1.7.7.6


* [Qemu-devel] [PATCH 12/15] tcg-sparc: Mask shift immediates to avoid illegal insns.
  2012-03-25 22:27 [Qemu-devel] [PATCH 00/15] tcg-sparc improvments Richard Henderson
                   ` (10 preceding siblings ...)
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 11/15] tcg-sparc: Clean up cruft stemming from attempts to use global registers Richard Henderson
@ 2012-03-25 22:27 ` Richard Henderson
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 13/15] tcg-sparc: Use defines for temporaries Richard Henderson
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2012-03-25 22:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Blue Swirl

The xtensa-test image generates a sra_i32 with count 0x40.
Whether this is an accident of tcg constant propagation or comes
directly from the instruction stream is immaterial.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c |   18 ++++++++++++------
 1 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index c1d5ab1..181ba26 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -1184,13 +1184,16 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         goto gen_arith;
     case INDEX_op_shl_i32:
         c = SHIFT_SLL;
-        goto gen_arith;
+    do_shift32:
+        /* Limit immediate shift count lest we create an illegal insn.  */
+        tcg_out_arithc(s, args[0], args[1], args[2] & 31, const_args[2], c);
+        break;
     case INDEX_op_shr_i32:
         c = SHIFT_SRL;
-        goto gen_arith;
+        goto do_shift32;
     case INDEX_op_sar_i32:
         c = SHIFT_SRA;
-        goto gen_arith;
+        goto do_shift32;
     case INDEX_op_mul_i32:
         c = ARITH_UMUL;
         goto gen_arith;
@@ -1311,13 +1314,16 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         break;
     case INDEX_op_shl_i64:
         c = SHIFT_SLLX;
-        goto gen_arith;
+    do_shift64:
+        /* Limit immediate shift count lest we create an illegal insn.  */
+        tcg_out_arithc(s, args[0], args[1], args[2] & 63, const_args[2], c);
+        break;
     case INDEX_op_shr_i64:
         c = SHIFT_SRLX;
-        goto gen_arith;
+        goto do_shift64;
     case INDEX_op_sar_i64:
         c = SHIFT_SRAX;
-        goto gen_arith;
+        goto do_shift64;
     case INDEX_op_mul_i64:
         c = ARITH_MULX;
         goto gen_arith;
-- 
1.7.7.6

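Masking is sufficient for the following reason. TCG leaves the result of a shift by a count of at least the operand width unspecified, so any result is acceptable. The immediate form of the SPARC shift instructions, however, has only a 5-bit count field for SLL/SRL/SRA and a 6-bit one for SLLX/SRLX/SRAX, and a count such as 0x40 lands in a reserved bit. The sketch below models the `args[2] & 31` / `args[2] & 63` masking in do_shift32/do_shift64. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring "args[2] & 31" / "args[2] & 63": reduce a
   constant shift count to the width of the immediate count field. */
static unsigned legal_shift_count(unsigned count, unsigned op_bits)
{
    assert(op_bits == 32 || op_bits == 64);
    return count & (op_bits - 1);   /* mask with 31 or 63 */
}

/* Value produced by sra_i32 once the count has been masked.  Assumes >>
   on a negative int32_t is an arithmetic shift, as on GCC and Clang. */
static int32_t sra_i32_masked(int32_t x, unsigned count)
{
    return x >> legal_shift_count(count, 32);
}
```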

* [Qemu-devel] [PATCH 13/15] tcg-sparc: Use defines for temporaries.
  2012-03-25 22:27 [Qemu-devel] [PATCH 00/15] tcg-sparc improvments Richard Henderson
                   ` (11 preceding siblings ...)
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 12/15] tcg-sparc: Mask shift immediates to avoid illegal insns Richard Henderson
@ 2012-03-25 22:27 ` Richard Henderson
  2012-03-26 16:38   ` Blue Swirl
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 14/15] tcg-sparc: Add %g/%o registers to alloc_order Richard Henderson
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 15/15] tcg-sparc: Fix and enable direct TB chaining Richard Henderson
  14 siblings, 1 reply; 22+ messages in thread
From: Richard Henderson @ 2012-03-25 22:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Blue Swirl

And change from %i4 to %g1 to remove a v8plus fixme.
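
Background on the v8plus fixme: GCC documents the v8plus ABI as treating only the global and out registers as 64 bits wide. A 64-bit value assembled on a 32-bit host must therefore sit in a %g or %o register rather than in %i4/%i5. The sketch below models only the arithmetic of the srl/sllx/or sequence that qemu_st uses to rebuild a 64-bit store value from its two halves. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the sequence emitted for a 64-bit qemu_st on a 32-bit host:
     srl  datalo, 0, TMP      ! zero-extend the low half into %g1
     sllx datahi, 32, %o2     ! high half into bits 63..32
     or   %o2, TMP, %o2       ! combine
   TMP (%g1) and %o2 are both registers that v8plus keeps 64 bits wide. */
static uint64_t join_halves(uint32_t datalo, uint32_t datahi)
{
    uint64_t tmp = (uint64_t)datalo;         /* srl datalo, 0 */
    uint64_t o2 = (uint64_t)datahi << 32;    /* sllx datahi, 32 */
    return o2 | tmp;                         /* or %o2, TMP, %o2 */
}
```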

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c |  110 ++++++++++++++++++++++++-----------------------
 1 files changed, 56 insertions(+), 54 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 181ba26..896fab1 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -59,8 +59,11 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 };
 #endif
 
+#define TCG_REG_TMP  TCG_REG_G1
+#define TCG_REG_TMP2 TCG_REG_I5
+
 #ifdef CONFIG_USE_GUEST_BASE
-# define TCG_GUEST_BASE_REG TCG_REG_I3
+# define TCG_GUEST_BASE_REG TCG_REG_I4
 #else
 # define TCG_GUEST_BASE_REG TCG_REG_G0
 #endif
@@ -372,10 +375,10 @@ static inline void tcg_out_movi(TCGContext *s, TCGType type,
         tcg_out_sethi(s, ret, ~arg);
         tcg_out_arithi(s, ret, ret, (arg & 0x3ff) | -0x400, ARITH_XOR);
     } else {
-        tcg_out_movi_imm32(s, TCG_REG_I4, arg >> (TCG_TARGET_REG_BITS / 2));
-        tcg_out_arithi(s, TCG_REG_I4, TCG_REG_I4, 32, SHIFT_SLLX);
-        tcg_out_movi_imm32(s, ret, arg);
-        tcg_out_arith(s, ret, ret, TCG_REG_I4, ARITH_OR);
+        tcg_out_movi_imm32(s, ret, arg >> (TCG_TARGET_REG_BITS / 2));
+        tcg_out_arithi(s, ret, ret, 32, SHIFT_SLLX);
+        tcg_out_movi_imm32(s, TCG_REG_TMP2, arg);
+        tcg_out_arith(s, ret, ret, TCG_REG_TMP2, ARITH_OR);
     }
 }
 
@@ -392,8 +395,8 @@ static inline void tcg_out_ldst(TCGContext *s, int ret, int addr,
         tcg_out32(s, op | INSN_RD(ret) | INSN_RS1(addr) |
                   INSN_IMM13(offset));
     } else {
-        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_I5, offset);
-        tcg_out_ldst_rr(s, ret, addr, TCG_REG_I5, op);
+        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP, offset);
+        tcg_out_ldst_rr(s, ret, addr, TCG_REG_TMP, op);
     }
 }
 
@@ -435,8 +438,8 @@ static inline void tcg_out_addi(TCGContext *s, int reg, tcg_target_long val)
         if (check_fit_tl(val, 13))
             tcg_out_arithi(s, reg, reg, val, ARITH_ADD);
         else {
-            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_I5, val);
-            tcg_out_arith(s, reg, reg, TCG_REG_I5, ARITH_ADD);
+            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP, val);
+            tcg_out_arith(s, reg, reg, TCG_REG_TMP, ARITH_ADD);
         }
     }
 }
@@ -448,8 +451,8 @@ static inline void tcg_out_andi(TCGContext *s, int rd, int rs,
         if (check_fit_tl(val, 13))
             tcg_out_arithi(s, rd, rs, val, ARITH_AND);
         else {
-            tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_I5, val);
-            tcg_out_arith(s, rd, rs, TCG_REG_I5, ARITH_AND);
+            tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_TMP, val);
+            tcg_out_arith(s, rd, rs, TCG_REG_TMP, ARITH_AND);
         }
     }
 }
@@ -461,8 +464,8 @@ static void tcg_out_div32(TCGContext *s, int rd, int rs1,
     if (uns) {
         tcg_out_sety(s, TCG_REG_G0);
     } else {
-        tcg_out_arithi(s, TCG_REG_I5, rs1, 31, SHIFT_SRA);
-        tcg_out_sety(s, TCG_REG_I5);
+        tcg_out_arithi(s, TCG_REG_TMP, rs1, 31, SHIFT_SRA);
+        tcg_out_sety(s, TCG_REG_TMP);
     }
 
     tcg_out_arithc(s, rd, rs1, val2, val2const,
@@ -608,8 +611,8 @@ static void tcg_out_setcond_i32(TCGContext *s, TCGCond cond, TCGArg ret,
     case TCG_COND_GTU:
     case TCG_COND_GEU:
         if (c2const && c2 != 0) {
-            tcg_out_movi_imm13(s, TCG_REG_I5, c2);
-            c2 = TCG_REG_I5;
+            tcg_out_movi_imm13(s, TCG_REG_TMP, c2);
+            c2 = TCG_REG_TMP;
         }
         t = c1, c1 = c2, c2 = t, c2const = 0;
         cond = tcg_swap_cond(cond);
@@ -656,15 +659,15 @@ static void tcg_out_setcond2_i32(TCGContext *s, TCGCond cond, TCGArg ret,
 
     switch (cond) {
     case TCG_COND_EQ:
-        tcg_out_setcond_i32(s, TCG_COND_EQ, TCG_REG_I5, al, bl, blconst);
+        tcg_out_setcond_i32(s, TCG_COND_EQ, TCG_REG_TMP, al, bl, blconst);
         tcg_out_setcond_i32(s, TCG_COND_EQ, ret, ah, bh, bhconst);
-        tcg_out_arith(s, ret, ret, TCG_REG_I5, ARITH_AND);
+        tcg_out_arith(s, ret, ret, TCG_REG_TMP, ARITH_AND);
         break;
 
     case TCG_COND_NE:
-        tcg_out_setcond_i32(s, TCG_COND_NE, TCG_REG_I5, al, al, blconst);
+        tcg_out_setcond_i32(s, TCG_COND_NE, TCG_REG_TMP, al, al, blconst);
         tcg_out_setcond_i32(s, TCG_COND_NE, ret, ah, bh, bhconst);
-        tcg_out_arith(s, ret, ret, TCG_REG_I5, ARITH_OR);
+        tcg_out_arith(s, ret, ret, TCG_REG_TMP, ARITH_OR);
         break;
 
     default:
@@ -964,8 +967,8 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int sizeop)
 #else
     addr_reg = args[addrlo_idx];
     if (TCG_TARGET_REG_BITS == 64 && TARGET_LONG_BITS == 32) {
-        tcg_out_arithi(s, TCG_REG_I5, addr_reg, 0, SHIFT_SRL);
-        addr_reg = TCG_REG_I5;
+        tcg_out_arithi(s, TCG_REG_TMP, addr_reg, 0, SHIFT_SRL);
+        addr_reg = TCG_REG_TMP;
     }
     if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
         int reg64 = (datalo < 16 ? datalo : TCG_REG_O0);
@@ -1008,12 +1011,11 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int sizeop)
                                 offsetof(CPUTLBEntry, addr_write));
 
     if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
-        /* Reconstruct the full 64-bit value in %g1, using %o2 as temp.  */
-        /* ??? Redefine the temps from %i4/%i5 so that we have a o/g temp. */
-        tcg_out_arithi(s, TCG_REG_G1, datalo, 0, SHIFT_SRL);
+        /* Reconstruct the full 64-bit value.  */
+        tcg_out_arithi(s, TCG_REG_TMP, datalo, 0, SHIFT_SRL);
         tcg_out_arithi(s, TCG_REG_O2, datahi, 32, SHIFT_SLLX);
-        tcg_out_arith(s, TCG_REG_G1, TCG_REG_G1, TCG_REG_O2, ARITH_OR);
-        datalo = TCG_REG_G1;
+        tcg_out_arith(s, TCG_REG_O2, TCG_REG_TMP, TCG_REG_O2, ARITH_OR);
+        datalo = TCG_REG_O2;
     }
 
     /* The fast path is exactly one insn.  Thus we can perform the entire
@@ -1054,16 +1056,14 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int sizeop)
 #else
     addr_reg = args[addrlo_idx];
     if (TCG_TARGET_REG_BITS == 64 && TARGET_LONG_BITS == 32) {
-        tcg_out_arithi(s, TCG_REG_I5, addr_reg, 0, SHIFT_SRL);
-        addr_reg = TCG_REG_I5;
+        tcg_out_arithi(s, TCG_REG_TMP, addr_reg, 0, SHIFT_SRL);
+        addr_reg = TCG_REG_TMP;
     }
     if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
-        /* Reconstruct the full 64-bit value in %g1, using %o2 as temp.  */
-        /* ??? Redefine the temps from %i4/%i5 so that we have a o/g temp. */
-        tcg_out_arithi(s, TCG_REG_G1, datalo, 0, SHIFT_SRL);
+        tcg_out_arithi(s, TCG_REG_TMP, datalo, 0, SHIFT_SRL);
         tcg_out_arithi(s, TCG_REG_O2, datahi, 32, SHIFT_SLLX);
-        tcg_out_arith(s, TCG_REG_G1, TCG_REG_G1, TCG_REG_O2, ARITH_OR);
-        datalo = TCG_REG_G1;
+        tcg_out_arith(s, TCG_REG_O2, TCG_REG_TMP, TCG_REG_O2, ARITH_OR);
+        datalo = TCG_REG_O2;
     }
     tcg_out_ldst_rr(s, datalo, addr_reg,
                     (GUEST_BASE ? TCG_GUEST_BASE_REG : TCG_REG_G0),
@@ -1087,14 +1087,14 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     case INDEX_op_goto_tb:
         if (s->tb_jmp_offset) {
             /* direct jump method */
-            tcg_out_sethi(s, TCG_REG_I5, args[0] & 0xffffe000);
-            tcg_out32(s, JMPL | INSN_RD(TCG_REG_G0) | INSN_RS1(TCG_REG_I5) |
+            tcg_out_sethi(s, TCG_REG_TMP, args[0] & 0xffffe000);
+            tcg_out32(s, JMPL | INSN_RD(TCG_REG_G0) | INSN_RS1(TCG_REG_TMP) |
                       INSN_IMM13((args[0] & 0x1fff)));
             s->tb_jmp_offset[args[0]] = s->code_ptr - s->code_buf;
         } else {
             /* indirect jump method */
-            tcg_out_ld_ptr(s, TCG_REG_I5, (tcg_target_long)(s->tb_next + args[0]));
-            tcg_out32(s, JMPL | INSN_RD(TCG_REG_G0) | INSN_RS1(TCG_REG_I5) |
+            tcg_out_ld_ptr(s, TCG_REG_TMP, (tcg_target_long)(s->tb_next + args[0]));
+            tcg_out32(s, JMPL | INSN_RD(TCG_REG_G0) | INSN_RS1(TCG_REG_TMP) |
                       INSN_RS2(TCG_REG_G0));
         }
         tcg_out_nop(s);
@@ -1106,9 +1106,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
                                    - (tcg_target_ulong)s->code_ptr) >> 2)
                                  & 0x3fffffff));
         else {
-            tcg_out_ld_ptr(s, TCG_REG_I5,
+            tcg_out_ld_ptr(s, TCG_REG_TMP,
                            (tcg_target_long)(s->tb_next + args[0]));
-            tcg_out32(s, JMPL | INSN_RD(TCG_REG_O7) | INSN_RS1(TCG_REG_I5) |
+            tcg_out32(s, JMPL | INSN_RD(TCG_REG_O7) | INSN_RS1(TCG_REG_TMP) |
                       INSN_RS2(TCG_REG_G0));
         }
         /* delay slot */
@@ -1214,11 +1214,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 
     case INDEX_op_rem_i32:
     case INDEX_op_remu_i32:
-        tcg_out_div32(s, TCG_REG_I5, args[1], args[2], const_args[2],
+        tcg_out_div32(s, TCG_REG_TMP, args[1], args[2], const_args[2],
                       opc == INDEX_op_remu_i32);
-        tcg_out_arithc(s, TCG_REG_I5, TCG_REG_I5, args[2], const_args[2],
+        tcg_out_arithc(s, TCG_REG_TMP, TCG_REG_TMP, args[2], const_args[2],
                        ARITH_UMUL);
-        tcg_out_arith(s, args[0], args[1], TCG_REG_I5, ARITH_SUB);
+        tcg_out_arith(s, args[0], args[1], TCG_REG_TMP, ARITH_SUB);
         break;
 
     case INDEX_op_brcond_i32:
@@ -1335,11 +1335,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         goto gen_arith;
     case INDEX_op_rem_i64:
     case INDEX_op_remu_i64:
-        tcg_out_arithc(s, TCG_REG_I5, args[1], args[2], const_args[2],
+        tcg_out_arithc(s, TCG_REG_TMP, args[1], args[2], const_args[2],
                        opc == INDEX_op_rem_i64 ? ARITH_SDIVX : ARITH_UDIVX);
-        tcg_out_arithc(s, TCG_REG_I5, TCG_REG_I5, args[2], const_args[2],
+        tcg_out_arithc(s, TCG_REG_TMP, TCG_REG_TMP, args[2], const_args[2],
                        ARITH_MULX);
-        tcg_out_arith(s, args[0], args[1], TCG_REG_I5, ARITH_SUB);
+        tcg_out_arith(s, args[0], args[1], TCG_REG_TMP, ARITH_SUB);
         break;
     case INDEX_op_ext32s_i64:
         if (const_args[1]) {
@@ -1537,15 +1537,17 @@ static void tcg_target_init(TCGContext *s)
                      (1 << TCG_REG_O7));
 
     tcg_regset_clear(s->reserved_regs);
-    tcg_regset_set_reg(s->reserved_regs, TCG_REG_G0);
-#if TCG_TARGET_REG_BITS == 64
-    tcg_regset_set_reg(s->reserved_regs, TCG_REG_I4); // for internal use
-#endif
-    tcg_regset_set_reg(s->reserved_regs, TCG_REG_I5); // for internal use
-    tcg_regset_set_reg(s->reserved_regs, TCG_REG_I6);
-    tcg_regset_set_reg(s->reserved_regs, TCG_REG_I7);
-    tcg_regset_set_reg(s->reserved_regs, TCG_REG_O6);
-    tcg_regset_set_reg(s->reserved_regs, TCG_REG_O7);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_G0); // zero
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_G6); // reserved for os
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_G7); // thread pointer
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_I6); // frame pointer
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_I7); // return address
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_O6); // stack pointer
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP); // for internal use
+    if (TCG_TARGET_REG_BITS == 64) {
+        tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP2); // for internal use
+    }
+
     tcg_add_target_add_op_defs(sparc_op_defs);
 }
 
-- 
1.7.7.6

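SPARC has no remainder instruction, so rem_i32/remu_i32 expand to a divide into TCG_REG_TMP, a multiply back, and a subtract. SDIV/UDIV divide the 64-bit quantity %y:rs1, so tcg_out_div32 first writes %y with either the sign of the dividend (`sra rs1, 31`) or zero. The C model below follows that sequence under three assumptions: the divisor is nonzero (the hardware would trap), SDIV's overflow saturation can be ignored, and conversions are two's-complement as on GCC and Clang. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of tcg_out_div32: SDIV/UDIV divide the 64-bit value %y:rs1 by
   rs2, so %y is loaded with the upper half of the dividend first. */
static uint32_t div32_model(uint32_t rs1, uint32_t rs2, int uns)
{
    assert(rs2 != 0);                        /* hardware would trap */
    if (uns) {
        uint64_t dividend = rs1;             /* %y = 0 */
        return (uint32_t)(dividend / rs2);
    }
    int64_t dividend = (int32_t)rs1;         /* %y = rs1 >> 31 (sra) */
    return (uint32_t)(dividend / (int32_t)rs2);
}

/* rem as emitted: tmp = a / b; tmp = tmp * b; ret = a - tmp. */
static uint32_t rem32_model(uint32_t a, uint32_t b, int uns)
{
    uint32_t tmp = div32_model(a, b, uns);   /* div into TCG_REG_TMP */
    tmp = tmp * b;                           /* umul: low 32 bits only */
    return a - tmp;                          /* sub */
}
```

UMUL serves both the signed and the unsigned case because only the low 32 bits of the product are used, and those bits are identical either way.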

* [Qemu-devel] [PATCH 14/15] tcg-sparc: Add %g/%o registers to alloc_order
  2012-03-25 22:27 [Qemu-devel] [PATCH 00/15] tcg-sparc improvments Richard Henderson
                   ` (12 preceding siblings ...)
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 13/15] tcg-sparc: Use defines for temporaries Richard Henderson
@ 2012-03-25 22:27 ` Richard Henderson
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 15/15] tcg-sparc: Fix and enable direct TB chaining Richard Henderson
  14 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2012-03-25 22:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Blue Swirl


Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 896fab1..ce7c44e 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -83,11 +83,25 @@ static const int tcg_target_reg_alloc_order[] = {
     TCG_REG_L5,
     TCG_REG_L6,
     TCG_REG_L7,
+
     TCG_REG_I0,
     TCG_REG_I1,
     TCG_REG_I2,
     TCG_REG_I3,
     TCG_REG_I4,
+
+    TCG_REG_G2,
+    TCG_REG_G3,
+    TCG_REG_G4,
+    TCG_REG_G5,
+
+    TCG_REG_O0,
+    TCG_REG_O1,
+    TCG_REG_O2,
+    TCG_REG_O3,
+    TCG_REG_O4,
+    TCG_REG_O5,
+    TCG_REG_O7,
 };
 
 static const int tcg_target_call_iarg_regs[6] = {
-- 
1.7.7.6

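Roughly, the order is consumed as follows. TCG's register allocator scans tcg_target_reg_alloc_order for the first register in the allowed set that holds no temp. Only if none is free does it spill the first allowed register in the same order. Appending %g2-%g5 and %o0-%o5/%o7 after the window registers therefore makes them the last candidates tried. The model below is simplified (a plain bitmask, no other constraint handling). Its register numbering, with %g/%o/%l/%i starting at 0/8/16/24, is assumed for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of order-driven register allocation.  A register is a
   bit number in a 32-bit mask (assumed: %g = 0-7, %o = 8-15,
   %l = 16-23, %i = 24-31). */
static int pick_reg(const int *order, int n, uint32_t allowed, uint32_t busy)
{
    /* First pass: the first allowed register that holds nothing. */
    for (int i = 0; i < n; i++) {
        int r = order[i];
        if (((allowed >> r) & 1) && !((busy >> r) & 1)) {
            return r;
        }
    }
    /* Second pass: all candidates are busy, so the first allowed
       register in the same order would be spilled and reused. */
    for (int i = 0; i < n; i++) {
        if ((allowed >> order[i]) & 1) {
            return order[i];
        }
    }
    return -1;   /* no register satisfies the constraint */
}
```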

* [Qemu-devel] [PATCH 15/15] tcg-sparc: Fix and enable direct TB chaining.
  2012-03-25 22:27 [Qemu-devel] [PATCH 00/15] tcg-sparc improvments Richard Henderson
                   ` (13 preceding siblings ...)
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 14/15] tcg-sparc: Add %g/%o registers to alloc_order Richard Henderson
@ 2012-03-25 22:27 ` Richard Henderson
  14 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2012-03-25 22:27 UTC (permalink / raw)
  To: qemu-devel; +Cc: Blue Swirl


Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 exec-all.h             |    9 ++++++---
 tcg/sparc/tcg-target.c |   19 ++++++++++++++++---
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/exec-all.h b/exec-all.h
index 93a5b22..f7d4708 100644
--- a/exec-all.h
+++ b/exec-all.h
@@ -120,9 +120,10 @@ void tlb_set_page(CPUArchState *env, target_ulong vaddr,
 #define CODE_GEN_AVG_BLOCK_SIZE 64
 #endif
 
-#if defined(_ARCH_PPC) || defined(__x86_64__) || defined(__arm__) || defined(__i386__)
-#define USE_DIRECT_JUMP
-#elif defined(CONFIG_TCG_INTERPRETER)
+#if defined(__arm__) || defined(_ARCH_PPC) \
+    || defined(__x86_64__) || defined(__i386__) \
+    || defined(__sparc__) \
+    || defined(CONFIG_TCG_INTERPRETER)
 #define USE_DIRECT_JUMP
 #endif
 
@@ -232,6 +233,8 @@ static inline void tb_set_jmp_target1(unsigned long jmp_addr, unsigned long addr
     __asm __volatile__ ("swi 0x9f0002" : : "r" (_beg), "r" (_end), "r" (_flg));
 #endif
 }
+#elif defined(__sparc__)
+extern void tb_set_jmp_target1(unsigned long jmp_addr, unsigned long addr);
 #else
 #error tb_set_jmp_target1 is missing
 #endif
diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index ce7c44e..2a09e23 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -1101,10 +1101,8 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
     case INDEX_op_goto_tb:
         if (s->tb_jmp_offset) {
             /* direct jump method */
-            tcg_out_sethi(s, TCG_REG_TMP, args[0] & 0xffffe000);
-            tcg_out32(s, JMPL | INSN_RD(TCG_REG_G0) | INSN_RS1(TCG_REG_TMP) |
-                      INSN_IMM13((args[0] & 0x1fff)));
             s->tb_jmp_offset[args[0]] = s->code_ptr - s->code_buf;
+            tcg_out32(s, CALL | (8 >> 2));
         } else {
             /* indirect jump method */
             tcg_out_ld_ptr(s, TCG_REG_TMP, (tcg_target_long)(s->tb_next + args[0]));
@@ -1627,3 +1625,18 @@ void tcg_register_jit(void *buf, size_t buf_size)
 
     tcg_register_jit_int(buf, buf_size, &debug_frame, sizeof(debug_frame));
 }
+
+void tb_set_jmp_target1(unsigned long jmp_addr, unsigned long addr)
+{
+    uint32_t *ptr = (uint32_t *)jmp_addr;
+    tcg_target_long disp = (tcg_target_long)(addr - jmp_addr) >> 2;
+
+    /* We can reach the entire address space for 32-bit.  For 64-bit
+       the code_gen_buffer can't be larger than 2GB.  */
+    if (TCG_TARGET_REG_BITS == 64 && !check_fit_tl(disp, 30)) {
+        abort();
+    }
+
+    *ptr = CALL | (disp & 0x3fffffff);
+    flush_icache_range(jmp_addr, jmp_addr + 4);
+}
-- 
1.7.7.6

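The patched instruction is an ordinary CALL, whose signed 30-bit word displacement reaches ±2 GB. That reach is the source of the 2GB limit on code_gen_buffer mentioned in the comment. The initial `CALL | (8 >> 2)` targets the instruction after the delay slot, so execution simply continues there until tb_set_jmp_target1 rewrites the word. As a side effect, CALL also writes its own address to %o7. The sketch below shows the encoding and its inverse, assuming CALL is the format-1 opcode 0x40000000 (op = 01 in bits 31..30). The helper names are hypothetical, and unlike the real function the sketch does not flush the icache.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CALL 0x40000000u   /* format 1: op = 01, disp30 in bits 29..0 */

/* Nonzero if v fits in a signed field of the given width. */
static int fits_signed(int64_t v, int bits)
{
    return v >= -((int64_t)1 << (bits - 1)) && v < ((int64_t)1 << (bits - 1));
}

/* Encode "call addr" for an instruction located at jmp_addr. */
static uint32_t encode_call(uint64_t jmp_addr, uint64_t addr)
{
    int64_t disp = (int64_t)(addr - jmp_addr) >> 2;   /* word displacement */
    assert(fits_signed(disp, 30));                    /* the patch aborts here */
    return SKETCH_CALL | ((uint32_t)disp & 0x3fffffffu);
}

/* Recover the target of a CALL at jmp_addr by sign-extending disp30. */
static uint64_t call_target(uint64_t jmp_addr, uint32_t insn)
{
    int64_t disp = (int64_t)(insn & 0x3fffffffu);
    if (disp & 0x20000000) {
        disp -= 0x40000000;                           /* negative displacement */
    }
    return jmp_addr + (uint64_t)(disp * 4);
}
```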

* Re: [Qemu-devel] [PATCH 08/15] Avoid declaring the env variable at all if CONFIG_TCG_PASS_AREG0.
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 08/15] Avoid declaring the env variable at all if CONFIG_TCG_PASS_AREG0 Richard Henderson
@ 2012-03-26 16:26   ` Blue Swirl
  2012-03-26 16:31     ` Richard Henderson
  0 siblings, 1 reply; 22+ messages in thread
From: Blue Swirl @ 2012-03-26 16:26 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Sun, Mar 25, 2012 at 22:27, Richard Henderson <rth@twiddle.net> wrote:
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  dyngen-exec.h |    5 +++++
>  user-exec.c   |   17 ++++++++++++++---
>  2 files changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/dyngen-exec.h b/dyngen-exec.h
> index cfeef99..65fcb43 100644
> --- a/dyngen-exec.h
> +++ b/dyngen-exec.h
> @@ -19,6 +19,10 @@
>  #if !defined(__DYNGEN_EXEC_H__)
>  #define __DYNGEN_EXEC_H__
>
> +/* If the target has indicated that it does not need an AREG0,
> +   don't declare the env variable at all, much less as a register.  */
> +#if !defined(CONFIG_TCG_PASS_AREG0)
> +
>  #if defined(CONFIG_TCG_INTERPRETER)
>  /* The TCG interpreter does not need a special register AREG0,
>  * but it is possible to use one by defining AREG0.
> @@ -65,4 +69,5 @@ register CPUArchState *env asm(AREG0);
>  extern CPUArchState *env;
>  #endif
>
> +#endif /* !CONFIG_TCG_PASS_AREG0 */
>  #endif /* !defined(__DYNGEN_EXEC_H__) */
> diff --git a/user-exec.c b/user-exec.c
> index cd905ff..e326104 100644
> --- a/user-exec.c
> +++ b/user-exec.c
> @@ -58,7 +58,9 @@ void cpu_resume_from_signal(CPUArchState *env1, void *puc)
>     struct sigcontext *uc = puc;
>  #endif
>
> +#ifndef CONFIG_TCG_PASS_AREG0
>     env = env1;
> +#endif

Shouldn't longjmp() restore global registers as well? Actually, we
return to cpu-exec.c which does not use global env. Isn't this
useless?

>
>     /* XXX: restore cpu registers saved in host registers */
>
> @@ -74,8 +76,8 @@ void cpu_resume_from_signal(CPUArchState *env1, void *puc)
>         sigprocmask(SIG_SETMASK, &uc->sc_mask, NULL);
>  #endif
>     }
> -    env->exception_index = -1;
> -    longjmp(env->jmp_env, 1);
> +    env1->exception_index = -1;
> +    longjmp(env1->jmp_env, 1);
>  }
>
>  /* 'pc' is the host PC at which the exception was raised. 'address' is
> @@ -89,9 +91,18 @@ static inline int handle_cpu_signal(unsigned long pc, unsigned long address,
>     TranslationBlock *tb;
>     int ret;
>
> +    /* XXX: find a correct solution for multithread */
> +#ifdef CONFIG_TCG_PASS_AREG0
> +    /* ??? While we no longer have a global env register, if PC is within
> +       the code_gen_buffer then we know that env is within a known register
> +       there, and we could have the signal handler extract that value.  */
> +    CPUArchState *env = cpu_single_env;

This just makes env a useless variable. The original code was trying
to restore the global variable, but the functions called later do not
use global env.

I'd change user-exec.c to work without global env use.

> +#else
>     if (cpu_single_env) {
> -        env = cpu_single_env; /* XXX: find a correct solution for multithread */
> +        env = cpu_single_env;
>     }
> +#endif
> +
>  #if defined(DEBUG_SIGNAL)
>     qemu_printf("qemu: SIGSEGV pc=0x%08lx address=%08lx w=%d oldset=0x%08lx\n",
>                 pc, address, is_write, *(unsigned long *)old_set);
> --
> 1.7.7.6
>

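Blue Swirl's suggestion amounts to passing the CPU state explicitly instead of through a global env. The patch already does this for the longjmp itself by switching to env1. Below is a minimal sketch of that pattern with setjmp/longjmp. SketchCPUState, resume_from_signal and run_once are hypothetical stand-ins for CPUArchState, cpu_resume_from_signal and the setjmp point in cpu-exec.c.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical stand-in for CPUArchState with just the two fields that
   cpu_resume_from_signal() touches. */
typedef struct {
    int exception_index;
    jmp_buf jmp_env;
} SketchCPUState;

/* Analogue of cpu_resume_from_signal(): everything goes through the
   parameter, so no global env is read or written. */
static void resume_from_signal(SketchCPUState *env1)
{
    env1->exception_index = -1;
    longjmp(env1->jmp_env, 1);
}

/* Analogue of the setjmp point in the execution loop.  Returns 1 if
   control came back through resume_from_signal(), 0 otherwise. */
static int run_once(SketchCPUState *cpu, int fault)
{
    cpu->exception_index = 0;
    if (setjmp(cpu->jmp_env) != 0) {
        return 1;                    /* resumed via longjmp */
    }
    if (fault) {
        resume_from_signal(cpu);     /* does not return */
    }
    return 0;
}
```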

* Re: [Qemu-devel] [PATCH 08/15] Avoid declaring the env variable at all if CONFIG_TCG_PASS_AREG0.
  2012-03-26 16:26   ` Blue Swirl
@ 2012-03-26 16:31     ` Richard Henderson
  0 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2012-03-26 16:31 UTC (permalink / raw)
  To: Blue Swirl; +Cc: qemu-devel

On 03/26/12 09:26, Blue Swirl wrote:
>> +#ifndef CONFIG_TCG_PASS_AREG0
>>     env = env1;
>> +#endif
> 
> Shouldn't longjmp() restore global registers as well? Actually, we
> return to cpu-exec.c which does not use global env. Isn't this
> useless?

Possibly.  I didn't think to try to actually remove these uses,
just get the code to compile without env being declared.

> I'd change user-exec.c to work without global env use.

I'll give it a shot...


r~


* Re: [Qemu-devel] [PATCH 09/15] tcg-sparc: Do not use a global register for AREG0.
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 09/15] tcg-sparc: Do not use a global register for AREG0 Richard Henderson
@ 2012-03-26 16:31   ` Blue Swirl
  2012-03-26 16:52     ` Richard Henderson
  0 siblings, 1 reply; 22+ messages in thread
From: Blue Swirl @ 2012-03-26 16:31 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Sun, Mar 25, 2012 at 22:27, Richard Henderson <rth@twiddle.net> wrote:
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  dyngen-exec.h |   20 +++++++++++---------
>  exec.c        |   16 ++++++++++++++--
>  2 files changed, 25 insertions(+), 11 deletions(-)
>
> diff --git a/dyngen-exec.h b/dyngen-exec.h
> index 65fcb43..d673f9f 100644
> --- a/dyngen-exec.h
> +++ b/dyngen-exec.h
> @@ -41,13 +41,8 @@
>  #elif defined(__mips__)
>  #define AREG0 "s0"
>  #elif defined(__sparc__)
> -#ifdef CONFIG_SOLARIS
> -#define AREG0 "g2"
> -#elif HOST_LONG_BITS == 64
> -#define AREG0 "g5"
> -#else
> -#define AREG0 "g6"
> -#endif
> +/* Don't use a global register.  Working around glibc clobbering these
> +   global registers is more trouble than just using TLS.  */
>  #elif defined(__s390__)
>  #define AREG0 "r10"
>  #elif defined(__alpha__)
> @@ -62,12 +57,19 @@
>  #error unsupported CPU
>  #endif
>
> -#if defined(AREG0)
> +#ifdef AREG0
>  register CPUArchState *env asm(AREG0);
>  #else
> -/* TODO: Try env = cpu_single_env. */
> +/* It's tempting to #define env cpu_single_cpu, but that runs afoul of
> +   the other macro usage in target-foo/helper.h.  Instead use an alias.
> +   That has to happen where cpu_single_cpu is defined, so just a
> +   declaration here.  */
> +#ifdef __linux__
> +extern __thread CPUArchState *env;
> +#else
>  extern CPUArchState *env;
>  #endif
> +#endif /* AREG0 */
>
>  #endif /* !CONFIG_TCG_PASS_AREG0 */
>  #endif /* !defined(__DYNGEN_EXEC_H__) */
> diff --git a/exec.c b/exec.c
> index 6731ab8..d84caa5 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -124,9 +124,21 @@ static MemoryRegion io_mem_subpage_ram;
>  #endif
>
>  CPUArchState *first_cpu;
> -/* current CPU in the current thread. It is only valid inside
> -   cpu_exec() */
> +
> +/* Current CPU in the current thread. It is only valid inside cpu_exec().  */
>  DEFINE_TLS(CPUArchState *,cpu_single_env);
> +
> +/* In dyngen-exec.h, without AREG0, we fall back to an alias to cpu_single_env.
> +   We can't actually tell from here whether that's needed or not, but it does
> +   not hurt to go ahead and make the declaration.  */
> +#ifndef CONFIG_TCG_PASS_AREG0
> +extern
> +#ifdef __linux__
> +  __thread
> +#endif
> +  CPUArchState *env __attribute__((alias("tls__cpu_single_env")));
> +#endif /* CONFIG_TCG_PASS_AREG0 */

Please use DECLARE_TLS/DEFINE_TLS, and global env accesses should also
use tls_var().

> +
>  /* 0 = Do not count executed instructions.
>    1 = Precise instruction counting.
>    2 = Adaptive rate instruction counting.  */
> --
> 1.7.7.6
>

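For reference, the requested pattern hides each thread-local variable behind macros so that the storage class is chosen in one place. That is __thread on Linux and a plain global elsewhere, the same split the patch makes by hand. The alias target in the patch, tls__cpu_single_env, shows that DEFINE_TLS names its variable tls__<name>. The SKETCH_* macros below imitate that naming; they are assumptions, not a copy of QEMU's header. The demo needs POSIX threads, so link with -pthread on older toolchains.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical macros imitating the DECLARE_TLS/DEFINE_TLS/tls_var
   naming scheme: the storage symbol is tls__<name>. */
#ifdef __linux__
#define SKETCH_DEFINE_TLS(type, x)  __thread type tls__##x
#else
#define SKETCH_DEFINE_TLS(type, x)  type tls__##x   /* no TLS: plain global */
#endif
#define SKETCH_DECLARE_TLS(type, x) extern SKETCH_DEFINE_TLS(type, x)
#define sketch_tls_var(x)           tls__##x

typedef struct { int id; } SketchCPU;

SKETCH_DECLARE_TLS(SketchCPU *, current_cpu);   /* what a header would carry */
SKETCH_DEFINE_TLS(SketchCPU *, current_cpu);    /* the single definition */

/* Thread body: install this thread's CPU pointer and report what it sees. */
static void *set_and_read(void *arg)
{
    sketch_tls_var(current_cpu) = arg;
    return sketch_tls_var(current_cpu);
}
```

On Linux each thread gets its own copy, so setting the pointer in a worker thread leaves the main thread's copy NULL.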

* Re: [Qemu-devel] [PATCH 13/15] tcg-sparc: Use defines for temporaries.
  2012-03-25 22:27 ` [Qemu-devel] [PATCH 13/15] tcg-sparc: Use defines for temporaries Richard Henderson
@ 2012-03-26 16:38   ` Blue Swirl
  0 siblings, 0 replies; 22+ messages in thread
From: Blue Swirl @ 2012-03-26 16:38 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Sun, Mar 25, 2012 at 22:27, Richard Henderson <rth@twiddle.net> wrote:
> And change from %i4 to %g1 to remove a v8plus fixme.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/sparc/tcg-target.c |  110 ++++++++++++++++++++++++-----------------------
>  1 files changed, 56 insertions(+), 54 deletions(-)
>
> diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
> index 181ba26..896fab1 100644
> --- a/tcg/sparc/tcg-target.c
> +++ b/tcg/sparc/tcg-target.c
> @@ -59,8 +59,11 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
>  };
>  #endif
>
> +#define TCG_REG_TMP  TCG_REG_G1
> +#define TCG_REG_TMP2 TCG_REG_I5
> +
>  #ifdef CONFIG_USE_GUEST_BASE
> -# define TCG_GUEST_BASE_REG TCG_REG_I3
> +# define TCG_GUEST_BASE_REG TCG_REG_I4
>  #else
>  # define TCG_GUEST_BASE_REG TCG_REG_G0
>  #endif
> @@ -372,10 +375,10 @@ static inline void tcg_out_movi(TCGContext *s, TCGType type,
>         tcg_out_sethi(s, ret, ~arg);
>         tcg_out_arithi(s, ret, ret, (arg & 0x3ff) | -0x400, ARITH_XOR);
>     } else {
> -        tcg_out_movi_imm32(s, TCG_REG_I4, arg >> (TCG_TARGET_REG_BITS / 2));
> -        tcg_out_arithi(s, TCG_REG_I4, TCG_REG_I4, 32, SHIFT_SLLX);
> -        tcg_out_movi_imm32(s, ret, arg);
> -        tcg_out_arith(s, ret, ret, TCG_REG_I4, ARITH_OR);
> +        tcg_out_movi_imm32(s, ret, arg >> (TCG_TARGET_REG_BITS / 2));
> +        tcg_out_arithi(s, ret, ret, 32, SHIFT_SLLX);
> +        tcg_out_movi_imm32(s, TCG_REG_TMP2, arg);
> +        tcg_out_arith(s, ret, ret, TCG_REG_TMP2, ARITH_OR);
>     }
>  }
>
> @@ -392,8 +395,8 @@ static inline void tcg_out_ldst(TCGContext *s, int ret, int addr,
>         tcg_out32(s, op | INSN_RD(ret) | INSN_RS1(addr) |
>                   INSN_IMM13(offset));
>     } else {
> -        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_I5, offset);
> -        tcg_out_ldst_rr(s, ret, addr, TCG_REG_I5, op);
> +        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP, offset);
> +        tcg_out_ldst_rr(s, ret, addr, TCG_REG_TMP, op);
>     }
>  }
>
> @@ -435,8 +438,8 @@ static inline void tcg_out_addi(TCGContext *s, int reg, tcg_target_long val)
>         if (check_fit_tl(val, 13))
>             tcg_out_arithi(s, reg, reg, val, ARITH_ADD);
>         else {
> -            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_I5, val);
> -            tcg_out_arith(s, reg, reg, TCG_REG_I5, ARITH_ADD);
> +            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP, val);
> +            tcg_out_arith(s, reg, reg, TCG_REG_TMP, ARITH_ADD);
>         }
>     }
>  }
> @@ -448,8 +451,8 @@ static inline void tcg_out_andi(TCGContext *s, int rd, int rs,
>         if (check_fit_tl(val, 13))
>             tcg_out_arithi(s, rd, rs, val, ARITH_AND);
>         else {
> -            tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_I5, val);
> -            tcg_out_arith(s, rd, rs, TCG_REG_I5, ARITH_AND);
> +            tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_TMP, val);
> +            tcg_out_arith(s, rd, rs, TCG_REG_TMP, ARITH_AND);
>         }
>     }
>  }
> @@ -461,8 +464,8 @@ static void tcg_out_div32(TCGContext *s, int rd, int rs1,
>     if (uns) {
>         tcg_out_sety(s, TCG_REG_G0);
>     } else {
> -        tcg_out_arithi(s, TCG_REG_I5, rs1, 31, SHIFT_SRA);
> -        tcg_out_sety(s, TCG_REG_I5);
> +        tcg_out_arithi(s, TCG_REG_TMP, rs1, 31, SHIFT_SRA);
> +        tcg_out_sety(s, TCG_REG_TMP);
>     }
>
>     tcg_out_arithc(s, rd, rs1, val2, val2const,
> @@ -608,8 +611,8 @@ static void tcg_out_setcond_i32(TCGContext *s, TCGCond cond, TCGArg ret,
>     case TCG_COND_GTU:
>     case TCG_COND_GEU:
>         if (c2const && c2 != 0) {
> -            tcg_out_movi_imm13(s, TCG_REG_I5, c2);
> -            c2 = TCG_REG_I5;
> +            tcg_out_movi_imm13(s, TCG_REG_TMP, c2);
> +            c2 = TCG_REG_TMP;
>         }
>         t = c1, c1 = c2, c2 = t, c2const = 0;
>         cond = tcg_swap_cond(cond);
> @@ -656,15 +659,15 @@ static void tcg_out_setcond2_i32(TCGContext *s, TCGCond cond, TCGArg ret,
>
>     switch (cond) {
>     case TCG_COND_EQ:
> -        tcg_out_setcond_i32(s, TCG_COND_EQ, TCG_REG_I5, al, bl, blconst);
> +        tcg_out_setcond_i32(s, TCG_COND_EQ, TCG_REG_TMP, al, bl, blconst);
>         tcg_out_setcond_i32(s, TCG_COND_EQ, ret, ah, bh, bhconst);
> -        tcg_out_arith(s, ret, ret, TCG_REG_I5, ARITH_AND);
> +        tcg_out_arith(s, ret, ret, TCG_REG_TMP, ARITH_AND);
>         break;
>
>     case TCG_COND_NE:
> -        tcg_out_setcond_i32(s, TCG_COND_NE, TCG_REG_I5, al, al, blconst);
> +        tcg_out_setcond_i32(s, TCG_COND_NE, TCG_REG_TMP, al, al, blconst);
>         tcg_out_setcond_i32(s, TCG_COND_NE, ret, ah, bh, bhconst);
> -        tcg_out_arith(s, ret, ret, TCG_REG_I5, ARITH_OR);
> +        tcg_out_arith(s, ret, ret, TCG_REG_TMP, ARITH_OR);
>         break;
>
>     default:
> @@ -964,8 +967,8 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int sizeop)
>  #else
>     addr_reg = args[addrlo_idx];
>     if (TCG_TARGET_REG_BITS == 64 && TARGET_LONG_BITS == 32) {
> -        tcg_out_arithi(s, TCG_REG_I5, addr_reg, 0, SHIFT_SRL);
> -        addr_reg = TCG_REG_I5;
> +        tcg_out_arithi(s, TCG_REG_TMP, addr_reg, 0, SHIFT_SRL);
> +        addr_reg = TCG_REG_TMP;
>     }
>     if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
>         int reg64 = (datalo < 16 ? datalo : TCG_REG_O0);
> @@ -1008,12 +1011,11 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int sizeop)
>                                 offsetof(CPUTLBEntry, addr_write));
>
>     if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
> -        /* Reconstruct the full 64-bit value in %g1, using %o2 as temp.  */
> -        /* ??? Redefine the temps from %i4/%i5 so that we have a o/g temp. */
> -        tcg_out_arithi(s, TCG_REG_G1, datalo, 0, SHIFT_SRL);
> +        /* Reconstruct the full 64-bit value.  */
> +        tcg_out_arithi(s, TCG_REG_TMP, datalo, 0, SHIFT_SRL);
>         tcg_out_arithi(s, TCG_REG_O2, datahi, 32, SHIFT_SLLX);
> -        tcg_out_arith(s, TCG_REG_G1, TCG_REG_G1, TCG_REG_O2, ARITH_OR);
> -        datalo = TCG_REG_G1;
> +        tcg_out_arith(s, TCG_REG_O2, TCG_REG_TMP, TCG_REG_O2, ARITH_OR);
> +        datalo = TCG_REG_O2;
>     }
>
>     /* The fast path is exactly one insn.  Thus we can perform the entire
> @@ -1054,16 +1056,14 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int sizeop)
>  #else
>     addr_reg = args[addrlo_idx];
>     if (TCG_TARGET_REG_BITS == 64 && TARGET_LONG_BITS == 32) {
> -        tcg_out_arithi(s, TCG_REG_I5, addr_reg, 0, SHIFT_SRL);
> -        addr_reg = TCG_REG_I5;
> +        tcg_out_arithi(s, TCG_REG_TMP, addr_reg, 0, SHIFT_SRL);
> +        addr_reg = TCG_REG_TMP;
>     }
>     if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
> -        /* Reconstruct the full 64-bit value in %g1, using %o2 as temp.  */
> -        /* ??? Redefine the temps from %i4/%i5 so that we have a o/g temp. */
> -        tcg_out_arithi(s, TCG_REG_G1, datalo, 0, SHIFT_SRL);
> +        tcg_out_arithi(s, TCG_REG_TMP, datalo, 0, SHIFT_SRL);
>         tcg_out_arithi(s, TCG_REG_O2, datahi, 32, SHIFT_SLLX);
> -        tcg_out_arith(s, TCG_REG_G1, TCG_REG_G1, TCG_REG_O2, ARITH_OR);
> -        datalo = TCG_REG_G1;
> +        tcg_out_arith(s, TCG_REG_O2, TCG_REG_TMP, TCG_REG_O2, ARITH_OR);
> +        datalo = TCG_REG_O2;
>     }
>     tcg_out_ldst_rr(s, datalo, addr_reg,
>                     (GUEST_BASE ? TCG_GUEST_BASE_REG : TCG_REG_G0),
> @@ -1087,14 +1087,14 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>     case INDEX_op_goto_tb:
>         if (s->tb_jmp_offset) {
>             /* direct jump method */
> -            tcg_out_sethi(s, TCG_REG_I5, args[0] & 0xffffe000);
> -            tcg_out32(s, JMPL | INSN_RD(TCG_REG_G0) | INSN_RS1(TCG_REG_I5) |
> +            tcg_out_sethi(s, TCG_REG_TMP, args[0] & 0xffffe000);
> +            tcg_out32(s, JMPL | INSN_RD(TCG_REG_G0) | INSN_RS1(TCG_REG_TMP) |
>                       INSN_IMM13((args[0] & 0x1fff)));
>             s->tb_jmp_offset[args[0]] = s->code_ptr - s->code_buf;
>         } else {
>             /* indirect jump method */
> -            tcg_out_ld_ptr(s, TCG_REG_I5, (tcg_target_long)(s->tb_next + args[0]));
> -            tcg_out32(s, JMPL | INSN_RD(TCG_REG_G0) | INSN_RS1(TCG_REG_I5) |
> +            tcg_out_ld_ptr(s, TCG_REG_TMP, (tcg_target_long)(s->tb_next + args[0]));
> +            tcg_out32(s, JMPL | INSN_RD(TCG_REG_G0) | INSN_RS1(TCG_REG_TMP) |
>                       INSN_RS2(TCG_REG_G0));
>         }
>         tcg_out_nop(s);
> @@ -1106,9 +1106,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>                                    - (tcg_target_ulong)s->code_ptr) >> 2)
>                                  & 0x3fffffff));
>         else {
> -            tcg_out_ld_ptr(s, TCG_REG_I5,
> +            tcg_out_ld_ptr(s, TCG_REG_TMP,
>                            (tcg_target_long)(s->tb_next + args[0]));
> -            tcg_out32(s, JMPL | INSN_RD(TCG_REG_O7) | INSN_RS1(TCG_REG_I5) |
> +            tcg_out32(s, JMPL | INSN_RD(TCG_REG_O7) | INSN_RS1(TCG_REG_TMP) |
>                       INSN_RS2(TCG_REG_G0));
>         }
>         /* delay slot */
> @@ -1214,11 +1214,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>
>     case INDEX_op_rem_i32:
>     case INDEX_op_remu_i32:
> -        tcg_out_div32(s, TCG_REG_I5, args[1], args[2], const_args[2],
> +        tcg_out_div32(s, TCG_REG_TMP, args[1], args[2], const_args[2],
>                       opc == INDEX_op_remu_i32);
> -        tcg_out_arithc(s, TCG_REG_I5, TCG_REG_I5, args[2], const_args[2],
> +        tcg_out_arithc(s, TCG_REG_TMP, TCG_REG_TMP, args[2], const_args[2],
>                        ARITH_UMUL);
> -        tcg_out_arith(s, args[0], args[1], TCG_REG_I5, ARITH_SUB);
> +        tcg_out_arith(s, args[0], args[1], TCG_REG_TMP, ARITH_SUB);
>         break;
>
>     case INDEX_op_brcond_i32:
> @@ -1335,11 +1335,11 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>         goto gen_arith;
>     case INDEX_op_rem_i64:
>     case INDEX_op_remu_i64:
> -        tcg_out_arithc(s, TCG_REG_I5, args[1], args[2], const_args[2],
> +        tcg_out_arithc(s, TCG_REG_TMP, args[1], args[2], const_args[2],
>                        opc == INDEX_op_rem_i64 ? ARITH_SDIVX : ARITH_UDIVX);
> -        tcg_out_arithc(s, TCG_REG_I5, TCG_REG_I5, args[2], const_args[2],
> +        tcg_out_arithc(s, TCG_REG_TMP, TCG_REG_TMP, args[2], const_args[2],
>                        ARITH_MULX);
> -        tcg_out_arith(s, args[0], args[1], TCG_REG_I5, ARITH_SUB);
> +        tcg_out_arith(s, args[0], args[1], TCG_REG_TMP, ARITH_SUB);
>         break;
>     case INDEX_op_ext32s_i64:
>         if (const_args[1]) {
> @@ -1537,15 +1537,17 @@ static void tcg_target_init(TCGContext *s)
>                      (1 << TCG_REG_O7));
>
>     tcg_regset_clear(s->reserved_regs);
> -    tcg_regset_set_reg(s->reserved_regs, TCG_REG_G0);
> -#if TCG_TARGET_REG_BITS == 64
> -    tcg_regset_set_reg(s->reserved_regs, TCG_REG_I4); // for internal use
> -#endif
> -    tcg_regset_set_reg(s->reserved_regs, TCG_REG_I5); // for internal use
> -    tcg_regset_set_reg(s->reserved_regs, TCG_REG_I6);
> -    tcg_regset_set_reg(s->reserved_regs, TCG_REG_I7);
> -    tcg_regset_set_reg(s->reserved_regs, TCG_REG_O6);
> -    tcg_regset_set_reg(s->reserved_regs, TCG_REG_O7);
> +    tcg_regset_set_reg(s->reserved_regs, TCG_REG_G0); // zero
> +    tcg_regset_set_reg(s->reserved_regs, TCG_REG_G6); // reserved for os
> +    tcg_regset_set_reg(s->reserved_regs, TCG_REG_G7); // thread pointer
> +    tcg_regset_set_reg(s->reserved_regs, TCG_REG_I6); // frame pointer
> +    tcg_regset_set_reg(s->reserved_regs, TCG_REG_I7); // return address
> +    tcg_regset_set_reg(s->reserved_regs, TCG_REG_O6); // stack pointer
> +    tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP); // for internal use
> +    if (TCG_TARGET_REG_BITS == 64) {
> +        tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP2); // for internal use

Please fix the comment style above.

> +    }
> +
>     tcg_add_target_add_op_defs(sparc_op_defs);
>  }
>
> --
> 1.7.7.6
>
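
The hunks above only rename the scratch register. The instruction sequences they emit are easier to follow as plain arithmetic, so here is a C sketch of what several of them compute. The `model_*` names are hypothetical, and this is an illustration rather than QEMU code. It assumes a non-zero divisor and avoids `INT32_MIN / -1`, which is undefined behaviour in C.

For the 64-bit store on a 32-bit host, the result lands in %o2. On a v8plus host, only the %g and %o registers are guaranteed to keep their upper 32 bits. In the NE case the sketch compares `al` with `bl`. The quoted `tcg_out_setcond_i32` call passes `al` twice, in both the old and the new line, which looks like a pre-existing slip rather than something this patch introduces.

```c
#include <stdint.h>

/* rem_i32: divide into the scratch register, multiply back, subtract.
   Requires b != 0 and excludes INT32_MIN / -1 (undefined behaviour in C). */
static int32_t model_rem_i32(int32_t a, int32_t b)
{
    int32_t tmp = a / b;   /* SDIV: quotient truncated toward zero */
    tmp = tmp * b;         /* |tmp * b| <= |a|, so this cannot overflow */
    return a - tmp;        /* SUB: remainder has the sign of a */
}

/* remu_i32: the same sequence using UDIV. Requires b != 0. */
static uint32_t model_remu_i32(uint32_t a, uint32_t b)
{
    uint32_t tmp = a / b;
    tmp = tmp * b;
    return a - tmp;
}

/* Signed division: rs1 >> 31 (SRA) is written to %y, so the 64-bit
   dividend {%y:rs1} is rs1 sign-extended. The shift result is computed
   without >> on a negative value to stay portable. */
static uint64_t model_sdiv_dividend(int32_t rs1)
{
    uint32_t y = rs1 < 0 ? 0xffffffffu : 0u;   /* what SRA by 31 yields */
    return ((uint64_t)y << 32) | (uint32_t)rs1;
}

/* 64-bit store on a 32-bit host: zero-extend the low half into the
   scratch register (SRL by 0), shift the high half into %o2 (SLLX by 32),
   then OR the two together into %o2. */
static uint64_t model_st64_value(uint32_t datalo, uint32_t datahi)
{
    uint64_t tmp = datalo;                  /* SRL tmp, datalo, 0 */
    uint64_t o2 = (uint64_t)datahi << 32;   /* SLLX %o2, datahi, 32 */
    return tmp | o2;                        /* OR %o2, tmp, %o2 */
}

/* setcond2 EQ: a 64-bit value split into 32-bit halves is equal only if
   both halves are equal. */
static int model_setcond2_eq(uint32_t al, uint32_t ah, uint32_t bl, uint32_t bh)
{
    int tmp = (al == bl);   /* low halves, into the scratch register */
    int ret = (ah == bh);   /* high halves, into ret */
    return ret & tmp;       /* AND */
}

/* setcond2 NE: the values differ if either half differs. */
static int model_setcond2_ne(uint32_t al, uint32_t ah, uint32_t bl, uint32_t bh)
{
    int tmp = (al != bl);
    int ret = (ah != bh);
    return ret | tmp;       /* OR */
}
```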

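Blue Swirl's remark refers to the C99 `//` comments in the new reserved-register hunk, where QEMU code conventionally uses `/* */` comments. The sketch below restates that hunk's logic in self-contained form with the requested comment style.

It makes three substitutions. First, it assumes the usual SPARC numbering of the 32 integer registers (%g0-%g7, %o0-%o7, %l0-%l7, %i0-%i7 as 0-31). Second, it replaces QEMU's `TCGRegSet` and `tcg_regset_set_reg` with a plain bitmask and a hypothetical `regset_set_reg`. Third, it gives `TCG_REG_TMP` and `TCG_REG_TMP2` placeholder values, because their real definitions are not part of the quoted hunks.

```c
#include <stdint.h>

/* SPARC integer registers, numbered 0..31 in the usual %g, %o, %l, %i order. */
enum {
    TCG_REG_G0, TCG_REG_G1, TCG_REG_G2, TCG_REG_G3,
    TCG_REG_G4, TCG_REG_G5, TCG_REG_G6, TCG_REG_G7,
    TCG_REG_O0, TCG_REG_O1, TCG_REG_O2, TCG_REG_O3,
    TCG_REG_O4, TCG_REG_O5, TCG_REG_O6, TCG_REG_O7,
    TCG_REG_L0, TCG_REG_L1, TCG_REG_L2, TCG_REG_L3,
    TCG_REG_L4, TCG_REG_L5, TCG_REG_L6, TCG_REG_L7,
    TCG_REG_I0, TCG_REG_I1, TCG_REG_I2, TCG_REG_I3,
    TCG_REG_I4, TCG_REG_I5, TCG_REG_I6, TCG_REG_I7
};

/* Placeholder temporaries for this sketch only; the patch's real
   definitions are not shown in the quoted hunks. */
#define TCG_REG_TMP   TCG_REG_G1
#define TCG_REG_TMP2  TCG_REG_I4

typedef uint32_t RegSet;   /* simplified stand-in for TCGRegSet */

/* Hypothetical stand-in for tcg_regset_set_reg. */
static void regset_set_reg(RegSet *set, int reg)
{
    *set |= (RegSet)1 << reg;
}

static RegSet sparc_reserved_regs(int target_reg_bits)
{
    RegSet reserved = 0;

    regset_set_reg(&reserved, TCG_REG_G0);    /* zero */
    regset_set_reg(&reserved, TCG_REG_G6);    /* reserved for os */
    regset_set_reg(&reserved, TCG_REG_G7);    /* thread pointer */
    regset_set_reg(&reserved, TCG_REG_I6);    /* frame pointer */
    regset_set_reg(&reserved, TCG_REG_I7);    /* return address */
    regset_set_reg(&reserved, TCG_REG_O6);    /* stack pointer */
    regset_set_reg(&reserved, TCG_REG_TMP);   /* for internal use */
    if (target_reg_bits == 64) {
        regset_set_reg(&reserved, TCG_REG_TMP2);  /* for internal use */
    }
    return reserved;
}
```
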
* Re: [Qemu-devel] [PATCH 09/15] tcg-sparc: Do not use a global register for AREG0.
  2012-03-26 16:31   ` Blue Swirl
@ 2012-03-26 16:52     ` Richard Henderson
  2012-03-26 17:22       ` Blue Swirl
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Henderson @ 2012-03-26 16:52 UTC (permalink / raw)
  To: Blue Swirl; +Cc: qemu-devel

On 03/26/12 09:31, Blue Swirl wrote:
>> > +/* In dyngen-exec.h, without AREG0, we fall back to an alias to cpu_single_env.
>> > +   We can't actually tell from here whether that's needed or not, but it does
>> > +   not hurt to go ahead and make the declaration.  */
>> > +#ifndef CONFIG_TCG_PASS_AREG0
>> > +extern
>> > +#ifdef __linux__
>> > +  __thread
>> > +#endif
>> > +  CPUArchState *env __attribute__((alias("tls__cpu_single_env")));
>> > +#endif /* CONFIG_TCG_PASS_AREG0 */
> Please use DECLARE_TLS/DEFINE_TLS and global env accesses should also
> use tls_var().
> 

That won't work.

This is intended to be a drop-in replacement for the "env" symbol that
we declare in dyngen-exec.h.  For all other hosts, this symbol is a
global register variable.  We can't go wrapping tls_var around all uses
in all target backends.

As I say in the comment, the most natural replacement is a preprocessor
macro, but then that fails with the uses of "env" in the DEF_HELPER_N
macros.

Which leaves no alternative -- short of converting *all* targets to
CONFIG_TCG_PASS_AREG0 first -- except the symbol alias you see there.

Hmm... actually... I'm wrong about the use of preprocessor macros.
The simple solution there is to re-order the includes on a few ports.
I.e. "helper.h" must come before "dyngen-exec.h".  Now that's a much
simpler fix...
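
The include-ordering fix comes down to how the preprocessor handles `env`. Helper declarations are built by token-pasting their argument types. If `env` is already a macro when they are expanded, the paste sees `(cpu_single_env)` instead of the token `env`, and the code fails to compile.

The sketch below uses a simplified, hypothetical imitation of the DEF_HELPER_N machinery, not QEMU's actual helper headers. It shows that expanding the declarations first and defining an `env` macro afterwards works. `cpu_single_env` here is a plain static pointer standing in for the real per-thread variable.

```c
/* Hypothetical stand-ins for QEMU's CPU state and cpu_single_env. */
typedef struct CPUArchState {
    int pc;
} CPUArchState;

static CPUArchState cpu_state_storage;
static CPUArchState *cpu_single_env = &cpu_state_storage;

/* Simplified imitation of the DEF_HELPER_N type-pasting machinery. */
#define xglue(a, b) a##b
#define glue(a, b) xglue(a, b)
#define dh_ctype_void void
#define dh_ctype_i32 int
#define dh_ctype_env CPUArchState *
#define dh_ctype(t) glue(dh_ctype_, t)
#define DEF_HELPER_2(name, ret, t1, t2) \
    dh_ctype(ret) helper_##name(dh_ctype(t1), dh_ctype(t2));

/* "helper.h" first: `env` is still an ordinary token here, so
   dh_ctype(env) pastes to dh_ctype_env and the prototype is valid. */
DEF_HELPER_2(set_pc, void, env, i32)

/* "dyngen-exec.h" second: from here on, `env` names the CPU state.
   Had this #define come first, the argument `env` would have been
   expanded before pasting, and dh_ctype_ ## ( is not a valid token. */
#define env (cpu_single_env)

void helper_set_pc(CPUArchState *state, int pc)
{
    state->pc = pc;
}

static int current_pc(void)
{
    return env->pc;   /* expands to (cpu_single_env)->pc */
}
```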


r~

* Re: [Qemu-devel] [PATCH 09/15] tcg-sparc: Do not use a global register for AREG0.
  2012-03-26 16:52     ` Richard Henderson
@ 2012-03-26 17:22       ` Blue Swirl
  0 siblings, 0 replies; 22+ messages in thread
From: Blue Swirl @ 2012-03-26 17:22 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Mon, Mar 26, 2012 at 16:52, Richard Henderson <rth@twiddle.net> wrote:
> On 03/26/12 09:31, Blue Swirl wrote:
>>> > +/* In dyngen-exec.h, without AREG0, we fall back to an alias to cpu_single_env.
>>> > +   We can't actually tell from here whether that's needed or not, but it does
>>> > +   not hurt to go ahead and make the declaration.  */
>>> > +#ifndef CONFIG_TCG_PASS_AREG0
>>> > +extern
>>> > +#ifdef __linux__
>>> > +  __thread
>>> > +#endif
>>> > +  CPUArchState *env __attribute__((alias("tls__cpu_single_env")));
>>> > +#endif /* CONFIG_TCG_PASS_AREG0 */
>> Please use DECLARE_TLS/DEFINE_TLS and global env accesses should also
>> use tls_var().
>>
>
> That won't work.
>
> This is intended to be a drop-in replacement for the "env" symbol that
> we declare in dyngen-exec.h.  For all other hosts, this symbol is a
> global register variable.  We can't go wrapping tls_var around all uses
> in all target backends.
>
> As I say in the comment, the most natural replacement is a preprocessor
> macro, but then that fails with the uses of "env" in the DEF_HELPER_N
> macros.
>
> Which leaves no alternative -- short of converting *all* targets to
> CONFIG_TCG_PASS_AREG0 first -- except the symbol alias you see there.

But at that point there will be no global env use anymore, so
dyngen-exec.h etc. can be removed. Perhaps this patch and its
dependencies should wait for that to happen. As an intermediate hack
it's sort of OK.
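
Once every target uses CONFIG_TCG_PASS_AREG0, the generated code passes the CPU state (held in AREG0) to each helper as an explicit first argument. At that point no global `env` is needed at all, whether it lives in a register or in TLS. A minimal sketch of that calling convention follows, with a hypothetical state structure and hypothetical helper names.

```c
#include <stdint.h>

/* Hypothetical guest CPU state; real targets define their own CPUArchState. */
typedef struct CPUArchState {
    uint32_t regs[8];
    uint32_t pc;
} CPUArchState;

/* PASS_AREG0 style: the state arrives as the first argument, so the
   helper touches no global variable. The index is masked only to keep
   this sketch in bounds. */
static void helper_write_reg(CPUArchState *env, int idx, uint32_t val)
{
    env->regs[idx & 7] = val;
}

static uint32_t helper_read_reg(CPUArchState *env, int idx)
{
    return env->regs[idx & 7];
}
```

Because the state is an argument, two CPU states can be driven independently, with no global to save and restore around calls.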

> Hmm... actually... I'm wrong about the use of preprocessor macros.
> The simple solution there is to re-order the includes on a few ports.
> I.e. "helper.h" must come before "dyngen-exec.h".  Now that's a much
> simpler fix...

OK.

>
>
> r~

end of thread, other threads:[~2012-03-26 17:23 UTC | newest]

Thread overview: 22+ messages
2012-03-25 22:27 [Qemu-devel] [PATCH 00/15] tcg-sparc improvments Richard Henderson
2012-03-25 22:27 ` [Qemu-devel] [PATCH 01/15] tcg-sparc: Hack in qemu_ld/st64 for 32-bit Richard Henderson
2012-03-25 22:27 ` [Qemu-devel] [PATCH 02/15] tcg-sparc: Fix ADDX opcode Richard Henderson
2012-03-25 22:27 ` [Qemu-devel] [PATCH 03/15] tcg-sparc: Assume v9 cpu always, i.e. force v8plus in 32-bit mode Richard Henderson
2012-03-25 22:27 ` [Qemu-devel] [PATCH 04/15] tcg-sparc: Fix qemu_ld/st to handle 32-bit host Richard Henderson
2012-03-25 22:27 ` [Qemu-devel] [PATCH 05/15] tcg-sparc: Simplify qemu_ld/st direct memory paths Richard Henderson
2012-03-25 22:27 ` [Qemu-devel] [PATCH 06/15] tcg-sparc: Support GUEST_BASE Richard Henderson
2012-03-25 22:27 ` [Qemu-devel] [PATCH 07/15] tcg-sparc: Steamline qemu_ld/st more Richard Henderson
2012-03-25 22:27 ` [Qemu-devel] [PATCH 08/15] Avoid declaring the env variable at all if CONFIG_TCG_PASS_AREG0 Richard Henderson
2012-03-26 16:26   ` Blue Swirl
2012-03-26 16:31     ` Richard Henderson
2012-03-25 22:27 ` [Qemu-devel] [PATCH 09/15] tcg-sparc: Do not use a global register for AREG0 Richard Henderson
2012-03-26 16:31   ` Blue Swirl
2012-03-26 16:52     ` Richard Henderson
2012-03-26 17:22       ` Blue Swirl
2012-03-25 22:27 ` [Qemu-devel] [PATCH 10/15] tcg-sparc: Change AREG0 in generated code to %i0 Richard Henderson
2012-03-25 22:27 ` [Qemu-devel] [PATCH 11/15] tcg-sparc: Clean up cruft stemming from attempts to use global registers Richard Henderson
2012-03-25 22:27 ` [Qemu-devel] [PATCH 12/15] tcg-sparc: Mask shift immediates to avoid illegal insns Richard Henderson
2012-03-25 22:27 ` [Qemu-devel] [PATCH 13/15] tcg-sparc: Use defines for temporaries Richard Henderson
2012-03-26 16:38   ` Blue Swirl
2012-03-25 22:27 ` [Qemu-devel] [PATCH 14/15] tcg-sparc: Add %g/%o registers to alloc_order Richard Henderson
2012-03-25 22:27 ` [Qemu-devel] [PATCH 15/15] tcg-sparc: Fix and enable direct TB chaining Richard Henderson
