qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH 00/16] Streamlining endian handling in TCG
@ 2013-09-04 21:04 Richard Henderson
  2013-09-04 21:04 ` [Qemu-devel] [PATCH 01/16] tcg: Add TCGMemOp Richard Henderson
                   ` (15 more replies)
  0 siblings, 16 replies; 22+ messages in thread
From: Richard Henderson @ 2013-09-04 21:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

This patch set is far from complete, and is more of an RFC, but it
contains at least one example of each of the 4-5 steps in the conversion.

Step 1 is the most complete, as it's largely a search-and-replace step
on the tcg backends.  The enumeration values of TCGMemOp match the
current integer values, with the exception of the addition of BSWAP=8.
Therefore in some places in the patches I do more masking than before,
to get rid of that bswap bit.
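
As a rough illustration of that masking (a sketch only, not code from the
series): the enumerators below restate the bit layout from patch 01 as
plain ints, and the three memop_* helpers are hypothetical names invented
for this example.  They show why a backend that used to index a helper
table with the raw integer now has to mask first.

```c
#include <assert.h>

/* Subset of the TCGMemOp bit layout from patch 01, restated as plain ints. */
enum {
    MO_8     = 0,
    MO_16    = 1,
    MO_32    = 2,
    MO_64    = 3,
    MO_SIZE  = 3,                   /* mask for the log2 size */
    MO_SIGN  = 4,                   /* sign-extend the loaded value */
    MO_BSWAP = 8,                   /* byte-swap relative to the host */
    MO_SSIZE = MO_SIZE | MO_SIGN,   /* size and sign together */
};

/* Hypothetical helper: index suitable for a 4-entry helper table. */
static int memop_size_index(int memop)
{
    assert((memop & ~(MO_SSIZE | MO_BSWAP)) == 0);  /* no unknown bits */
    return memop & MO_SIZE;
}

/* Hypothetical helper: access width in bytes (1, 2, 4 or 8). */
static int memop_bytes(int memop)
{
    return 1 << memop_size_index(memop);
}

/* Hypothetical helper: the value the pre-series code would have passed,
   i.e. the memop with the new bswap bit stripped. */
static int memop_legacy_opc(int memop)
{
    return memop & MO_SSIZE;
}
```

With these definitions, memop_legacy_opc(MO_BSWAP | MO_SIGN | MO_16)
yields 1 | 4, the same integer the old code used for a signed 16-bit load.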

I have at least cross-compiled all but 3 of the tcg backends, but since
the demise of a portion of the gcc compile farm I'm no longer able to
test hppa, mips, or sparc.  For those, downloading the proper cross
environment turned out to be tricky -- curse our new external library
dependencies.

After that, I've only converted the i386 backend, and the ppc frontend,
to the new opcodes.  I'm able to boot the Fedora 19 ppc64 installer,
and a browse of the dumps is encouraging.


r~


Richard Henderson (16):
  tcg: Add TCGMemOp
  tcg-i386: Use TCGMemOp within qemu_ldst routines
  tcg-aarch64: Use TCGMemOp within qemu_ldst routines
  tcg-arm: Use TCGMemOp within qemu_ldst routines
  tcg-s390: Use TCGMemOp within qemu_ldst routines
  tcg-ppc: Use TCGMemOp within qemu_ldst routines
  tcg-ppc64: Use TCGMemOp within qemu_ldst routines
  tcg-hppa: Use TCGMemOp within qemu_ldst routines
  tcg-mips: Use TCGMemOp within qemu_ldst routines
  tcg-sparc: Use TCGMemOp within qemu_ldst routines
  tcg: Add qemu_ld_st_i32/64
  exec: Add both big- and little-endian memory helpers
  tcg-i386: Tidy softmmu routines
  tcg-i386: Remove "cb" output restriction from qemu_st8 for i386
  tcg-i386: Support new ldst opcodes
  target-ppc: Convert to new ldst opcodes

 include/exec/softmmu_template.h | 286 +++++++++++++++--
 target-ppc/translate.c          | 147 +++------
 tcg/README                      |  43 ++-
 tcg/aarch64/tcg-target.c        | 126 ++++----
 tcg/aarch64/tcg-target.h        |   2 +
 tcg/arm/tcg-target.c            | 125 ++++----
 tcg/arm/tcg-target.h            |   2 +
 tcg/hppa/tcg-target.c           | 110 +++----
 tcg/hppa/tcg-target.h           |   2 +
 tcg/i386/tcg-target.c           | 695 ++++++++++++++++++----------------------
 tcg/i386/tcg-target.h           |   2 +
 tcg/ia64/tcg-target.h           |   2 +
 tcg/mips/tcg-target.c           | 116 +++----
 tcg/mips/tcg-target.h           |   2 +
 tcg/ppc/tcg-target.c            |  93 +++---
 tcg/ppc/tcg-target.h            |   2 +
 tcg/ppc64/tcg-target.c          |  82 ++---
 tcg/ppc64/tcg-target.h          |   2 +
 tcg/s390/tcg-target.c           | 107 +++----
 tcg/s390/tcg-target.h           |   2 +
 tcg/sparc/tcg-target.c          | 116 ++++---
 tcg/sparc/tcg-target.h          |   2 +
 tcg/tcg-op.h                    | 239 ++++----------
 tcg/tcg-opc.h                   |  96 ++++--
 tcg/tcg.c                       | 209 ++++++++++++
 tcg/tcg.h                       | 124 +++++--
 tcg/tci/tcg-target.h            |   2 +
 27 files changed, 1500 insertions(+), 1236 deletions(-)

-- 
1.8.1.4


* [Qemu-devel] [PATCH 01/16] tcg: Add TCGMemOp
  2013-09-04 21:04 [Qemu-devel] [PATCH 00/16] Streamlining endian handling in TCG Richard Henderson
@ 2013-09-04 21:04 ` Richard Henderson
  2013-09-04 21:04 ` [Qemu-devel] [PATCH 02/16] tcg-i386: Use TCGMemOp within qemu_ldst routines Richard Henderson
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2013-09-04 21:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/tcg.h | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index 53c4b33..91dcd92 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -197,6 +197,60 @@ typedef enum TCGType {
 #endif
 } TCGType;
 
+/* Constants for qemu_ld and qemu_st for the Memory Operation field.  */
+typedef enum TCGMemOp {
+    MO_8     = 0,
+    MO_16    = 1,
+    MO_32    = 2,
+    MO_64    = 3,
+    MO_SIZE  = 3,   /* Mask for the above.  */
+
+    MO_SIGN  = 4,   /* Sign-extended, otherwise zero-extended.  */
+
+    MO_BSWAP = 8,   /* Host reverse endian.  */
+#ifdef HOST_WORDS_BIGENDIAN
+    MO_LE    = MO_BSWAP,
+    MO_BE    = 0,
+#else
+    MO_LE    = 0,
+    MO_BE    = MO_BSWAP,
+#endif
+#ifdef TARGET_WORDS_BIGENDIAN
+    MO_TE    = MO_BE,
+#else
+    MO_TE    = MO_LE,
+#endif
+
+    /* Combinations of the above, for ease of use.  */
+    MO_UB    = MO_8,
+    MO_UW    = MO_16,
+    MO_UL    = MO_32,
+    MO_SB    = MO_SIGN | MO_8,
+    MO_SW    = MO_SIGN | MO_16,
+    MO_SL    = MO_SIGN | MO_32,
+    MO_Q     = MO_64,
+
+    MO_LEUW  = MO_LE | MO_UW,
+    MO_LEUL  = MO_LE | MO_UL,
+    MO_LESW  = MO_LE | MO_SW,
+    MO_LESL  = MO_LE | MO_SL,
+    MO_LEQ   = MO_LE | MO_Q,
+
+    MO_BEUW  = MO_BE | MO_UW,
+    MO_BEUL  = MO_BE | MO_UL,
+    MO_BESW  = MO_BE | MO_SW,
+    MO_BESL  = MO_BE | MO_SL,
+    MO_BEQ   = MO_BE | MO_Q,
+
+    MO_TEUW  = MO_TE | MO_UW,
+    MO_TEUL  = MO_TE | MO_UL,
+    MO_TESW  = MO_TE | MO_SW,
+    MO_TESL  = MO_TE | MO_SL,
+    MO_TEQ   = MO_TE | MO_Q,
+
+    MO_SSIZE = MO_SIZE | MO_SIGN,
+} TCGMemOp;
+
 typedef tcg_target_ulong TCGArg;
 
 /* Define a type and accessor macros for variables.  Using a struct is
@@ -217,8 +271,9 @@ typedef tcg_target_ulong TCGArg;
 #define TCG_MAX_QEMU_LDST       640
 
 typedef struct TCGLabelQemuLdst {
-    int is_ld:1;            /* qemu_ld: 1, qemu_st: 0 */
-    int opc:4;
+    unsigned is_ld : 1;          /* qemu_ld: 1, qemu_st: 0 */
+    TCGMemOp opc : 4;
+
     int addrlo_reg;         /* reg index for low word of guest virtual addr */
     int addrhi_reg;         /* reg index for high word of guest virtual addr */
     int datalo_reg;         /* reg index for low word to be loaded or stored */
-- 
1.8.1.4


* [Qemu-devel] [PATCH 02/16] tcg-i386: Use TCGMemOp within qemu_ldst routines
  2013-09-04 21:04 [Qemu-devel] [PATCH 00/16] Streamlining endian handling in TCG Richard Henderson
  2013-09-04 21:04 ` [Qemu-devel] [PATCH 01/16] tcg: Add TCGMemOp Richard Henderson
@ 2013-09-04 21:04 ` Richard Henderson
  2013-09-04 21:04 ` [Qemu-devel] [PATCH 03/16] tcg-aarch64: " Richard Henderson
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2013-09-04 21:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Step one in the transition, with constants passed down from tcg_out_op.
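
A sketch of what this step amounts to, under stated assumptions: the
mo_le/mo_be/mo_te functions below are runtime stand-ins, invented for
this example, for the HOST_WORDS_BIGENDIAN and TARGET_WORDS_BIGENDIAN
preprocessor tests in tcg/tcg.h, and check_i386_equivalence is a
hypothetical self-check.  It is meant to show that on a little-endian
host such as i386, testing MO_BSWAP on a MO_TE* constant gives the same
answer as the old TARGET_WORDS_BIGENDIAN #ifdef.

```c
#include <assert.h>
#include <stdbool.h>

enum { MO_16 = 1, MO_BSWAP = 8 };

/* Runtime stand-ins (assumption) for the preprocessor selection of
   MO_LE, MO_BE and MO_TE in patch 01. */
static int mo_le(bool host_be) { return host_be ? MO_BSWAP : 0; }
static int mo_be(bool host_be) { return host_be ? 0 : MO_BSWAP; }
static int mo_te(bool host_be, bool target_be)
{
    return target_be ? mo_be(host_be) : mo_le(host_be);
}

/* What the converted backend computes: memop & MO_BSWAP. */
static bool new_bswap(int memop) { return (memop & MO_BSWAP) != 0; }

/* What the old i386 code derived from TARGET_WORDS_BIGENDIAN. */
static bool old_i386_bswap(bool target_be) { return target_be; }

/* Hypothetical check: on a little-endian host the two agree for
   both target byte orders. */
static void check_i386_equivalence(void)
{
    for (int t = 0; t <= 1; t++) {
        int memop = mo_te(false, t != 0) | MO_16;   /* like MO_TEUW */
        assert(new_bswap(memop) == old_i386_bswap(t != 0));
    }
}
```

The same functions also show the big-endian host case: there
mo_te(true, true) is 0, so a big-endian target needs no swap.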

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/i386/tcg-target.c | 123 ++++++++++++++++++++++++--------------------------
 1 file changed, 59 insertions(+), 64 deletions(-)

diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index c1f0741..ba24ec9 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -1075,7 +1075,7 @@ static void add_qemu_ldst_label(TCGContext *s,
    First argument register is clobbered.  */
 
 static inline void tcg_out_tlb_load(TCGContext *s, int addrlo_idx,
-                                    int mem_index, int s_bits,
+                                    int mem_index, TCGMemOp s_bits,
                                     const TCGArg *args,
                                     uint8_t **label_ptr, int which)
 {
@@ -1162,28 +1162,26 @@ static inline void setup_guest_base_seg(void)
 static inline void setup_guest_base_seg(void) { }
 #endif /* SOFTMMU */
 
-static void tcg_out_qemu_ld_direct(TCGContext *s, int datalo, int datahi,
-                                   int base, intptr_t ofs, int seg, int sizeop)
+static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
+                                   TCGReg base, intptr_t ofs, int seg,
+                                   TCGMemOp memop)
 {
-#ifdef TARGET_WORDS_BIGENDIAN
-    const int bswap = 1;
-#else
-    const int bswap = 0;
-#endif
-    switch (sizeop) {
-    case 0:
+    const TCGMemOp bswap = memop & MO_BSWAP;
+
+    switch (memop & MO_SSIZE) {
+    case MO_UB:
         tcg_out_modrm_offset(s, OPC_MOVZBL + seg, datalo, base, ofs);
         break;
-    case 0 | 4:
+    case MO_SB:
         tcg_out_modrm_offset(s, OPC_MOVSBL + P_REXW + seg, datalo, base, ofs);
         break;
-    case 1:
+    case MO_UW:
         tcg_out_modrm_offset(s, OPC_MOVZWL + seg, datalo, base, ofs);
         if (bswap) {
             tcg_out_rolw_8(s, datalo);
         }
         break;
-    case 1 | 4:
+    case MO_SW:
         if (bswap) {
             tcg_out_modrm_offset(s, OPC_MOVZWL + seg, datalo, base, ofs);
             tcg_out_rolw_8(s, datalo);
@@ -1193,14 +1191,14 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, int datalo, int datahi,
                                  datalo, base, ofs);
         }
         break;
-    case 2:
+    case MO_UL:
         tcg_out_modrm_offset(s, OPC_MOVL_GvEv + seg, datalo, base, ofs);
         if (bswap) {
             tcg_out_bswap32(s, datalo);
         }
         break;
 #if TCG_TARGET_REG_BITS == 64
-    case 2 | 4:
+    case MO_SL:
         if (bswap) {
             tcg_out_modrm_offset(s, OPC_MOVL_GvEv + seg, datalo, base, ofs);
             tcg_out_bswap32(s, datalo);
@@ -1210,7 +1208,7 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, int datalo, int datahi,
         }
         break;
 #endif
-    case 3:
+    case MO_Q:
         if (TCG_TARGET_REG_BITS == 64) {
             tcg_out_modrm_offset(s, OPC_MOVL_GvEv + P_REXW + seg,
                                  datalo, base, ofs);
@@ -1248,26 +1246,26 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, int datalo, int datahi,
 /* XXX: qemu_ld and qemu_st could be modified to clobber only EDX and
    EAX. It will be useful once fixed registers globals are less
    common. */
-static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args,
-                            int opc)
+static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, TCGMemOp opc)
 {
     int data_reg, data_reg2 = 0;
     int addrlo_idx;
 #if defined(CONFIG_SOFTMMU)
-    int mem_index, s_bits;
+    int mem_index;
+    TCGMemOp s_bits;
     uint8_t *label_ptr[2];
 #endif
 
     data_reg = args[0];
     addrlo_idx = 1;
-    if (TCG_TARGET_REG_BITS == 32 && opc == 3) {
+    if (TCG_TARGET_REG_BITS == 32 && (opc & MO_SIZE) == MO_64) {
         data_reg2 = args[1];
         addrlo_idx = 2;
     }
 
 #if defined(CONFIG_SOFTMMU)
     mem_index = args[addrlo_idx + 1 + (TARGET_LONG_BITS > TCG_TARGET_REG_BITS)];
-    s_bits = opc & 3;
+    s_bits = opc & MO_SIZE;
 
     tcg_out_tlb_load(s, addrlo_idx, mem_index, s_bits, args,
                      label_ptr, offsetof(CPUTLBEntry, addr_read));
@@ -1312,27 +1310,24 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args,
 #endif
 }
 
-static void tcg_out_qemu_st_direct(TCGContext *s, int datalo, int datahi,
-                                   int base, intptr_t ofs, int seg,
-                                   int sizeop)
+static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
+                                   TCGReg base, intptr_t ofs, int seg,
+                                   TCGMemOp memop)
 {
-#ifdef TARGET_WORDS_BIGENDIAN
-    const int bswap = 1;
-#else
-    const int bswap = 0;
-#endif
+    const TCGMemOp bswap = memop & MO_BSWAP;
+
     /* ??? Ideally we wouldn't need a scratch register.  For user-only,
        we could perform the bswap twice to restore the original value
        instead of moving to the scratch.  But as it is, the L constraint
        means that TCG_REG_L0 is definitely free here.  */
-    const int scratch = TCG_REG_L0;
+    const TCGReg scratch = TCG_REG_L0;
 
-    switch (sizeop) {
-    case 0:
+    switch (memop & MO_SIZE) {
+    case MO_8:
         tcg_out_modrm_offset(s, OPC_MOVB_EvGv + P_REXB_R + seg,
                              datalo, base, ofs);
         break;
-    case 1:
+    case MO_16:
         if (bswap) {
             tcg_out_mov(s, TCG_TYPE_I32, scratch, datalo);
             tcg_out_rolw_8(s, scratch);
@@ -1341,7 +1336,7 @@ static void tcg_out_qemu_st_direct(TCGContext *s, int datalo, int datahi,
         tcg_out_modrm_offset(s, OPC_MOVL_EvGv + P_DATA16 + seg,
                              datalo, base, ofs);
         break;
-    case 2:
+    case MO_32:
         if (bswap) {
             tcg_out_mov(s, TCG_TYPE_I32, scratch, datalo);
             tcg_out_bswap32(s, scratch);
@@ -1349,7 +1344,7 @@ static void tcg_out_qemu_st_direct(TCGContext *s, int datalo, int datahi,
         }
         tcg_out_modrm_offset(s, OPC_MOVL_EvGv + seg, datalo, base, ofs);
         break;
-    case 3:
+    case MO_64:
         if (TCG_TARGET_REG_BITS == 64) {
             if (bswap) {
                 tcg_out_mov(s, TCG_TYPE_I64, scratch, datalo);
@@ -1375,13 +1370,13 @@ static void tcg_out_qemu_st_direct(TCGContext *s, int datalo, int datahi,
     }
 }
 
-static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
-                            int opc)
+static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, TCGMemOp opc)
 {
     int data_reg, data_reg2 = 0;
     int addrlo_idx;
 #if defined(CONFIG_SOFTMMU)
-    int mem_index, s_bits;
+    int mem_index;
+    TCGMemOp s_bits;
     uint8_t *label_ptr[2];
 #endif
 
@@ -1394,7 +1389,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
 
 #if defined(CONFIG_SOFTMMU)
     mem_index = args[addrlo_idx + 1 + (TARGET_LONG_BITS > TCG_TARGET_REG_BITS)];
-    s_bits = opc;
+    s_bits = opc & MO_SIZE;
 
     tcg_out_tlb_load(s, addrlo_idx, mem_index, s_bits, args,
                      label_ptr, offsetof(CPUTLBEntry, addr_write));
@@ -1483,8 +1478,8 @@ static void add_qemu_ldst_label(TCGContext *s,
  */
 static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 {
-    int opc = l->opc;
-    int s_bits = opc & 3;
+    TCGMemOp opc = l->opc;
+    TCGMemOp s_bits = opc & MO_SIZE;
     TCGReg data_reg;
     uint8_t **label_ptr = &l->label_ptr[0];
 
@@ -1524,25 +1519,25 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
     tcg_out_calli(s, (uintptr_t)qemu_ld_helpers[s_bits]);
 
     data_reg = l->datalo_reg;
-    switch(opc) {
-    case 0 | 4:
+    switch (opc & MO_SSIZE) {
+    case MO_SB:
         tcg_out_ext8s(s, data_reg, TCG_REG_EAX, P_REXW);
         break;
-    case 1 | 4:
+    case MO_SW:
         tcg_out_ext16s(s, data_reg, TCG_REG_EAX, P_REXW);
         break;
 #if TCG_TARGET_REG_BITS == 64
-    case 2 | 4:
+    case MO_SL:
         tcg_out_ext32s(s, data_reg, TCG_REG_EAX);
         break;
 #endif
-    case 0:
-    case 1:
+    case MO_UB:
+    case MO_UW:
         /* Note that the helpers have zero-extended to tcg_target_long.  */
-    case 2:
+    case MO_UL:
         tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
         break;
-    case 3:
+    case MO_Q:
         if (TCG_TARGET_REG_BITS == 64) {
             tcg_out_mov(s, TCG_TYPE_I64, data_reg, TCG_REG_RAX);
         } else if (data_reg == TCG_REG_EDX) {
@@ -1567,8 +1562,8 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
  */
 static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 {
-    int opc = l->opc;
-    int s_bits = opc & 3;
+    TCGMemOp opc = l->opc;
+    TCGMemOp s_bits = opc & MO_SIZE;
     uint8_t **label_ptr = &l->label_ptr[0];
     TCGReg retaddr;
 
@@ -1595,7 +1590,7 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
         tcg_out_st(s, TCG_TYPE_I32, l->datalo_reg, TCG_REG_ESP, ofs);
         ofs += 4;
 
-        if (opc == 3) {
+        if (s_bits == MO_64) {
             tcg_out_st(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_ESP, ofs);
             ofs += 4;
         }
@@ -1609,7 +1604,7 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
     } else {
         tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
         /* The second argument is already loaded with addrlo.  */
-        tcg_out_mov(s, (opc == 3 ? TCG_TYPE_I64 : TCG_TYPE_I32),
+        tcg_out_mov(s, (s_bits == MO_64 ? TCG_TYPE_I64 : TCG_TYPE_I32),
                     tcg_target_call_iarg_regs[2], l->datalo_reg);
         tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[3],
                      l->mem_index);
@@ -1875,38 +1870,38 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_qemu_ld8u:
-        tcg_out_qemu_ld(s, args, 0);
+        tcg_out_qemu_ld(s, args, MO_UB);
         break;
     case INDEX_op_qemu_ld8s:
-        tcg_out_qemu_ld(s, args, 0 | 4);
+        tcg_out_qemu_ld(s, args, MO_SB);
         break;
     case INDEX_op_qemu_ld16u:
-        tcg_out_qemu_ld(s, args, 1);
+        tcg_out_qemu_ld(s, args, MO_TEUW);
         break;
     case INDEX_op_qemu_ld16s:
-        tcg_out_qemu_ld(s, args, 1 | 4);
+        tcg_out_qemu_ld(s, args, MO_TESW);
         break;
 #if TCG_TARGET_REG_BITS == 64
     case INDEX_op_qemu_ld32u:
 #endif
     case INDEX_op_qemu_ld32:
-        tcg_out_qemu_ld(s, args, 2);
+        tcg_out_qemu_ld(s, args, MO_TEUL);
         break;
     case INDEX_op_qemu_ld64:
-        tcg_out_qemu_ld(s, args, 3);
+        tcg_out_qemu_ld(s, args, MO_TEQ);
         break;
 
     case INDEX_op_qemu_st8:
-        tcg_out_qemu_st(s, args, 0);
+        tcg_out_qemu_st(s, args, MO_UB);
         break;
     case INDEX_op_qemu_st16:
-        tcg_out_qemu_st(s, args, 1);
+        tcg_out_qemu_st(s, args, MO_TEUW);
         break;
     case INDEX_op_qemu_st32:
-        tcg_out_qemu_st(s, args, 2);
+        tcg_out_qemu_st(s, args, MO_TEUL);
         break;
     case INDEX_op_qemu_st64:
-        tcg_out_qemu_st(s, args, 3);
+        tcg_out_qemu_st(s, args, MO_TEQ);
         break;
 
     OP_32_64(mulu2):
@@ -1967,7 +1962,7 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
     case INDEX_op_qemu_ld32s:
-        tcg_out_qemu_ld(s, args, 2 | 4);
+        tcg_out_qemu_ld(s, args, MO_TESL);
         break;
 
     case INDEX_op_brcond_i64:
-- 
1.8.1.4


* [Qemu-devel] [PATCH 03/16] tcg-aarch64: Use TCGMemOp within qemu_ldst routines
  2013-09-04 21:04 [Qemu-devel] [PATCH 00/16] Streamlining endian handling in TCG Richard Henderson
  2013-09-04 21:04 ` [Qemu-devel] [PATCH 01/16] tcg: Add TCGMemOp Richard Henderson
  2013-09-04 21:04 ` [Qemu-devel] [PATCH 02/16] tcg-i386: Use TCGMemOp within qemu_ldst routines Richard Henderson
@ 2013-09-04 21:04 ` Richard Henderson
  2013-09-04 21:04 ` [Qemu-devel] [PATCH 04/16] tcg-arm: " Richard Henderson
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2013-09-04 21:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/aarch64/tcg-target.c | 126 +++++++++++++++++++++++------------------------
 1 file changed, 62 insertions(+), 64 deletions(-)

diff --git a/tcg/aarch64/tcg-target.c b/tcg/aarch64/tcg-target.c
index 651327e..608b735 100644
--- a/tcg/aarch64/tcg-target.c
+++ b/tcg/aarch64/tcg-target.c
@@ -21,12 +21,6 @@ static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
 };
 #endif /* NDEBUG */
 
-#ifdef TARGET_WORDS_BIGENDIAN
- #define TCG_LDST_BSWAP 1
-#else
- #define TCG_LDST_BSWAP 0
-#endif
-
 static const int tcg_target_reg_alloc_order[] = {
     TCG_REG_X20, TCG_REG_X21, TCG_REG_X22, TCG_REG_X23,
     TCG_REG_X24, TCG_REG_X25, TCG_REG_X26, TCG_REG_X27,
@@ -902,7 +896,7 @@ static inline void tcg_out_rev16(TCGContext *s, bool ext, TCGReg rd, TCGReg rm)
     tcg_out32(s, base | rm << 5 | rd);
 }
 
-static inline void tcg_out_sxt(TCGContext *s, bool ext, int s_bits,
+static inline void tcg_out_sxt(TCGContext *s, bool ext, TCGMemOp s_bits,
                                TCGReg rd, TCGReg rn)
 {
     /* Using ALIASes SXTB, SXTH, SXTW, of SBFM Xd, Xn, #0, #7|15|31 */
@@ -910,7 +904,7 @@ static inline void tcg_out_sxt(TCGContext *s, bool ext, int s_bits,
     tcg_out_sbfm(s, ext, rd, rn, 0, bits);
 }
 
-static inline void tcg_out_uxt(TCGContext *s, int s_bits,
+static inline void tcg_out_uxt(TCGContext *s, TCGMemOp s_bits,
                                TCGReg rd, TCGReg rn)
 {
     /* Using ALIASes UXTB, UXTH of UBFM Wd, Wn, #0, #7|15 */
@@ -1006,10 +1000,10 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X2, lb->mem_index);
     tcg_out_adr(s, TCG_REG_X3, (uintptr_t)lb->raddr);
 
-    tcg_out_call(s, (tcg_target_long)qemu_ld_helpers[lb->opc & 3]);
+    tcg_out_call(s, (tcg_target_long)qemu_ld_helpers[lb->opc & MO_SIZE]);
 
-    if (lb->opc & 0x04) {
-        tcg_out_sxt(s, 1, lb->opc & 3, lb->datalo_reg, TCG_REG_X0);
+    if (lb->opc & MO_SIGN) {
+        tcg_out_sxt(s, 1, lb->opc & MO_SIZE, lb->datalo_reg, TCG_REG_X0);
     } else {
         tcg_out_movr(s, 1, lb->datalo_reg, TCG_REG_X0);
     }
@@ -1027,7 +1021,7 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_X3, lb->mem_index);
     tcg_out_adr(s, TCG_REG_X4, (uintptr_t)lb->raddr);
 
-    tcg_out_call(s, (tcg_target_long)qemu_st_helpers[lb->opc & 3]);
+    tcg_out_call(s, (tcg_target_long)qemu_st_helpers[lb->opc & MO_SIZE]);
 
     tcg_out_goto(s, (tcg_target_long)lb->raddr);
 }
@@ -1045,7 +1039,7 @@ void tcg_out_tb_finalize(TCGContext *s)
     }
 }
 
-static void add_qemu_ldst_label(TCGContext *s, int is_ld, int opc,
+static void add_qemu_ldst_label(TCGContext *s, int is_ld, TCGMemOp opc,
                                 TCGReg data_reg, TCGReg addr_reg,
                                 int mem_index,
                                 uint8_t *raddr, uint8_t *label_ptr)
@@ -1072,7 +1066,7 @@ static void add_qemu_ldst_label(TCGContext *s, int is_ld, int opc,
    slow path for the failure case, which will be patched later when finalizing
    the slow path. Generated code returns the host addend in X1,
    clobbers X0,X2,X3,TMP. */
-static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, int s_bits,
+static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, TCGMemOp s_bits,
                              uint8_t **label_ptr, int mem_index, int is_read)
 {
     TCGReg base = TCG_AREG0;
@@ -1129,49 +1123,51 @@ static void tcg_out_tlb_read(TCGContext *s, TCGReg addr_reg, int s_bits,
 
 #endif /* CONFIG_SOFTMMU */
 
-static void tcg_out_qemu_ld_direct(TCGContext *s, int opc, TCGReg data_r,
+static void tcg_out_qemu_ld_direct(TCGContext *s, TCGMemOp opc, TCGReg data_r,
                                    TCGReg addr_r, TCGReg off_r)
 {
-    switch (opc) {
-    case 0:
+    const TCGMemOp bswap = opc & MO_BSWAP;
+
+    switch (opc & MO_SSIZE) {
+    case MO_UB:
         tcg_out_ldst_r(s, LDST_8, LDST_LD, data_r, addr_r, off_r);
         break;
-    case 0 | 4:
+    case MO_SB:
         tcg_out_ldst_r(s, LDST_8, LDST_LD_S_X, data_r, addr_r, off_r);
         break;
-    case 1:
+    case MO_UW:
         tcg_out_ldst_r(s, LDST_16, LDST_LD, data_r, addr_r, off_r);
-        if (TCG_LDST_BSWAP) {
+        if (bswap) {
             tcg_out_rev16(s, 0, data_r, data_r);
         }
         break;
-    case 1 | 4:
-        if (TCG_LDST_BSWAP) {
+    case MO_SW:
+        if (bswap) {
             tcg_out_ldst_r(s, LDST_16, LDST_LD, data_r, addr_r, off_r);
             tcg_out_rev16(s, 0, data_r, data_r);
-            tcg_out_sxt(s, 1, 1, data_r, data_r);
+            tcg_out_sxt(s, 1, MO_16, data_r, data_r);
         } else {
             tcg_out_ldst_r(s, LDST_16, LDST_LD_S_X, data_r, addr_r, off_r);
         }
         break;
-    case 2:
+    case MO_UL:
         tcg_out_ldst_r(s, LDST_32, LDST_LD, data_r, addr_r, off_r);
-        if (TCG_LDST_BSWAP) {
+        if (bswap) {
             tcg_out_rev(s, 0, data_r, data_r);
         }
         break;
-    case 2 | 4:
-        if (TCG_LDST_BSWAP) {
+    case MO_SL:
+        if (bswap) {
             tcg_out_ldst_r(s, LDST_32, LDST_LD, data_r, addr_r, off_r);
             tcg_out_rev(s, 0, data_r, data_r);
-            tcg_out_sxt(s, 1, 2, data_r, data_r);
+            tcg_out_sxt(s, 1, MO_32, data_r, data_r);
         } else {
             tcg_out_ldst_r(s, LDST_32, LDST_LD_S_X, data_r, addr_r, off_r);
         }
         break;
-    case 3:
+    case MO_Q:
         tcg_out_ldst_r(s, LDST_64, LDST_LD, data_r, addr_r, off_r);
-        if (TCG_LDST_BSWAP) {
+        if (bswap) {
             tcg_out_rev(s, 1, data_r, data_r);
         }
         break;
@@ -1180,31 +1176,33 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, int opc, TCGReg data_r,
     }
 }
 
-static void tcg_out_qemu_st_direct(TCGContext *s, int opc, TCGReg data_r,
+static void tcg_out_qemu_st_direct(TCGContext *s, TCGMemOp opc, TCGReg data_r,
                                    TCGReg addr_r, TCGReg off_r)
 {
-    switch (opc) {
-    case 0:
+    const TCGMemOp bswap = opc & MO_BSWAP;
+
+    switch (opc & MO_SIZE) {
+    case MO_8:
         tcg_out_ldst_r(s, LDST_8, LDST_ST, data_r, addr_r, off_r);
         break;
-    case 1:
-        if (TCG_LDST_BSWAP) {
+    case MO_16:
+        if (bswap) {
             tcg_out_rev16(s, 0, TCG_REG_TMP, data_r);
             tcg_out_ldst_r(s, LDST_16, LDST_ST, TCG_REG_TMP, addr_r, off_r);
         } else {
             tcg_out_ldst_r(s, LDST_16, LDST_ST, data_r, addr_r, off_r);
         }
         break;
-    case 2:
-        if (TCG_LDST_BSWAP) {
+    case MO_32:
+        if (bswap) {
             tcg_out_rev(s, 0, TCG_REG_TMP, data_r);
             tcg_out_ldst_r(s, LDST_32, LDST_ST, TCG_REG_TMP, addr_r, off_r);
         } else {
             tcg_out_ldst_r(s, LDST_32, LDST_ST, data_r, addr_r, off_r);
         }
         break;
-    case 3:
-        if (TCG_LDST_BSWAP) {
+    case MO_64:
+        if (bswap) {
             tcg_out_rev(s, 1, TCG_REG_TMP, data_r);
             tcg_out_ldst_r(s, LDST_64, LDST_ST, TCG_REG_TMP, addr_r, off_r);
         } else {
@@ -1216,11 +1214,12 @@ static void tcg_out_qemu_st_direct(TCGContext *s, int opc, TCGReg data_r,
     }
 }
 
-static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
+static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, TCGMemOp opc)
 {
     TCGReg addr_reg, data_reg;
 #ifdef CONFIG_SOFTMMU
-    int mem_index, s_bits;
+    int mem_index;
+    TCGMemOp s_bits;
     uint8_t *label_ptr;
 #endif
     data_reg = args[0];
@@ -1228,7 +1227,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
 
 #ifdef CONFIG_SOFTMMU
     mem_index = args[2];
-    s_bits = opc & 3;
+    s_bits = opc & MO_SIZE;
     tcg_out_tlb_read(s, addr_reg, s_bits, &label_ptr, mem_index, 1);
     tcg_out_qemu_ld_direct(s, opc, data_reg, addr_reg, TCG_REG_X1);
     add_qemu_ldst_label(s, 1, opc, data_reg, addr_reg,
@@ -1239,11 +1238,12 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
 #endif /* CONFIG_SOFTMMU */
 }
 
-static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
+static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, TCGMemOp opc)
 {
     TCGReg addr_reg, data_reg;
 #ifdef CONFIG_SOFTMMU
-    int mem_index, s_bits;
+    int mem_index;
+    TCGMemOp s_bits;
     uint8_t *label_ptr;
 #endif
     data_reg = args[0];
@@ -1251,7 +1251,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
 
 #ifdef CONFIG_SOFTMMU
     mem_index = args[2];
-    s_bits = opc & 3;
+    s_bits = opc & MO_SIZE;
 
     tcg_out_tlb_read(s, addr_reg, s_bits, &label_ptr, mem_index, 0);
     tcg_out_qemu_st_direct(s, opc, data_reg, addr_reg, TCG_REG_X1);
@@ -1534,40 +1534,38 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_qemu_ld8u:
-        tcg_out_qemu_ld(s, args, 0 | 0);
+        tcg_out_qemu_ld(s, args, MO_UB);
         break;
     case INDEX_op_qemu_ld8s:
-        tcg_out_qemu_ld(s, args, 4 | 0);
+        tcg_out_qemu_ld(s, args, MO_SB);
         break;
     case INDEX_op_qemu_ld16u:
-        tcg_out_qemu_ld(s, args, 0 | 1);
+        tcg_out_qemu_ld(s, args, MO_TEUW);
         break;
     case INDEX_op_qemu_ld16s:
-        tcg_out_qemu_ld(s, args, 4 | 1);
+        tcg_out_qemu_ld(s, args, MO_TESW);
         break;
+    case INDEX_op_qemu_ld32:
     case INDEX_op_qemu_ld32u:
-        tcg_out_qemu_ld(s, args, 0 | 2);
+        tcg_out_qemu_ld(s, args, MO_TEUL);
         break;
     case INDEX_op_qemu_ld32s:
-        tcg_out_qemu_ld(s, args, 4 | 2);
-        break;
-    case INDEX_op_qemu_ld32:
-        tcg_out_qemu_ld(s, args, 0 | 2);
+        tcg_out_qemu_ld(s, args, MO_TESL);
         break;
     case INDEX_op_qemu_ld64:
-        tcg_out_qemu_ld(s, args, 0 | 3);
+        tcg_out_qemu_ld(s, args, MO_TEQ);
         break;
     case INDEX_op_qemu_st8:
-        tcg_out_qemu_st(s, args, 0);
+        tcg_out_qemu_st(s, args, MO_UB);
         break;
     case INDEX_op_qemu_st16:
-        tcg_out_qemu_st(s, args, 1);
+        tcg_out_qemu_st(s, args, MO_TEUW);
         break;
     case INDEX_op_qemu_st32:
-        tcg_out_qemu_st(s, args, 2);
+        tcg_out_qemu_st(s, args, MO_TEUL);
         break;
     case INDEX_op_qemu_st64:
-        tcg_out_qemu_st(s, args, 3);
+        tcg_out_qemu_st(s, args, MO_TEQ);
         break;
 
     case INDEX_op_bswap32_i64:
@@ -1585,22 +1583,22 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 
     case INDEX_op_ext8s_i64:
     case INDEX_op_ext8s_i32:
-        tcg_out_sxt(s, ext, 0, a0, a1);
+        tcg_out_sxt(s, ext, MO_8, a0, a1);
         break;
     case INDEX_op_ext16s_i64:
     case INDEX_op_ext16s_i32:
-        tcg_out_sxt(s, ext, 1, a0, a1);
+        tcg_out_sxt(s, ext, MO_16, a0, a1);
         break;
     case INDEX_op_ext32s_i64:
-        tcg_out_sxt(s, 1, 2, a0, a1);
+        tcg_out_sxt(s, 1, MO_32, a0, a1);
         break;
     case INDEX_op_ext8u_i64:
     case INDEX_op_ext8u_i32:
-        tcg_out_uxt(s, 0, a0, a1);
+        tcg_out_uxt(s, MO_8, a0, a1);
         break;
     case INDEX_op_ext16u_i64:
     case INDEX_op_ext16u_i32:
-        tcg_out_uxt(s, 1, a0, a1);
+        tcg_out_uxt(s, MO_16, a0, a1);
         break;
     case INDEX_op_ext32u_i64:
         tcg_out_movr(s, 0, a0, a1);
-- 
1.8.1.4


* [Qemu-devel] [PATCH 04/16] tcg-arm: Use TCGMemOp within qemu_ldst routines
  2013-09-04 21:04 [Qemu-devel] [PATCH 00/16] Streamlining endian handling in TCG Richard Henderson
                   ` (2 preceding siblings ...)
  2013-09-04 21:04 ` [Qemu-devel] [PATCH 03/16] tcg-aarch64: " Richard Henderson
@ 2013-09-04 21:04 ` Richard Henderson
  2013-09-04 21:04 ` [Qemu-devel] [PATCH 05/16] tcg-s390: " Richard Henderson
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2013-09-04 21:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.c | 125 +++++++++++++++++++++++++--------------------------
 1 file changed, 61 insertions(+), 64 deletions(-)

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index ca8b293..9625154 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -1167,7 +1167,7 @@ QEMU_BUILD_BUG_ON(offsetof(CPUArchState, tlb_table[NB_MMU_MODES - 1][1])
    containing the addend of the tlb entry.  Clobbers R0, R1, R2, TMP.  */
 
 static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
-                               int s_bits, int mem_index, bool is_load)
+                               TCGMemOp s_bits, int mem_index, bool is_load)
 {
     TCGReg base = TCG_AREG0;
     int cmp_off =
@@ -1238,7 +1238,7 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
 /* Record the context of a call to the out of line helper code for the slow
    path for a load or store, so that we can later generate the correct
    helper code.  */
-static void add_qemu_ldst_label(TCGContext *s, int is_ld, int opc,
+static void add_qemu_ldst_label(TCGContext *s, int is_ld, TCGMemOp opc,
                                 int data_reg, int data_reg2, int addrlo_reg,
                                 int addrhi_reg, int mem_index,
                                 uint8_t *raddr, uint8_t *label_ptr)
@@ -1266,7 +1266,7 @@ static void add_qemu_ldst_label(TCGContext *s, int is_ld, int opc,
 static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 {
     TCGReg argreg, data_reg, data_reg2;
-    int opc = lb->opc;
+    TCGMemOp opc = lb->opc & MO_SSIZE;
     uintptr_t func;
 
     reloc_pc24(lb->label_ptr[0], (tcg_target_long)s->code_ptr);
@@ -1284,11 +1284,11 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
        icache usage.  For pre-armv6, use the signed helpers since we do
        not have a single insn sign-extend.  */
     if (use_armv6_instructions) {
-        func = (uintptr_t)qemu_ld_helpers[opc & 3];
+        func = (uintptr_t)qemu_ld_helpers[opc & MO_SIZE];
     } else {
         func = (uintptr_t)qemu_ld_helpers[opc];
-        if (opc & 4) {
-            opc = 2;
+        if (opc & MO_SIGN) {
+            opc = MO_UL;
         }
     }
     tcg_out_call(s, func);
@@ -1296,16 +1296,16 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     data_reg = lb->datalo_reg;
     data_reg2 = lb->datahi_reg;
     switch (opc) {
-    case 0 | 4:
+    case MO_SB:
         tcg_out_ext8s(s, COND_AL, data_reg, TCG_REG_R0);
         break;
-    case 1 | 4:
+    case MO_SW:
         tcg_out_ext16s(s, COND_AL, data_reg, TCG_REG_R0);
         break;
     default:
         tcg_out_mov_reg(s, COND_AL, data_reg, TCG_REG_R0);
         break;
-    case 3:
+    case MO_Q:
         if (data_reg != TCG_REG_R1) {
             tcg_out_mov_reg(s, COND_AL, data_reg, TCG_REG_R0);
             tcg_out_mov_reg(s, COND_AL, data_reg2, TCG_REG_R1);
@@ -1326,6 +1326,7 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 {
     TCGReg argreg, data_reg, data_reg2;
+    TCGMemOp s_bits = lb->opc & MO_SIZE;
 
     reloc_pc24(lb->label_ptr[0], (tcg_target_long)s->code_ptr);
 
@@ -1339,17 +1340,18 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 
     data_reg = lb->datalo_reg;
     data_reg2 = lb->datahi_reg;
-    switch (lb->opc) {
-    case 0:
+    switch (s_bits) {
+    case MO_8:
         argreg = tcg_out_arg_reg8(s, argreg, data_reg);
         break;
-    case 1:
+    case MO_16:
         argreg = tcg_out_arg_reg16(s, argreg, data_reg);
         break;
-    case 2:
+    case MO_32:
+    default:
         argreg = tcg_out_arg_reg32(s, argreg, data_reg);
         break;
-    case 3:
+    case MO_64:
         argreg = tcg_out_arg_reg64(s, argreg, data_reg, data_reg2);
         break;
     }
@@ -1358,32 +1360,27 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     argreg = tcg_out_arg_reg32(s, argreg, TCG_REG_R14);
 
     /* Tail-call to the helper, which will return to the fast path.  */
-    tcg_out_goto(s, COND_AL, (tcg_target_long) qemu_st_helpers[lb->opc & 3]);
+    tcg_out_goto(s, COND_AL, (tcg_target_long) qemu_st_helpers[s_bits]);
 }
 #endif /* SOFTMMU */
 
-static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
+static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, TCGMemOp opc)
 {
     TCGReg addr_reg, data_reg, data_reg2;
-    bool bswap;
+    TCGMemOp bswap = opc & MO_BSWAP;
+    TCGMemOp s_bits = opc & MO_SIZE;
 #ifdef CONFIG_SOFTMMU
-    int mem_index, s_bits;
+    int mem_index;
     TCGReg addr_reg2, addend;
     uint8_t *label_ptr;
 #endif
-#ifdef TARGET_WORDS_BIGENDIAN
-    bswap = 1;
-#else
-    bswap = 0;
-#endif
 
     data_reg = *args++;
-    data_reg2 = (opc == 3 ? *args++ : 0);
+    data_reg2 = (s_bits == MO_64 ? *args++ : 0);
     addr_reg = *args++;
 #ifdef CONFIG_SOFTMMU
     addr_reg2 = (TARGET_LONG_BITS == 64 ? *args++ : 0);
     mem_index = *args;
-    s_bits = opc & 3;
 
     addend = tcg_out_tlb_read(s, addr_reg, addr_reg2, s_bits, mem_index, 1);
 
@@ -1392,20 +1389,20 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
     label_ptr = s->code_ptr;
     tcg_out_bl_noaddr(s, COND_NE);
 
-    switch (opc) {
-    case 0:
+    switch (opc & MO_SSIZE) {
+    case MO_UB:
         tcg_out_ld8_r(s, COND_AL, data_reg, addr_reg, addend);
         break;
-    case 0 | 4:
+    case MO_SB:
         tcg_out_ld8s_r(s, COND_AL, data_reg, addr_reg, addend);
         break;
-    case 1:
+    case MO_UW:
         tcg_out_ld16u_r(s, COND_AL, data_reg, addr_reg, addend);
         if (bswap) {
             tcg_out_bswap16(s, COND_AL, data_reg, data_reg);
         }
         break;
-    case 1 | 4:
+    case MO_SW:
         if (bswap) {
             tcg_out_ld16u_r(s, COND_AL, data_reg, addr_reg, addend);
             tcg_out_bswap16s(s, COND_AL, data_reg, data_reg);
@@ -1413,14 +1410,14 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
             tcg_out_ld16s_r(s, COND_AL, data_reg, addr_reg, addend);
         }
         break;
-    case 2:
+    case MO_UL:
     default:
         tcg_out_ld32_r(s, COND_AL, data_reg, addr_reg, addend);
         if (bswap) {
             tcg_out_bswap32(s, COND_AL, data_reg, data_reg);
         }
         break;
-    case 3:
+    case MO_Q:
         if (bswap) {
             TCGReg t = data_reg;
             data_reg = data_reg2;
@@ -1462,20 +1459,20 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
             offset &= ~(0xff << i);
         }
     }
-    switch (opc) {
-    case 0:
+    switch (opc & MO_SSIZE) {
+    case MO_UB:
         tcg_out_ld8_12(s, COND_AL, data_reg, addr_reg, 0);
         break;
-    case 0 | 4:
+    case MO_SB:
         tcg_out_ld8s_8(s, COND_AL, data_reg, addr_reg, 0);
         break;
-    case 1:
+    case MO_UW:
         tcg_out_ld16u_8(s, COND_AL, data_reg, addr_reg, 0);
         if (bswap) {
             tcg_out_bswap16(s, COND_AL, data_reg, data_reg);
         }
         break;
-    case 1 | 4:
+    case MO_SW:
         if (bswap) {
             tcg_out_ld16u_8(s, COND_AL, data_reg, addr_reg, 0);
             tcg_out_bswap16s(s, COND_AL, data_reg, data_reg);
@@ -1483,14 +1480,14 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
             tcg_out_ld16s_8(s, COND_AL, data_reg, addr_reg, 0);
         }
         break;
-    case 2:
+    case MO_UL:
     default:
         tcg_out_ld32_12(s, COND_AL, data_reg, addr_reg, 0);
         if (bswap) {
             tcg_out_bswap32(s, COND_AL, data_reg, data_reg);
         }
         break;
-    case 3:
+    case MO_Q:
         if (use_armv6_instructions && !bswap
             && (data_reg & 1) == 0 && data_reg2 == data_reg + 1) {
             tcg_out_ldrd_8(s, COND_AL, data_reg, addr_reg, 0);
@@ -1513,12 +1510,13 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
 #endif
 }
 
-static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
+static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, TCGMemOp opc)
 {
     TCGReg addr_reg, data_reg, data_reg2;
-    bool bswap;
+    TCGMemOp bswap = opc & MO_BSWAP;
+    TCGMemOp s_bits = opc & MO_SIZE;
 #ifdef CONFIG_SOFTMMU
-    int mem_index, s_bits;
+    int mem_index;
     TCGReg addr_reg2, addend;
     uint8_t *label_ptr;
 #endif
@@ -1529,20 +1527,19 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
 #endif
 
     data_reg = *args++;
-    data_reg2 = (opc == 3 ? *args++ : 0);
+    data_reg2 = (s_bits == MO_64 ? *args++ : 0);
     addr_reg = *args++;
 #ifdef CONFIG_SOFTMMU
     addr_reg2 = (TARGET_LONG_BITS == 64 ? *args++ : 0);
     mem_index = *args;
-    s_bits = opc & 3;
 
     addend = tcg_out_tlb_read(s, addr_reg, addr_reg2, s_bits, mem_index, 0);
 
-    switch (opc) {
-    case 0:
+    switch (s_bits) {
+    case MO_8:
         tcg_out_st8_r(s, COND_EQ, data_reg, addr_reg, addend);
         break;
-    case 1:
+    case MO_16:
         if (bswap) {
             tcg_out_bswap16st(s, COND_EQ, TCG_REG_R0, data_reg);
             tcg_out_st16_r(s, COND_EQ, TCG_REG_R0, addr_reg, addend);
@@ -1550,7 +1547,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
             tcg_out_st16_r(s, COND_EQ, data_reg, addr_reg, addend);
         }
         break;
-    case 2:
+    case MO_32:
     default:
         if (bswap) {
             tcg_out_bswap32(s, COND_EQ, TCG_REG_R0, data_reg);
@@ -1559,7 +1556,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
             tcg_out_st32_r(s, COND_EQ, data_reg, addr_reg, addend);
         }
         break;
-    case 3:
+    case MO_64:
         if (bswap) {
             tcg_out_bswap32(s, COND_EQ, TCG_REG_R0, data_reg2);
             tcg_out_st32_rwb(s, COND_EQ, TCG_REG_R0, addend, addr_reg);
@@ -1597,11 +1594,11 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
             offset &= ~(0xff << i);
         }
     }
-    switch (opc) {
-    case 0:
+    switch (s_bits) {
+    case MO_8:
         tcg_out_st8_12(s, COND_AL, data_reg, addr_reg, 0);
         break;
-    case 1:
+    case MO_16:
         if (bswap) {
             tcg_out_bswap16st(s, COND_AL, TCG_REG_R0, data_reg);
             tcg_out_st16_8(s, COND_AL, TCG_REG_R0, addr_reg, 0);
@@ -1609,7 +1606,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
             tcg_out_st16_8(s, COND_AL, data_reg, addr_reg, 0);
         }
         break;
-    case 2:
+    case MO_32:
     default:
         if (bswap) {
             tcg_out_bswap32(s, COND_AL, TCG_REG_R0, data_reg);
@@ -1618,7 +1615,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
             tcg_out_st32_12(s, COND_AL, data_reg, addr_reg, 0);
         }
         break;
-    case 3:
+    case MO_64:
         if (bswap) {
             tcg_out_bswap32(s, COND_AL, TCG_REG_R0, data_reg2);
             tcg_out_st32_12(s, COND_AL, TCG_REG_R0, addr_reg, 0);
@@ -1902,35 +1899,35 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_qemu_ld8u:
-        tcg_out_qemu_ld(s, args, 0);
+        tcg_out_qemu_ld(s, args, MO_UB);
         break;
     case INDEX_op_qemu_ld8s:
-        tcg_out_qemu_ld(s, args, 0 | 4);
+        tcg_out_qemu_ld(s, args, MO_SB);
         break;
     case INDEX_op_qemu_ld16u:
-        tcg_out_qemu_ld(s, args, 1);
+        tcg_out_qemu_ld(s, args, MO_TEUW);
         break;
     case INDEX_op_qemu_ld16s:
-        tcg_out_qemu_ld(s, args, 1 | 4);
+        tcg_out_qemu_ld(s, args, MO_TESW);
         break;
     case INDEX_op_qemu_ld32:
-        tcg_out_qemu_ld(s, args, 2);
+        tcg_out_qemu_ld(s, args, MO_TEUL);
         break;
     case INDEX_op_qemu_ld64:
-        tcg_out_qemu_ld(s, args, 3);
+        tcg_out_qemu_ld(s, args, MO_TEQ);
         break;
 
     case INDEX_op_qemu_st8:
-        tcg_out_qemu_st(s, args, 0);
+        tcg_out_qemu_st(s, args, MO_UB);
         break;
     case INDEX_op_qemu_st16:
-        tcg_out_qemu_st(s, args, 1);
+        tcg_out_qemu_st(s, args, MO_TEUW);
         break;
     case INDEX_op_qemu_st32:
-        tcg_out_qemu_st(s, args, 2);
+        tcg_out_qemu_st(s, args, MO_TEUL);
         break;
     case INDEX_op_qemu_st64:
-        tcg_out_qemu_st(s, args, 3);
+        tcg_out_qemu_st(s, args, MO_TEQ);
         break;
 
     case INDEX_op_bswap16_i32:
-- 
1.8.1.4


* [Qemu-devel] [PATCH 05/16] tcg-s390: Use TCGMemOp within qemu_ldst routines
  2013-09-04 21:04 [Qemu-devel] [PATCH 00/16] Streamlining endian handling in TCG Richard Henderson
                   ` (3 preceding siblings ...)
  2013-09-04 21:04 ` [Qemu-devel] [PATCH 04/16] tcg-arm: " Richard Henderson
@ 2013-09-04 21:04 ` Richard Henderson
  2013-09-04 21:04 ` [Qemu-devel] [PATCH 06/16] tcg-ppc: " Richard Henderson
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2013-09-04 21:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
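Note (not part of the commit): a small sketch of why replacing the
backend-local LD_* macros is a mechanical rename. The LD_* definitions
are copied from the lines removed below; the MO_* values are an
assumption based on the cover letter's statement that the TCGMemOp
enumerators match the old integer values. The real enum is in patch
01/16.

```c
/* Old s390 backend constants, copied from the lines this patch removes. */
#define LD_SIGNED  0x04
#define LD_UINT8   0x00
#define LD_INT8    (LD_UINT8 | LD_SIGNED)
#define LD_UINT16  0x01
#define LD_INT16   (LD_UINT16 | LD_SIGNED)
#define LD_UINT32  0x02
#define LD_INT32   (LD_UINT32 | LD_SIGNED)
#define LD_UINT64  0x03

/* Assumed TCGMemOp values (size in bits 0-1, sign in bit 2). */
enum {
    MO_8 = 0, MO_16 = 1, MO_32 = 2, MO_64 = 3,
    MO_SIGN = 4,
    MO_UB = MO_8, MO_UW = MO_16, MO_UL = MO_32, MO_Q = MO_64,
    MO_SB = MO_SIGN | MO_8,
    MO_SW = MO_SIGN | MO_16,
    MO_SL = MO_SIGN | MO_32,
};

/* Compile-time checks (C11): each old constant equals its replacement. */
_Static_assert(LD_UINT8  == MO_UB, "unsigned byte");
_Static_assert(LD_INT8   == MO_SB, "signed byte");
_Static_assert(LD_UINT16 == MO_UW, "unsigned halfword");
_Static_assert(LD_INT16  == MO_SW, "signed halfword");
_Static_assert(LD_UINT32 == MO_UL, "unsigned word");
_Static_assert(LD_INT32  == MO_SL, "signed word");
_Static_assert(LD_UINT64 == MO_Q,  "doubleword");
```

The only new information a caller can pass is the byte-swap bit, so the
switches below mask with MO_SIZE or MO_SSIZE before matching.
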
 tcg/s390/tcg-target.c | 107 ++++++++++++++++++++++----------------------------
 1 file changed, 46 insertions(+), 61 deletions(-)

diff --git a/tcg/s390/tcg-target.c b/tcg/s390/tcg-target.c
index 1b44aee..e640f86 100644
--- a/tcg/s390/tcg-target.c
+++ b/tcg/s390/tcg-target.c
@@ -224,16 +224,6 @@ typedef enum S390Opcode {
     RX_STH      = 0x40,
 } S390Opcode;
 
-#define LD_SIGNED      0x04
-#define LD_UINT8       0x00
-#define LD_INT8        (LD_UINT8 | LD_SIGNED)
-#define LD_UINT16      0x01
-#define LD_INT16       (LD_UINT16 | LD_SIGNED)
-#define LD_UINT32      0x02
-#define LD_INT32       (LD_UINT32 | LD_SIGNED)
-#define LD_UINT64      0x03
-#define LD_INT64       (LD_UINT64 | LD_SIGNED)
-
 #ifndef NDEBUG
 static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
     "%r0", "%r1", "%r2", "%r3", "%r4", "%r5", "%r6", "%r7",
@@ -1284,22 +1274,19 @@ static void tgen_calli(TCGContext *s, tcg_target_long dest)
     }
 }
 
-static void tcg_out_qemu_ld_direct(TCGContext *s, int opc, TCGReg data,
+static void tcg_out_qemu_ld_direct(TCGContext *s, TCGMemOp opc, TCGReg data,
                                    TCGReg base, TCGReg index, int disp)
 {
-#ifdef TARGET_WORDS_BIGENDIAN
-    const int bswap = 0;
-#else
-    const int bswap = 1;
-#endif
-    switch (opc) {
-    case LD_UINT8:
+    const TCGMemOp bswap = opc & MO_BSWAP;
+
+    switch (opc & MO_SSIZE) {
+    case MO_UB:
         tcg_out_insn(s, RXY, LLGC, data, base, index, disp);
         break;
-    case LD_INT8:
+    case MO_SB:
         tcg_out_insn(s, RXY, LGB, data, base, index, disp);
         break;
-    case LD_UINT16:
+    case MO_UW:
         if (bswap) {
             /* swapped unsigned halfword load with upper bits zeroed */
             tcg_out_insn(s, RXY, LRVH, data, base, index, disp);
@@ -1308,7 +1295,7 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, int opc, TCGReg data,
             tcg_out_insn(s, RXY, LLGH, data, base, index, disp);
         }
         break;
-    case LD_INT16:
+    case MO_SW:
         if (bswap) {
             /* swapped sign-extended halfword load */
             tcg_out_insn(s, RXY, LRVH, data, base, index, disp);
@@ -1317,7 +1304,7 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, int opc, TCGReg data,
             tcg_out_insn(s, RXY, LGH, data, base, index, disp);
         }
         break;
-    case LD_UINT32:
+    case MO_UL:
         if (bswap) {
             /* swapped unsigned int load with upper bits zeroed */
             tcg_out_insn(s, RXY, LRV, data, base, index, disp);
@@ -1326,7 +1313,7 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, int opc, TCGReg data,
             tcg_out_insn(s, RXY, LLGF, data, base, index, disp);
         }
         break;
-    case LD_INT32:
+    case MO_SL:
         if (bswap) {
             /* swapped sign-extended int load */
             tcg_out_insn(s, RXY, LRV, data, base, index, disp);
@@ -1335,7 +1322,7 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, int opc, TCGReg data,
             tcg_out_insn(s, RXY, LGF, data, base, index, disp);
         }
         break;
-    case LD_UINT64:
+    case MO_Q:
         if (bswap) {
             tcg_out_insn(s, RXY, LRVG, data, base, index, disp);
         } else {
@@ -1347,23 +1334,20 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, int opc, TCGReg data,
     }
 }
 
-static void tcg_out_qemu_st_direct(TCGContext *s, int opc, TCGReg data,
+static void tcg_out_qemu_st_direct(TCGContext *s, TCGMemOp opc, TCGReg data,
                                    TCGReg base, TCGReg index, int disp)
 {
-#ifdef TARGET_WORDS_BIGENDIAN
-    const int bswap = 0;
-#else
-    const int bswap = 1;
-#endif
-    switch (opc) {
-    case LD_UINT8:
+    const TCGMemOp bswap = opc & MO_BSWAP;
+
+    switch (opc & MO_SIZE) {
+    case MO_8:
         if (disp >= 0 && disp < 0x1000) {
             tcg_out_insn(s, RX, STC, data, base, index, disp);
         } else {
             tcg_out_insn(s, RXY, STCY, data, base, index, disp);
         }
         break;
-    case LD_UINT16:
+    case MO_16:
         if (bswap) {
             tcg_out_insn(s, RXY, STRVH, data, base, index, disp);
         } else if (disp >= 0 && disp < 0x1000) {
@@ -1372,7 +1356,7 @@ static void tcg_out_qemu_st_direct(TCGContext *s, int opc, TCGReg data,
             tcg_out_insn(s, RXY, STHY, data, base, index, disp);
         }
         break;
-    case LD_UINT32:
+    case MO_32:
         if (bswap) {
             tcg_out_insn(s, RXY, STRV, data, base, index, disp);
         } else if (disp >= 0 && disp < 0x1000) {
@@ -1381,7 +1365,7 @@ static void tcg_out_qemu_st_direct(TCGContext *s, int opc, TCGReg data,
             tcg_out_insn(s, RXY, STY, data, base, index, disp);
         }
         break;
-    case LD_UINT64:
+    case MO_64:
         if (bswap) {
             tcg_out_insn(s, RXY, STRVG, data, base, index, disp);
         } else {
@@ -1395,14 +1379,15 @@ static void tcg_out_qemu_st_direct(TCGContext *s, int opc, TCGReg data,
 
 #if defined(CONFIG_SOFTMMU)
 static TCGReg tcg_prepare_qemu_ldst(TCGContext* s, TCGReg data_reg,
-                                    TCGReg addr_reg, int mem_index, int opc,
-                                    uint16_t **label2_ptr_p, int is_store)
+                                    TCGReg addr_reg, int mem_index,
+                                    TCGMemOp opc, uint16_t **label2_ptr_p,
+                                    int is_store)
 {
     const TCGReg arg0 = tcg_target_call_iarg_regs[0];
     const TCGReg arg1 = tcg_target_call_iarg_regs[1];
     const TCGReg arg2 = tcg_target_call_iarg_regs[2];
     const TCGReg arg3 = tcg_target_call_iarg_regs[3];
-    int s_bits = opc & 3;
+    TCGMemOp s_bits = opc & MO_SIZE;
     uint16_t *label1_ptr;
     tcg_target_long ofs;
 
@@ -1446,17 +1431,17 @@ static TCGReg tcg_prepare_qemu_ldst(TCGContext* s, TCGReg data_reg,
     if (is_store) {
         /* Make sure to zero-extend the value to the full register
            for the calling convention.  */
-        switch (opc) {
-        case LD_UINT8:
+        switch (opc & MO_SIZE) {
+        case MO_8:
             tgen_ext8u(s, TCG_TYPE_I64, arg2, data_reg);
             break;
-        case LD_UINT16:
+        case MO_16:
             tgen_ext16u(s, TCG_TYPE_I64, arg2, data_reg);
             break;
-        case LD_UINT32:
+        case MO_32:
             tgen_ext32u(s, arg2, data_reg);
             break;
-        case LD_UINT64:
+        case MO_64:
             tcg_out_mov(s, TCG_TYPE_I64, arg2, data_reg);
             break;
         default:
@@ -1471,14 +1456,14 @@ static TCGReg tcg_prepare_qemu_ldst(TCGContext* s, TCGReg data_reg,
         tgen_calli(s, (tcg_target_ulong)qemu_ld_helpers[s_bits]);
 
         /* sign extension */
-        switch (opc) {
-        case LD_INT8:
+        switch (opc & MO_SSIZE) {
+        case MO_SB:
             tgen_ext8s(s, TCG_TYPE_I64, data_reg, TCG_REG_R2);
             break;
-        case LD_INT16:
+        case MO_SW:
             tgen_ext16s(s, TCG_TYPE_I64, data_reg, TCG_REG_R2);
             break;
-        case LD_INT32:
+        case MO_SL:
             tgen_ext32s(s, data_reg, TCG_REG_R2);
             break;
         default:
@@ -1531,7 +1516,7 @@ static void tcg_prepare_user_ldst(TCGContext *s, TCGReg *addr_reg,
 
 /* load data with address translation (if applicable)
    and endianness conversion */
-static void tcg_out_qemu_ld(TCGContext* s, const TCGArg* args, int opc)
+static void tcg_out_qemu_ld(TCGContext* s, const TCGArg* args, TCGMemOp opc)
 {
     TCGReg addr_reg, data_reg;
 #if defined(CONFIG_SOFTMMU)
@@ -1560,7 +1545,7 @@ static void tcg_out_qemu_ld(TCGContext* s, const TCGArg* args, int opc)
 #endif
 }
 
-static void tcg_out_qemu_st(TCGContext* s, const TCGArg* args, int opc)
+static void tcg_out_qemu_st(TCGContext* s, const TCGArg* args, TCGMemOp opc)
 {
     TCGReg addr_reg, data_reg;
 #if defined(CONFIG_SOFTMMU)
@@ -1833,36 +1818,36 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_qemu_ld8u:
-        tcg_out_qemu_ld(s, args, LD_UINT8);
+        tcg_out_qemu_ld(s, args, MO_UB);
         break;
     case INDEX_op_qemu_ld8s:
-        tcg_out_qemu_ld(s, args, LD_INT8);
+        tcg_out_qemu_ld(s, args, MO_SB);
         break;
     case INDEX_op_qemu_ld16u:
-        tcg_out_qemu_ld(s, args, LD_UINT16);
+        tcg_out_qemu_ld(s, args, MO_TEUW);
         break;
     case INDEX_op_qemu_ld16s:
-        tcg_out_qemu_ld(s, args, LD_INT16);
+        tcg_out_qemu_ld(s, args, MO_TESW);
         break;
     case INDEX_op_qemu_ld32:
         /* ??? Technically we can use a non-extending instruction.  */
-        tcg_out_qemu_ld(s, args, LD_UINT32);
+        tcg_out_qemu_ld(s, args, MO_TEUL);
         break;
     case INDEX_op_qemu_ld64:
-        tcg_out_qemu_ld(s, args, LD_UINT64);
+        tcg_out_qemu_ld(s, args, MO_TEQ);
         break;
 
     case INDEX_op_qemu_st8:
-        tcg_out_qemu_st(s, args, LD_UINT8);
+        tcg_out_qemu_st(s, args, MO_UB);
         break;
     case INDEX_op_qemu_st16:
-        tcg_out_qemu_st(s, args, LD_UINT16);
+        tcg_out_qemu_st(s, args, MO_TEUW);
         break;
     case INDEX_op_qemu_st32:
-        tcg_out_qemu_st(s, args, LD_UINT32);
+        tcg_out_qemu_st(s, args, MO_TEUL);
         break;
     case INDEX_op_qemu_st64:
-        tcg_out_qemu_st(s, args, LD_UINT64);
+        tcg_out_qemu_st(s, args, MO_TEQ);
         break;
 
     case INDEX_op_mov_i64:
@@ -2066,10 +2051,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_qemu_ld32u:
-        tcg_out_qemu_ld(s, args, LD_UINT32);
+        tcg_out_qemu_ld(s, args, MO_TEUL);
         break;
     case INDEX_op_qemu_ld32s:
-        tcg_out_qemu_ld(s, args, LD_INT32);
+        tcg_out_qemu_ld(s, args, MO_TESL);
         break;
 
     OP_32_64(deposit):
-- 
1.8.1.4


* [Qemu-devel] [PATCH 06/16] tcg-ppc: Use TCGMemOp within qemu_ldst routines
  2013-09-04 21:04 [Qemu-devel] [PATCH 00/16] Streamlining endian handling in TCG Richard Henderson
                   ` (4 preceding siblings ...)
  2013-09-04 21:04 ` [Qemu-devel] [PATCH 05/16] tcg-s390: " Richard Henderson
@ 2013-09-04 21:04 ` Richard Henderson
  2013-09-04 21:04 ` [Qemu-devel] [PATCH 07/16] tcg-ppc64: " Richard Henderson
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2013-09-04 21:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
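Note (not part of the commit): a sketch of the rule that the MO_TE*
constants appear to encode. The removed #ifdef blocks below set bswap to
0 for big-endian targets and 1 otherwise on this big-endian host, while
the arm backend (little-endian host) did the opposite, so a
target-endian access needs a swap exactly when host and target
endianness differ. MO_TE is presumably resolved at compile time in patch
01/16; the runtime functions here are hypothetical stand-ins, and the
enum values are assumed from the cover letter.

```c
#include <stdbool.h>

enum { MO_16 = 1, MO_BSWAP = 8 };   /* assumed values, see patch 01/16 */

/* Hypothetical runtime stand-in for the compile-time MO_TE selection:
   swap only when host and target byte order disagree. */
static inline int target_endian_flag(bool host_big_endian,
                                     bool target_big_endian)
{
    return host_big_endian != target_big_endian ? MO_BSWAP : 0;
}

/* Hypothetical equivalent of MO_TEUW for a given host/target pair. */
static inline int memop_teuw(bool host_big_endian, bool target_big_endian)
{
    return MO_16 | target_endian_flag(host_big_endian, target_big_endian);
}
```

With the decision folded into the opcode argument, tcg_out_qemu_ld and
tcg_out_qemu_st only need `opc & MO_BSWAP` and no longer test
TARGET_WORDS_BIGENDIAN themselves.
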
 tcg/ppc/tcg-target.c | 93 ++++++++++++++++++++++++----------------------------
 1 file changed, 43 insertions(+), 50 deletions(-)

diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
index 82cedce..2fa99b8 100644
--- a/tcg/ppc/tcg-target.c
+++ b/tcg/ppc/tcg-target.c
@@ -522,7 +522,7 @@ static void tcg_out_call (TCGContext *s, tcg_target_long arg, int const_arg)
 
 static void add_qemu_ldst_label (TCGContext *s,
                                  int is_ld,
-                                 int opc,
+                                 TCGMemOp opc,
                                  int data_reg,
                                  int data_reg2,
                                  int addrlo_reg,
@@ -576,7 +576,7 @@ static const void * const qemu_st_helpers[4] = {
    Clobbers R1 and R2.  */
 
 static void tcg_out_tlb_check(TCGContext *s, TCGReg r0, TCGReg r1, TCGReg r2,
-                              TCGReg addrlo, TCGReg addrhi, int s_bits,
+                              TCGReg addrlo, TCGReg addrhi, TCGMemOp s_bits,
                               int mem_index, int is_load, uint8_t **label_ptr)
 {
     int cmp_off =
@@ -648,10 +648,11 @@ static void tcg_out_tlb_check(TCGContext *s, TCGReg r0, TCGReg r1, TCGReg r2,
 }
 #endif
 
-static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
+static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, TCGMemOp opc)
 {
     TCGReg addrlo, datalo, datahi, rbase;
-    int bswap;
+    TCGMemOp bswap = opc & MO_BSWAP;
+    TCGMemOp s_bits = opc & MO_SIZE;
 #ifdef CONFIG_SOFTMMU
     int mem_index;
     TCGReg addrhi;
@@ -659,7 +660,7 @@ static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
 #endif
 
     datalo = *args++;
-    datahi = (opc == 3 ? *args++ : 0);
+    datahi = (s_bits == MO_64 ? *args++ : 0);
     addrlo = *args++;
 
 #ifdef CONFIG_SOFTMMU
@@ -667,31 +668,25 @@ static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
     mem_index = *args;
 
     tcg_out_tlb_check(s, TCG_REG_R3, TCG_REG_R4, TCG_REG_R0, addrlo,
-                      addrhi, opc & 3, mem_index, 0, &label_ptr);
+                      addrhi, s_bits, mem_index, 0, &label_ptr);
     rbase = TCG_REG_R3;
 #else  /* !CONFIG_SOFTMMU */
     rbase = GUEST_BASE ? TCG_GUEST_BASE_REG : 0;
 #endif
 
-#ifdef TARGET_WORDS_BIGENDIAN
-    bswap = 0;
-#else
-    bswap = 1;
-#endif
-
-    switch (opc) {
+    switch (opc & MO_SSIZE) {
     default:
-    case 0:
+    case MO_UB:
         tcg_out32(s, LBZX | TAB(datalo, rbase, addrlo));
         break;
-    case 0|4:
+    case MO_SB:
         tcg_out32(s, LBZX | TAB(datalo, rbase, addrlo));
         tcg_out32(s, EXTSB | RA(datalo) | RS(datalo));
         break;
-    case 1:
+    case MO_UW:
         tcg_out32(s, (bswap ? LHBRX : LHZX) | TAB(datalo, rbase, addrlo));
         break;
-    case 1|4:
+    case MO_SW:
         if (bswap) {
             tcg_out32(s, LHBRX | TAB(datalo, rbase, addrlo));
             tcg_out32(s, EXTSH | RA(datalo) | RS(datalo));
@@ -699,10 +694,10 @@ static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
             tcg_out32(s, LHAX | TAB(datalo, rbase, addrlo));
         }
         break;
-    case 2:
+    case MO_UL:
         tcg_out32(s, (bswap ? LWBRX : LWZX) | TAB(datalo, rbase, addrlo));
         break;
-    case 3:
+    case MO_Q:
         if (bswap) {
             tcg_out32(s, ADDI | RT(TCG_REG_R0) | RA(addrlo) | 4);
             tcg_out32(s, LWBRX | TAB(datalo, rbase, addrlo));
@@ -726,10 +721,11 @@ static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
 #endif
 }
 
-static void tcg_out_qemu_st (TCGContext *s, const TCGArg *args, int opc)
+static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, TCGMemOp opc)
 {
     TCGReg addrlo, datalo, datahi, rbase;
-    int bswap;
+    TCGMemOp bswap = opc & MO_BSWAP;
+    TCGMemOp s_bits = opc & MO_SIZE;
 #ifdef CONFIG_SOFTMMU
     int mem_index;
     TCGReg addrhi;
@@ -737,7 +733,7 @@ static void tcg_out_qemu_st (TCGContext *s, const TCGArg *args, int opc)
 #endif
 
     datalo = *args++;
-    datahi = (opc == 3 ? *args++ : 0);
+    datahi = (s_bits == MO_64 ? *args++ : 0);
     addrlo = *args++;
 
 #ifdef CONFIG_SOFTMMU
@@ -745,28 +741,24 @@ static void tcg_out_qemu_st (TCGContext *s, const TCGArg *args, int opc)
     mem_index = *args;
 
     tcg_out_tlb_check(s, TCG_REG_R3, TCG_REG_R4, TCG_REG_R0, addrlo,
-                      addrhi, opc & 3, mem_index, 0, &label_ptr);
+                      addrhi, s_bits, mem_index, 0, &label_ptr);
     rbase = TCG_REG_R3;
 #else  /* !CONFIG_SOFTMMU */
     rbase = GUEST_BASE ? TCG_GUEST_BASE_REG : 0;
 #endif
 
-#ifdef TARGET_WORDS_BIGENDIAN
-    bswap = 0;
-#else
-    bswap = 1;
-#endif
-    switch (opc) {
-    case 0:
+    switch (s_bits) {
+    case MO_8:
         tcg_out32(s, STBX | SAB(datalo, rbase, addrlo));
         break;
-    case 1:
+    case MO_16:
         tcg_out32(s, (bswap ? STHBRX : STHX) | SAB(datalo, rbase, addrlo));
         break;
-    case 2:
+    case MO_32:
+    default:
         tcg_out32(s, (bswap ? STWBRX : STWX) | SAB(datalo, rbase, addrlo));
         break;
-    case 3:
+    case MO_64:
         if (bswap) {
             tcg_out32(s, ADDI | RT(TCG_REG_R0) | RA(addrlo) | 4);
             tcg_out32(s, STWBRX | SAB(datalo, rbase, addrlo));
@@ -792,7 +784,7 @@ static void tcg_out_qemu_st (TCGContext *s, const TCGArg *args, int opc)
 static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 {
     TCGReg ir, datalo, datahi;
-    int opc = lb->opc;
+    TCGMemOp opc = lb->opc & MO_SSIZE;
 
     reloc_pc14(lb->label_ptr[0], (uintptr_t)s->code_ptr);
 
@@ -812,14 +804,14 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     tcg_out_movi(s, TCG_TYPE_I32, ir++, lb->mem_index);
     tcg_out32(s, MFSPR | RT(ir++) | LR);
 
-    tcg_out_call(s, (uintptr_t)qemu_ld_helpers[opc & 3], 1);
+    tcg_out_call(s, (uintptr_t)qemu_ld_helpers[opc & MO_SIZE], 1);
 
     datalo = lb->datalo_reg;
     switch (opc) {
-    case 0|4:
+    case MO_SB:
         tcg_out32(s, EXTSB | RA(datalo) | RS(TCG_REG_R3));
         break;
-    case 1|4:
+    case MO_SW:
         tcg_out32(s, EXTSH | RA(datalo) | RS(TCG_REG_R3));
         break;
 
@@ -827,7 +819,7 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
         tcg_out_mov(s, TCG_TYPE_I32, datalo, TCG_REG_R3);
         break;
 
-    case 3:
+    case MO_Q:
         datahi = lb->datahi_reg;
         if (datalo != TCG_REG_R3) {
             tcg_out_mov(s, TCG_TYPE_I32, datalo, TCG_REG_R4);
@@ -850,6 +842,7 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 {
     TCGReg ir;
+    TCGMemOp s_bits = lb->opc & MO_SIZE;
 
     reloc_pc14(lb->label_ptr[0], (uintptr_t)s->code_ptr);
 
@@ -866,7 +859,7 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
         tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->addrlo_reg);
     }
 
-    if (lb->opc != 3) {
+    if (s_bits != MO_64) {
         tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->datalo_reg);
     } else {
 #ifdef TCG_TARGET_CALL_ALIGN_ARGS
@@ -879,7 +872,7 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     tcg_out_movi(s, TCG_TYPE_I32, ir++, lb->mem_index);
     tcg_out32(s, MFSPR | RT(ir++) | LR);
 
-    tcg_out_call(s, (uintptr_t)qemu_st_helpers[lb->opc], 1);
+    tcg_out_call(s, (uintptr_t)qemu_st_helpers[s_bits], 1);
 
     tcg_out_b(s, 0, (uintptr_t)lb->raddr);
 }
@@ -1705,34 +1698,34 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         break;
 
     case INDEX_op_qemu_ld8u:
-        tcg_out_qemu_ld(s, args, 0);
+        tcg_out_qemu_ld(s, args, MO_UB);
         break;
     case INDEX_op_qemu_ld8s:
-        tcg_out_qemu_ld(s, args, 0 | 4);
+        tcg_out_qemu_ld(s, args, MO_SB);
         break;
     case INDEX_op_qemu_ld16u:
-        tcg_out_qemu_ld(s, args, 1);
+        tcg_out_qemu_ld(s, args, MO_TEUW);
         break;
     case INDEX_op_qemu_ld16s:
-        tcg_out_qemu_ld(s, args, 1 | 4);
+        tcg_out_qemu_ld(s, args, MO_TESW);
         break;
     case INDEX_op_qemu_ld32:
-        tcg_out_qemu_ld(s, args, 2);
+        tcg_out_qemu_ld(s, args, MO_TEUL);
         break;
     case INDEX_op_qemu_ld64:
-        tcg_out_qemu_ld(s, args, 3);
+        tcg_out_qemu_ld(s, args, MO_TEQ);
         break;
     case INDEX_op_qemu_st8:
-        tcg_out_qemu_st(s, args, 0);
+        tcg_out_qemu_st(s, args, MO_UB);
         break;
     case INDEX_op_qemu_st16:
-        tcg_out_qemu_st(s, args, 1);
+        tcg_out_qemu_st(s, args, MO_TEUW);
         break;
     case INDEX_op_qemu_st32:
-        tcg_out_qemu_st(s, args, 2);
+        tcg_out_qemu_st(s, args, MO_TEUL);
         break;
     case INDEX_op_qemu_st64:
-        tcg_out_qemu_st(s, args, 3);
+        tcg_out_qemu_st(s, args, MO_TEQ);
         break;
 
     case INDEX_op_ext8s_i32:
-- 
1.8.1.4
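
Note on the masking in the slow path above: once a memop can carry a sign or bswap bit, it can no longer be compared against 3 or used directly as an index into a four-entry helper table. A minimal sketch of why, assuming the enum values described in the cover letter (old integer codes plus BSWAP=8). The real definitions are in patch 01, which is not part of this message, and helper_index/is_64bit are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Assumed values, per the cover letter: size in bits 0-1, sign = 4,
   and the newly added bswap bit = 8.  Patch 01 has the real enum.  */
typedef enum TCGMemOp {
    MO_8     = 0,
    MO_16    = 1,
    MO_32    = 2,
    MO_64    = 3,
    MO_SIZE  = 3,                   /* mask for the size field */
    MO_SIGN  = 4,                   /* sign-extending load */
    MO_BSWAP = 8,                   /* access needs a byte swap */
    MO_SSIZE = MO_SIZE | MO_SIGN
} TCGMemOp;

/* Hypothetical helper: index for a 4-entry qemu_{ld,st}_helpers table.  */
static unsigned helper_index(TCGMemOp opc)
{
    unsigned idx = opc & MO_SIZE;   /* drop sign and bswap bits */
    assert(idx < 4);
    return idx;
}

/* Hypothetical helper: the masked form of the old "opc == 3" test.  */
static int is_64bit(TCGMemOp opc)
{
    return (opc & MO_SIZE) == MO_64;
}
```

With these values MO_BSWAP | MO_64 is 11, so an unmasked "opc != 3" would take the 32-bit branch and an unmasked table index would run past the end of the array.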

* [Qemu-devel] [PATCH 07/16] tcg-ppc64: Use TCGMemOp within qemu_ldst routines
  2013-09-04 21:04 [Qemu-devel] [PATCH 00/16] Streamlining endian handling in TCG Richard Henderson
                   ` (5 preceding siblings ...)
  2013-09-04 21:04 ` [Qemu-devel] [PATCH 06/16] tcg-ppc: " Richard Henderson
@ 2013-09-04 21:04 ` Richard Henderson
  2013-09-04 21:04 ` [Qemu-devel] [PATCH 08/16] tcg-hppa: " Richard Henderson
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2013-09-04 21:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc64/tcg-target.c | 82 +++++++++++++++++++++++++++-----------------------
 1 file changed, 44 insertions(+), 38 deletions(-)

diff --git a/tcg/ppc64/tcg-target.c b/tcg/ppc64/tcg-target.c
index 73b7034..7da6a2f 100644
--- a/tcg/ppc64/tcg-target.c
+++ b/tcg/ppc64/tcg-target.c
@@ -807,22 +807,28 @@ static void tcg_out_mem_long(TCGContext *s, int opi, int opx, TCGReg rt,
     }
 }
 
-static const uint32_t qemu_ldx_opc[8] = {
-#ifdef TARGET_WORDS_BIGENDIAN
-    LBZX, LHZX, LWZX, LDX,
-    0,    LHAX, LWAX, LDX
-#else
-    LBZX, LHBRX, LWBRX, LDBRX,
-    0,    0,     0,     LDBRX,
-#endif
+static const uint32_t qemu_ldx_opc[16] = {
+    [MO_UB] = LBZX,
+    [MO_UW] = LHZX,
+    [MO_UL] = LWZX,
+    [MO_Q]  = LDX,
+    [MO_SW] = LHAX,
+    [MO_SL] = LWAX,
+    [MO_BSWAP | MO_UB] = LBZX,
+    [MO_BSWAP | MO_UW] = LHBRX,
+    [MO_BSWAP | MO_UL] = LWBRX,
+    [MO_BSWAP | MO_Q]  = LDBRX,
 };
 
-static const uint32_t qemu_stx_opc[4] = {
-#ifdef TARGET_WORDS_BIGENDIAN
-    STBX, STHX, STWX, STDX
-#else
-    STBX, STHBRX, STWBRX, STDBRX,
-#endif
+static const uint32_t qemu_stx_opc[16] = {
+    [MO_UB] = STBX,
+    [MO_UW] = STHX,
+    [MO_UL] = STWX,
+    [MO_Q]  = STDX,
+    [MO_BSWAP | MO_UB] = STBX,
+    [MO_BSWAP | MO_UW] = STHBRX,
+    [MO_BSWAP | MO_UL] = STWBRX,
+    [MO_BSWAP | MO_Q]  = STDBRX,
 };
 
 static const uint32_t qemu_exts_opc[4] = {
@@ -854,7 +860,7 @@ static const void * const qemu_st_helpers[4] = {
    in CR7, loads the addend of the TLB into R3, and returns the register
    containing the guest address (zero-extended into R4).  Clobbers R0 and R2. */
 
-static TCGReg tcg_out_tlb_read(TCGContext *s, int s_bits, TCGReg addr_reg,
+static TCGReg tcg_out_tlb_read(TCGContext *s, TCGMemOp s_bits, TCGReg addr_reg,
                                int mem_index, bool is_read)
 {
     int cmp_off
@@ -927,7 +933,7 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, int s_bits, TCGReg addr_reg,
 /* Record the context of a call to the out of line helper code for the slow
    path for a load or store, so that we can later generate the correct
    helper code.  */
-static void add_qemu_ldst_label(TCGContext *s, bool is_ld, int opc,
+static void add_qemu_ldst_label(TCGContext *s, bool is_ld, TCGMemOp opc,
                                 int data_reg, int addr_reg, int mem_index,
                                 uint8_t *raddr, uint8_t *label_ptr)
 {
@@ -951,8 +957,8 @@ static void add_qemu_ldst_label(TCGContext *s, bool is_ld, int opc,
 
 static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 {
-    int opc = lb->opc;
-    int s_bits = opc & 3;
+    TCGMemOp opc = lb->opc & MO_SSIZE;
+    TCGMemOp s_bits = lb->opc & MO_SIZE;
 
     reloc_pc14(lb->label_ptr[0], (uintptr_t)s->code_ptr);
 
@@ -967,7 +973,7 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 
     tcg_out_call(s, (tcg_target_long)qemu_ld_helpers[s_bits], 1);
 
-    if (opc & 4) {
+    if (opc & MO_SIGN) {
         uint32_t insn = qemu_exts_opc[s_bits];
         tcg_out32(s, insn | RA(lb->datalo_reg) | RS(TCG_REG_R3));
     } else {
@@ -979,7 +985,7 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 
 static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 {
-    int opc = lb->opc;
+    TCGMemOp s_bits = lb->opc & MO_SIZE;
 
     reloc_pc14(lb->label_ptr[0], (uintptr_t)s->code_ptr);
 
@@ -990,11 +996,11 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     tcg_out_mov(s, TCG_TYPE_I64, TCG_REG_R4, lb->addrlo_reg);
 
     tcg_out_rld(s, RLDICL, TCG_REG_R5, lb->datalo_reg,
-                0, 64 - (1 << (3 + opc)));
+                0, 64 - (1 << (3 + s_bits)));
     tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_R6, lb->mem_index);
     tcg_out32(s, MFSPR | RT(TCG_REG_R7) | LR);
 
-    tcg_out_call(s, (tcg_target_long)qemu_st_helpers[opc], 1);
+    tcg_out_call(s, (tcg_target_long)qemu_st_helpers[s_bits], 1);
 
     tcg_out_b(s, 0, (uintptr_t)lb->raddr);
 }
@@ -1015,10 +1021,11 @@ void tcg_out_tb_finalize(TCGContext *s)
 }
 #endif /* SOFTMMU */
 
-static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
+static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, TCGMemOp opc)
 {
     TCGReg addr_reg, data_reg, rbase;
-    uint32_t insn, s_bits;
+    uint32_t insn;
+    TCGMemOp s_bits = opc & MO_SIZE;
 #ifdef CONFIG_SOFTMMU
     int mem_index;
     void *label_ptr;
@@ -1026,7 +1033,6 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
 
     data_reg = *args++;
     addr_reg = *args++;
-    s_bits = opc & 3;
 
 #ifdef CONFIG_SOFTMMU
     mem_index = *args;
@@ -1055,7 +1061,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
     } else if (insn) {
         tcg_out32(s, insn | TAB(data_reg, rbase, addr_reg));
     } else {
-        insn = qemu_ldx_opc[s_bits];
+        insn = qemu_ldx_opc[opc & (MO_SIZE | MO_BSWAP)];
         tcg_out32(s, insn | TAB(data_reg, rbase, addr_reg));
         insn = qemu_exts_opc[s_bits];
         tcg_out32(s, insn | RA(data_reg) | RS(data_reg));
@@ -1067,7 +1073,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
 #endif
 }
 
-static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
+static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, TCGMemOp opc)
 {
     TCGReg addr_reg, rbase, data_reg;
     uint32_t insn;
@@ -1847,38 +1853,38 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         break;
 
     case INDEX_op_qemu_ld8u:
-        tcg_out_qemu_ld(s, args, 0);
+        tcg_out_qemu_ld(s, args, MO_UB);
         break;
     case INDEX_op_qemu_ld8s:
-        tcg_out_qemu_ld(s, args, 0 | 4);
+        tcg_out_qemu_ld(s, args, MO_SB);
         break;
     case INDEX_op_qemu_ld16u:
-        tcg_out_qemu_ld(s, args, 1);
+        tcg_out_qemu_ld(s, args, MO_TEUW);
         break;
     case INDEX_op_qemu_ld16s:
-        tcg_out_qemu_ld(s, args, 1 | 4);
+        tcg_out_qemu_ld(s, args, MO_TESW);
         break;
     case INDEX_op_qemu_ld32:
     case INDEX_op_qemu_ld32u:
-        tcg_out_qemu_ld(s, args, 2);
+        tcg_out_qemu_ld(s, args, MO_TEUL);
         break;
     case INDEX_op_qemu_ld32s:
-        tcg_out_qemu_ld(s, args, 2 | 4);
+        tcg_out_qemu_ld(s, args, MO_TESL);
         break;
     case INDEX_op_qemu_ld64:
-        tcg_out_qemu_ld(s, args, 3);
+        tcg_out_qemu_ld(s, args, MO_TEQ);
         break;
     case INDEX_op_qemu_st8:
-        tcg_out_qemu_st(s, args, 0);
+        tcg_out_qemu_st(s, args, MO_UB);
         break;
     case INDEX_op_qemu_st16:
-        tcg_out_qemu_st(s, args, 1);
+        tcg_out_qemu_st(s, args, MO_TEUW);
         break;
     case INDEX_op_qemu_st32:
-        tcg_out_qemu_st(s, args, 2);
+        tcg_out_qemu_st(s, args, MO_TEUL);
         break;
     case INDEX_op_qemu_st64:
-        tcg_out_qemu_st(s, args, 3);
+        tcg_out_qemu_st(s, args, MO_TEQ);
         break;
 
     case INDEX_op_ext8s_i32:
-- 
1.8.1.4
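
Note on the tables above: the designated initializers leave unlisted slots zero, which the load path appears to use as "no single instruction for this memop" before falling back to an unsigned load plus a sign extension. A minimal sketch of that lookup follows. Mnemonic strings stand in for the instruction encodings, the enum values are assumed from the cover letter, and pick_load is a hypothetical name, not a function from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, per the cover letter (patch 01 has the real enum).  */
enum {
    MO_8 = 0, MO_16 = 1, MO_32 = 2, MO_64 = 3,
    MO_SIZE = 3, MO_SIGN = 4, MO_BSWAP = 8,
    MO_UB = MO_8, MO_UW = MO_16, MO_UL = MO_32, MO_Q = MO_64,
    MO_SB = MO_SIGN | MO_8,
    MO_SW = MO_SIGN | MO_16,
    MO_SL = MO_SIGN | MO_32
};

/* Same shape as qemu_ldx_opc[16]; slots not listed stay NULL.  */
static const char *const ldx_name[16] = {
    [MO_UB] = "lbzx",
    [MO_UW] = "lhzx",
    [MO_UL] = "lwzx",
    [MO_Q]  = "ldx",
    [MO_SW] = "lhax",
    [MO_SL] = "lwax",
    [MO_BSWAP | MO_UB] = "lbzx",
    [MO_BSWAP | MO_UW] = "lhbrx",
    [MO_BSWAP | MO_UL] = "lwbrx",
    [MO_BSWAP | MO_Q]  = "ldbrx",
};

/* Hypothetical lookup: return the load to emit and report whether a
   separate sign-extension instruction must follow it.  */
static const char *pick_load(int opc, int *needs_ext)
{
    const char *insn;

    assert(opc >= 0 && opc < 16);
    insn = ldx_name[opc];
    *needs_ext = 0;
    if (insn == NULL) {
        /* No direct form: use the unsigned load and extend afterwards.  */
        insn = ldx_name[opc & (MO_SIZE | MO_BSWAP)];
        *needs_ext = 1;
    }
    return insn;
}
```

In this sketch a byte-reversed signed halfword has no direct slot, so it resolves to lhbrx followed by an extension, while the native signed halfword gets lhax on its own.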

* [Qemu-devel] [PATCH 08/16] tcg-hppa: Use TCGMemOp within qemu_ldst routines
  2013-09-04 21:04 [Qemu-devel] [PATCH 00/16] Streamlining endian handling in TCG Richard Henderson
                   ` (6 preceding siblings ...)
  2013-09-04 21:04 ` [Qemu-devel] [PATCH 07/16] tcg-ppc64: " Richard Henderson
@ 2013-09-04 21:04 ` Richard Henderson
  2013-09-04 21:04 ` [Qemu-devel] [PATCH 09/16] tcg-mips: " Richard Henderson
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2013-09-04 21:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Untested.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/hppa/tcg-target.c | 110 ++++++++++++++++++++++++--------------------------
 1 file changed, 53 insertions(+), 57 deletions(-)

diff --git a/tcg/hppa/tcg-target.c b/tcg/hppa/tcg-target.c
index 236b39c..b13c2d8 100644
--- a/tcg/hppa/tcg-target.c
+++ b/tcg/hppa/tcg-target.c
@@ -935,7 +935,8 @@ static const void * const qemu_st_helpers[4] = {
    return value is 0, R1 is not used.  */
 
 static int tcg_out_tlb_read(TCGContext *s, int r0, int r1, int addrlo,
-                            int addrhi, int s_bits, int lab_miss, int offset)
+                            int addrhi, TCGMemOp s_bits, int lab_miss,
+                            int offset)
 {
     int ret;
 
@@ -1020,30 +1021,27 @@ static int tcg_out_arg_reg64(TCGContext *s, int argno, TCGArg vl, TCGArg vh)
 }
 #endif
 
-static void tcg_out_qemu_ld_direct(TCGContext *s, int datalo_reg, int datahi_reg,
-                                   int addr_reg, int addend_reg, int opc)
+static void tcg_out_qemu_ld_direct(TCGContext *s, int datalo_reg,
+                                   int datahi_reg, int addr_reg,
+                                   int addend_reg, TCGMemOp opc)
 {
-#ifdef TARGET_WORDS_BIGENDIAN
-    const int bswap = 0;
-#else
-    const int bswap = 1;
-#endif
+    const TCGMemOp bswap = opc & MO_BSWAP;
 
-    switch (opc) {
-    case 0:
+    switch (opc & MO_SSIZE) {
+    case MO_UB:
         tcg_out_ldst_index(s, datalo_reg, addr_reg, addend_reg, INSN_LDBX);
         break;
-    case 0 | 4:
+    case MO_SB:
         tcg_out_ldst_index(s, datalo_reg, addr_reg, addend_reg, INSN_LDBX);
         tcg_out_ext8s(s, datalo_reg, datalo_reg);
         break;
-    case 1:
+    case MO_UW:
         tcg_out_ldst_index(s, datalo_reg, addr_reg, addend_reg, INSN_LDHX);
         if (bswap) {
             tcg_out_bswap16(s, datalo_reg, datalo_reg, 0);
         }
         break;
-    case 1 | 4:
+    case MO_SW:
         tcg_out_ldst_index(s, datalo_reg, addr_reg, addend_reg, INSN_LDHX);
         if (bswap) {
             tcg_out_bswap16(s, datalo_reg, datalo_reg, 1);
@@ -1051,13 +1049,13 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, int datalo_reg, int datahi_reg
             tcg_out_ext16s(s, datalo_reg, datalo_reg);
         }
         break;
-    case 2:
+    case MO_UL:
         tcg_out_ldst_index(s, datalo_reg, addr_reg, addend_reg, INSN_LDWX);
         if (bswap) {
             tcg_out_bswap32(s, datalo_reg, datalo_reg, TCG_REG_R20);
         }
         break;
-    case 3:
+    case MO_Q:
         if (bswap) {
             int t = datahi_reg;
             datahi_reg = datalo_reg;
@@ -1087,11 +1085,12 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, int datalo_reg, int datahi_reg
     }
 }
 
-static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
+static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, TCGMemOp opc)
 {
+    TCGMemOp s_bits = opc & MO_SIZE;
     int datalo_reg = *args++;
     /* Note that datahi_reg is only used for 64-bit loads.  */
-    int datahi_reg = (opc == 3 ? *args++ : TCG_REG_R0);
+    int datahi_reg = (s_bits == MO_64 ? *args++ : TCG_REG_R0);
     int addrlo_reg = *args++;
 
 #if defined(CONFIG_SOFTMMU)
@@ -1105,7 +1104,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
 
     offset = offsetof(CPUArchState, tlb_table[mem_index][0].addr_read);
     offset = tcg_out_tlb_read(s, TCG_REG_R26, TCG_REG_R25, addrlo_reg,
-                              addrhi_reg, opc & 3, lab1, offset);
+                              addrhi_reg, s_bits, lab1, offset);
 
     /* TLB Hit.  */
     tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_R20,
@@ -1128,26 +1127,26 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
     }
     argno = tcg_out_arg_reg32(s, argno, mem_index, true);
 
-    tcg_out_call(s, qemu_ld_helpers[opc & 3]);
+    tcg_out_call(s, qemu_ld_helpers[s_bits]);
 
-    switch (opc) {
-    case 0:
+    switch (opc & MO_SSIZE) {
+    case MO_UB:
         tcg_out_andi(s, datalo_reg, TCG_REG_RET0, 0xff);
         break;
-    case 0 | 4:
+    case MO_SB:
         tcg_out_ext8s(s, datalo_reg, TCG_REG_RET0);
         break;
-    case 1:
+    case MO_UW:
         tcg_out_andi(s, datalo_reg, TCG_REG_RET0, 0xffff);
         break;
-    case 1 | 4:
+    case MO_SW:
         tcg_out_ext16s(s, datalo_reg, TCG_REG_RET0);
         break;
-    case 2:
-    case 2 | 4:
+    case MO_UL:
+    case MO_SL:
         tcg_out_mov(s, TCG_TYPE_I32, datalo_reg, TCG_REG_RET0);
         break;
-    case 3:
+    case MO_Q:
         tcg_out_mov(s, TCG_TYPE_I32, datahi_reg, TCG_REG_RET0);
         tcg_out_mov(s, TCG_TYPE_I32, datalo_reg, TCG_REG_RET1);
         break;
@@ -1164,33 +1163,29 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
 }
 
 static void tcg_out_qemu_st_direct(TCGContext *s, int datalo_reg,
-                                   int datahi_reg, int addr_reg, int opc)
+                                   int datahi_reg, int addr_reg, TCGMemOp opc)
 {
-#ifdef TARGET_WORDS_BIGENDIAN
-    const int bswap = 0;
-#else
-    const int bswap = 1;
-#endif
+    const TCGMemOp bswap = opc & MO_BSWAP;
 
-    switch (opc) {
-    case 0:
+    switch (opc & MO_SIZE) {
+    case MO_8:
         tcg_out_ldst(s, datalo_reg, addr_reg, 0, INSN_STB);
         break;
-    case 1:
+    case MO_16:
         if (bswap) {
             tcg_out_bswap16(s, TCG_REG_R20, datalo_reg, 0);
             datalo_reg = TCG_REG_R20;
         }
         tcg_out_ldst(s, datalo_reg, addr_reg, 0, INSN_STH);
         break;
-    case 2:
+    case MO_32:
         if (bswap) {
             tcg_out_bswap32(s, TCG_REG_R20, datalo_reg, TCG_REG_R20);
             datalo_reg = TCG_REG_R20;
         }
         tcg_out_ldst(s, datalo_reg, addr_reg, 0, INSN_STW);
         break;
-    case 3:
+    case MO_64:
         if (bswap) {
             tcg_out_bswap32(s, TCG_REG_R20, datalo_reg, TCG_REG_R20);
             tcg_out_bswap32(s, TCG_REG_R23, datahi_reg, TCG_REG_R23);
@@ -1206,11 +1201,12 @@ static void tcg_out_qemu_st_direct(TCGContext *s, int datalo_reg,
 
 }
 
-static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
+static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, TCGMemOp opc)
 {
+    TCGMemOp s_bits = opc & MO_SIZE;
     int datalo_reg = *args++;
     /* Note that datahi_reg is only used for 64-bit loads.  */
-    int datahi_reg = (opc == 3 ? *args++ : TCG_REG_R0);
+    int datahi_reg = (s_bits == MO_64 ? *args++ : TCG_REG_R0);
     int addrlo_reg = *args++;
 
 #if defined(CONFIG_SOFTMMU)
@@ -1224,7 +1220,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
 
     offset = offsetof(CPUArchState, tlb_table[mem_index][0].addr_write);
     offset = tcg_out_tlb_read(s, TCG_REG_R26, TCG_REG_R25, addrlo_reg,
-                              addrhi_reg, opc, lab1, offset);
+                              addrhi_reg, s_bits, lab1, offset);
 
     /* TLB Hit.  */
     tcg_out_ld(s, TCG_TYPE_PTR, TCG_REG_R20,
@@ -1250,19 +1246,19 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
     }
 
     next = (argno < 4 ? tcg_target_call_iarg_regs[argno] : TCG_REG_R20);
-    switch(opc) {
-    case 0:
+    switch (s_bits) {
+    case MO_8:
         tcg_out_andi(s, next, datalo_reg, 0xff);
         argno = tcg_out_arg_reg32(s, argno, next, false);
         break;
-    case 1:
+    case MO_16:
         tcg_out_andi(s, next, datalo_reg, 0xffff);
         argno = tcg_out_arg_reg32(s, argno, next, false);
         break;
-    case 2:
+    case MO_32:
         argno = tcg_out_arg_reg32(s, argno, datalo_reg, false);
         break;
-    case 3:
+    case MO_64:
         argno = tcg_out_arg_reg64(s, argno, datalo_reg, datahi_reg);
         break;
     default:
@@ -1270,7 +1266,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
     }
     argno = tcg_out_arg_reg32(s, argno, mem_index, true);
 
-    tcg_out_call(s, qemu_st_helpers[opc]);
+    tcg_out_call(s, qemu_st_helpers[s_bits]);
 
     /* label2: */
     tcg_out_label(s, lab2, s->code_ptr);
@@ -1534,35 +1530,35 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         break;
 
     case INDEX_op_qemu_ld8u:
-        tcg_out_qemu_ld(s, args, 0);
+        tcg_out_qemu_ld(s, args, MO_UB);
         break;
     case INDEX_op_qemu_ld8s:
-        tcg_out_qemu_ld(s, args, 0 | 4);
+        tcg_out_qemu_ld(s, args, MO_SB);
         break;
     case INDEX_op_qemu_ld16u:
-        tcg_out_qemu_ld(s, args, 1);
+        tcg_out_qemu_ld(s, args, MO_TEUW);
         break;
     case INDEX_op_qemu_ld16s:
-        tcg_out_qemu_ld(s, args, 1 | 4);
+        tcg_out_qemu_ld(s, args, MO_TESW);
         break;
     case INDEX_op_qemu_ld32:
-        tcg_out_qemu_ld(s, args, 2);
+        tcg_out_qemu_ld(s, args, MO_TEUL);
         break;
     case INDEX_op_qemu_ld64:
-        tcg_out_qemu_ld(s, args, 3);
+        tcg_out_qemu_ld(s, args, MO_TEQ);
         break;
 
     case INDEX_op_qemu_st8:
-        tcg_out_qemu_st(s, args, 0);
+        tcg_out_qemu_st(s, args, MO_UB);
         break;
     case INDEX_op_qemu_st16:
-        tcg_out_qemu_st(s, args, 1);
+        tcg_out_qemu_st(s, args, MO_TEUW);
         break;
     case INDEX_op_qemu_st32:
-        tcg_out_qemu_st(s, args, 2);
+        tcg_out_qemu_st(s, args, MO_TEUL);
         break;
     case INDEX_op_qemu_st64:
-        tcg_out_qemu_st(s, args, 3);
+        tcg_out_qemu_st(s, args, MO_TEQ);
         break;
 
     default:
-- 
1.8.1.4
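
Note on the bswap change above: the swap decision moves from a compile-time #ifdef on TARGET_WORDS_BIGENDIAN to a bit tested in the memop itself. A minimal host-side sketch of the same decode order (size from MO_SIZE, swap from MO_BSWAP, extension from MO_SIGN) is below. The enum values are assumed from the cover letter, load_memop and the bswap helpers are hypothetical, and the 64-bit case is left out to keep it short.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed values, per the cover letter (patch 01 has the real enum).  */
enum {
    MO_8 = 0, MO_16 = 1, MO_32 = 2, MO_64 = 3,
    MO_SIZE = 3, MO_SIGN = 4, MO_BSWAP = 8
};

static uint32_t bswap16(uint32_t v)
{
    return ((v & 0xff) << 8) | ((v >> 8) & 0xff);
}

static uint32_t bswap32(uint32_t v)
{
    return (v << 24) | ((v & 0xff00) << 8) | ((v >> 8) & 0xff00) | (v >> 24);
}

/* Hypothetical model of a load: read in host byte order, then apply
   the swap and sign extension that the memop asks for.  */
static uint32_t load_memop(const void *p, int opc)
{
    const int bswap = opc & MO_BSWAP;   /* runtime flag, not an #ifdef */
    uint32_t v = 0;
    uint16_t h;
    uint8_t b;

    switch (opc & MO_SIZE) {
    case MO_8:
        memcpy(&b, p, 1);
        v = b;                          /* one byte: nothing to swap */
        if (opc & MO_SIGN) {
            v = (v ^ 0x80u) - 0x80u;
        }
        break;
    case MO_16:
        memcpy(&h, p, 2);
        v = bswap ? bswap16(h) : h;
        if (opc & MO_SIGN) {
            v = (v ^ 0x8000u) - 0x8000u;
        }
        break;
    case MO_32:
        memcpy(&v, p, 4);
        if (bswap) {
            v = bswap32(v);
        }
        break;
    default:
        assert(!"64-bit loads are omitted from this sketch");
        break;
    }
    return v;
}
```

Because the flag travels with each operation, one build can emit both swapped and native accesses, which the old per-target #ifdef could not express.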

* [Qemu-devel] [PATCH 09/16] tcg-mips: Use TCGMemOp within qemu_ldst routines
  2013-09-04 21:04 [Qemu-devel] [PATCH 00/16] Streamlining endian handling in TCG Richard Henderson
                   ` (7 preceding siblings ...)
  2013-09-04 21:04 ` [Qemu-devel] [PATCH 08/16] tcg-hppa: " Richard Henderson
@ 2013-09-04 21:04 ` Richard Henderson
  2013-09-04 21:04 ` [Qemu-devel] [PATCH 10/16] tcg-sparc: " Richard Henderson
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2013-09-04 21:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Untested.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/mips/tcg-target.c | 116 ++++++++++++++++++++++----------------------------
 1 file changed, 52 insertions(+), 64 deletions(-)

diff --git a/tcg/mips/tcg-target.c b/tcg/mips/tcg-target.c
index 5f0a65b..3ef5487 100644
--- a/tcg/mips/tcg-target.c
+++ b/tcg/mips/tcg-target.c
@@ -24,12 +24,6 @@
  * THE SOFTWARE.
  */
 
-#if defined(TCG_TARGET_WORDS_BIGENDIAN) == defined(TARGET_WORDS_BIGENDIAN)
-# define TCG_NEED_BSWAP 0
-#else
-# define TCG_NEED_BSWAP 1
-#endif
-
 #ifndef NDEBUG
 static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
     "zero",
@@ -938,10 +932,11 @@ static const void * const qemu_st_helpers[4] = {
 };
 #endif
 
-static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args,
-                            int opc)
+static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, TCGMemOp opc)
 {
     TCGReg addr_regl, data_regl, data_regh, data_reg1, data_reg2;
+    TCGMemOp s_bits = opc & MO_SIZE;
+    TCGMemOp bswap = opc & MO_BSWAP;
 #if defined(CONFIG_SOFTMMU)
     void *label1_ptr, *label2_ptr;
     int arg_num;
@@ -954,10 +949,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args,
 # endif
 #endif
     data_regl = *args++;
-    if (opc == 3)
-        data_regh = *args++;
-    else
-        data_regh = 0;
+    data_regh = (s_bits == MO_64 ? *args++ : 0);
     addr_regl = *args++;
 #if defined(CONFIG_SOFTMMU)
 # if TARGET_LONG_BITS == 64
@@ -973,10 +965,9 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args,
     addr_meml = 0;
 # endif
     mem_index = *args;
-    s_bits = opc & 3;
 #endif
 
-    if (opc == 3) {
+    if (s_bits == MO_64) {
 #if defined(TCG_TARGET_WORDS_BIGENDIAN)
         data_reg1 = data_regh;
         data_reg2 = data_regl;
@@ -988,6 +979,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args,
         data_reg1 = data_regl;
         data_reg2 = 0;
     }
+
 #if defined(CONFIG_SOFTMMU)
     tcg_out_opc_sa(s, OPC_SRL, TCG_REG_A0, addr_regl, TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
     tcg_out_opc_imm(s, OPC_ANDI, TCG_REG_A0, TCG_REG_A0, (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS);
@@ -1029,23 +1021,23 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args,
     tcg_out_opc_reg(s, OPC_JALR, TCG_REG_RA, TCG_REG_T9, 0);
     tcg_out_nop(s);
 
-    switch(opc) {
-    case 0:
+    switch (opc & MO_SSIZE) {
+    case MO_UB:
         tcg_out_opc_imm(s, OPC_ANDI, data_reg1, TCG_REG_V0, 0xff);
         break;
-    case 0 | 4:
+    case MO_SB:
         tcg_out_ext8s(s, data_reg1, TCG_REG_V0);
         break;
-    case 1:
+    case MO_UW:
         tcg_out_opc_imm(s, OPC_ANDI, data_reg1, TCG_REG_V0, 0xffff);
         break;
-    case 1 | 4:
+    case MO_SW:
         tcg_out_ext16s(s, data_reg1, TCG_REG_V0);
         break;
-    case 2:
+    case MO_UL:
         tcg_out_mov(s, TCG_TYPE_I32, data_reg1, TCG_REG_V0);
         break;
-    case 3:
+    case MO_Q:
         tcg_out_mov(s, TCG_TYPE_I32, data_reg2, TCG_REG_V1);
         tcg_out_mov(s, TCG_TYPE_I32, data_reg1, TCG_REG_V0);
         break;
@@ -1072,39 +1064,39 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args,
     }
 #endif
 
-    switch(opc) {
-    case 0:
+    switch (opc & MO_SSIZE) {
+    case MO_UB:
         tcg_out_opc_imm(s, OPC_LBU, data_reg1, TCG_REG_V0, 0);
         break;
-    case 0 | 4:
+    case MO_SB:
         tcg_out_opc_imm(s, OPC_LB, data_reg1, TCG_REG_V0, 0);
         break;
-    case 1:
-        if (TCG_NEED_BSWAP) {
+    case MO_UW:
+        if (bswap) {
             tcg_out_opc_imm(s, OPC_LHU, TCG_REG_T0, TCG_REG_V0, 0);
             tcg_out_bswap16(s, data_reg1, TCG_REG_T0);
         } else {
             tcg_out_opc_imm(s, OPC_LHU, data_reg1, TCG_REG_V0, 0);
         }
         break;
-    case 1 | 4:
-        if (TCG_NEED_BSWAP) {
+    case MO_SW:
+        if (bswap) {
             tcg_out_opc_imm(s, OPC_LHU, TCG_REG_T0, TCG_REG_V0, 0);
             tcg_out_bswap16s(s, data_reg1, TCG_REG_T0);
         } else {
             tcg_out_opc_imm(s, OPC_LH, data_reg1, TCG_REG_V0, 0);
         }
         break;
-    case 2:
-        if (TCG_NEED_BSWAP) {
+    case MO_UL:
+        if (bswap) {
             tcg_out_opc_imm(s, OPC_LW, TCG_REG_T0, TCG_REG_V0, 0);
             tcg_out_bswap32(s, data_reg1, TCG_REG_T0);
         } else {
             tcg_out_opc_imm(s, OPC_LW, data_reg1, TCG_REG_V0, 0);
         }
         break;
-    case 3:
-        if (TCG_NEED_BSWAP) {
+    case MO_Q:
+        if (bswap) {
             tcg_out_opc_imm(s, OPC_LW, TCG_REG_T0, TCG_REG_V0, 4);
             tcg_out_bswap32(s, data_reg1, TCG_REG_T0);
             tcg_out_opc_imm(s, OPC_LW, TCG_REG_T0, TCG_REG_V0, 0);
@@ -1123,10 +1115,11 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args,
 #endif
 }
 
-static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
-                            int opc)
+static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, TCGMemOp opc)
 {
     TCGReg addr_regl, data_regl, data_regh, data_reg1, data_reg2;
+    TCGMemOp s_bits = opc & MO_SIZE;
+    TCGMemOp bswap = opc & MO_BSWAP;
 #if defined(CONFIG_SOFTMMU)
     uint8_t *label1_ptr, *label2_ptr;
     int arg_num;
@@ -1141,11 +1134,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
 # endif
 #endif
     data_regl = *args++;
-    if (opc == 3) {
-        data_regh = *args++;
-    } else {
-        data_regh = 0;
-    }
+    data_regh = (s_bits == MO_64 ? *args++ : 0);
     addr_regl = *args++;
 #if defined(CONFIG_SOFTMMU)
 # if TARGET_LONG_BITS == 64
@@ -1161,10 +1150,9 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
     addr_meml = 0;
 # endif
     mem_index = *args;
-    s_bits = opc;
 #endif
 
-    if (opc == 3) {
+    if (s_bits == MO_64) {
 #if defined(TCG_TARGET_WORDS_BIGENDIAN)
         data_reg1 = data_regh;
         data_reg2 = data_regl;
@@ -1213,17 +1201,17 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
 # else
     tcg_out_call_iarg_reg32(s, &arg_num, addr_regl);
 # endif
-    switch(opc) {
-    case 0:
+    switch (s_bits) {
+    case MO_8:
         tcg_out_call_iarg_reg8(s, &arg_num, data_regl);
         break;
-    case 1:
+    case MO_16:
         tcg_out_call_iarg_reg16(s, &arg_num, data_regl);
         break;
-    case 2:
+    case MO_32:
         tcg_out_call_iarg_reg32(s, &arg_num, data_regl);
         break;
-    case 3:
+    case MO_64:
         tcg_out_call_iarg_reg64(s, &arg_num, data_regl, data_regh);
         break;
     default:
@@ -1254,12 +1242,12 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
 
 #endif
 
-    switch(opc) {
-    case 0:
+    switch (s_bits) {
+    case MO_8:
         tcg_out_opc_imm(s, OPC_SB, data_reg1, TCG_REG_A0, 0);
         break;
-    case 1:
-        if (TCG_NEED_BSWAP) {
+    case MO_16:
+        if (bswap) {
             tcg_out_opc_imm(s, OPC_ANDI, TCG_REG_T0, data_reg1, 0xffff);
             tcg_out_bswap16(s, TCG_REG_T0, TCG_REG_T0);
             tcg_out_opc_imm(s, OPC_SH, TCG_REG_T0, TCG_REG_A0, 0);
@@ -1267,16 +1255,16 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args,
             tcg_out_opc_imm(s, OPC_SH, data_reg1, TCG_REG_A0, 0);
         }
         break;
-    case 2:
-        if (TCG_NEED_BSWAP) {
+    case MO_32:
+        if (bswap) {
             tcg_out_bswap32(s, TCG_REG_T0, data_reg1);
             tcg_out_opc_imm(s, OPC_SW, TCG_REG_T0, TCG_REG_A0, 0);
         } else {
             tcg_out_opc_imm(s, OPC_SW, data_reg1, TCG_REG_A0, 0);
         }
         break;
-    case 3:
-        if (TCG_NEED_BSWAP) {
+    case MO_64:
+        if (bswap) {
             tcg_out_bswap32(s, TCG_REG_T0, data_reg2);
             tcg_out_opc_imm(s, OPC_SW, TCG_REG_T0, TCG_REG_A0, 0);
             tcg_out_bswap32(s, TCG_REG_T0, data_reg1);
@@ -1550,34 +1538,34 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         break;
 
     case INDEX_op_qemu_ld8u:
-        tcg_out_qemu_ld(s, args, 0);
+        tcg_out_qemu_ld(s, args, MO_UB);
         break;
     case INDEX_op_qemu_ld8s:
-        tcg_out_qemu_ld(s, args, 0 | 4);
+        tcg_out_qemu_ld(s, args, MO_SB);
         break;
     case INDEX_op_qemu_ld16u:
-        tcg_out_qemu_ld(s, args, 1);
+        tcg_out_qemu_ld(s, args, MO_TEUW);
         break;
     case INDEX_op_qemu_ld16s:
-        tcg_out_qemu_ld(s, args, 1 | 4);
+        tcg_out_qemu_ld(s, args, MO_TESW);
         break;
     case INDEX_op_qemu_ld32:
-        tcg_out_qemu_ld(s, args, 2);
+        tcg_out_qemu_ld(s, args, MO_TEUL);
         break;
     case INDEX_op_qemu_ld64:
-        tcg_out_qemu_ld(s, args, 3);
+        tcg_out_qemu_ld(s, args, MO_TEQ);
         break;
     case INDEX_op_qemu_st8:
-        tcg_out_qemu_st(s, args, 0);
+        tcg_out_qemu_st(s, args, MO_UB);
         break;
     case INDEX_op_qemu_st16:
-        tcg_out_qemu_st(s, args, 1);
+        tcg_out_qemu_st(s, args, MO_TEUW);
         break;
     case INDEX_op_qemu_st32:
-        tcg_out_qemu_st(s, args, 2);
+        tcg_out_qemu_st(s, args, MO_TEUL);
         break;
     case INDEX_op_qemu_st64:
-        tcg_out_qemu_st(s, args, 3);
+        tcg_out_qemu_st(s, args, MO_TEQ);
         break;
 
     default:
-- 
1.8.1.4
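
Note on the removed TCG_NEED_BSWAP macro: the dispatch in tcg_out_op now passes MO_TEUW, MO_TEUL and MO_TEQ, so the host-versus-guest comparison presumably lives in those constants. They are defined in patch 01, which is not shown here, so the sketch below is only an assumption about how they resolve. need_bswap mirrors the condition of the removed macro, and memop_te is a hypothetical function standing in for the compile-time constants.

```c
#include <assert.h>

/* Assumed values, per the cover letter (patch 01 has the real enum).  */
enum {
    MO_8 = 0, MO_16 = 1, MO_32 = 2, MO_64 = 3,
    MO_SIZE = 3, MO_BSWAP = 8
};

/* Same condition as the removed macro: a swap is needed exactly when
   host and guest byte order disagree.  */
static int need_bswap(int host_big_endian, int guest_big_endian)
{
    return !host_big_endian != !guest_big_endian;
}

/* Hypothetical stand-in for the MO_TE* constants: a target-endian
   memop for the given size on the given host/guest pairing.  */
static int memop_te(int size, int host_big_endian, int guest_big_endian)
{
    assert((size & ~MO_SIZE) == 0);
    if (size == MO_8) {
        /* Single bytes have no byte order; the dispatch passes MO_UB.  */
        return size;
    }
    return size | (need_bswap(host_big_endian, guest_big_endian)
                   ? MO_BSWAP : 0);
}
```

Under that assumption the backend body only ever tests opc & MO_BSWAP, and the behaviour for a fixed guest is the same as with the old macro.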

* [Qemu-devel] [PATCH 10/16] tcg-sparc: Use TCGMemOp within qemu_ldst routines
  2013-09-04 21:04 [Qemu-devel] [PATCH 00/16] Streamlining endian handling in TCG Richard Henderson
                   ` (8 preceding siblings ...)
  2013-09-04 21:04 ` [Qemu-devel] [PATCH 09/16] tcg-mips: " Richard Henderson
@ 2013-09-04 21:04 ` Richard Henderson
  2013-09-04 21:05 ` [Qemu-devel] [PATCH 11/16] tcg: Add qemu_ld_st_i32/64 Richard Henderson
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2013-09-04 21:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Untested.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/sparc/tcg-target.c | 116 +++++++++++++++++++++++++++----------------------
 1 file changed, 65 insertions(+), 51 deletions(-)

diff --git a/tcg/sparc/tcg-target.c b/tcg/sparc/tcg-target.c
index 9574954..747510a 100644
--- a/tcg/sparc/tcg-target.c
+++ b/tcg/sparc/tcg-target.c
@@ -864,7 +864,7 @@ static const void * const qemu_st_helpers[4] = {
    is in the returned register, maybe %o0.  The TLB addend is in %o1.  */
 
 static int tcg_out_tlb_load(TCGContext *s, int addrlo_idx, int mem_index,
-                            int s_bits, const TCGArg *args, int which)
+                            TCGMemOp s_bits, const TCGArg *args, int which)
 {
     const int addrlo = args[addrlo_idx];
     const int r0 = TCG_REG_O0;
@@ -916,32 +916,46 @@ static int tcg_out_tlb_load(TCGContext *s, int addrlo_idx, int mem_index,
 }
 #endif /* CONFIG_SOFTMMU */
 
-static const int qemu_ld_opc[8] = {
-#ifdef TARGET_WORDS_BIGENDIAN
-    LDUB, LDUH, LDUW, LDX, LDSB, LDSH, LDSW, LDX
-#else
-    LDUB, LDUH_LE, LDUW_LE, LDX_LE, LDSB, LDSH_LE, LDSW_LE, LDX_LE
-#endif
+static const int qemu_ld_opc[16] = {
+    [MO_UB]   = LDUB,
+    [MO_SB]   = LDSB,
+
+    [MO_BEUW] = LDUH,
+    [MO_BESW] = LDSH,
+    [MO_BEUL] = LDUW,
+    [MO_BESL] = LDSW,
+    [MO_BEQ]  = LDX,
+
+    [MO_LEUW] = LDUH_LE,
+    [MO_LESW] = LDSH_LE,
+    [MO_LEUL] = LDUW_LE,
+    [MO_LESL] = LDSW_LE,
+    [MO_LEQ]  = LDX_LE,
 };
 
-static const int qemu_st_opc[4] = {
-#ifdef TARGET_WORDS_BIGENDIAN
-    STB, STH, STW, STX
-#else
-    STB, STH_LE, STW_LE, STX_LE
-#endif
+static const int qemu_st_opc[16] = {
+    [MO_UB]   = STB,
+
+    [MO_BEUW] = STH,
+    [MO_BEUL] = STW,
+    [MO_BEQ]  = STX,
+
+    [MO_LEUW] = STH_LE,
+    [MO_LEUL] = STW_LE,
+    [MO_LEQ]  = STX_LE,
 };
 
-static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int sizeop)
+static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, TCGMemOp memop)
 {
     int addrlo_idx = 1, datalo, datahi, addr_reg;
+    TCGMemOp s_bits = memop & MO_SIZE;
 #if defined(CONFIG_SOFTMMU)
-    int memi_idx, memi, s_bits, n;
+    int memi_idx, memi, n;
     uint32_t *label_ptr[2];
 #endif
 
     datahi = datalo = args[0];
-    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
+    if (TCG_TARGET_REG_BITS == 32 && s_bits == MO_64) {
         datahi = args[1];
         addrlo_idx = 2;
     }
@@ -949,12 +963,11 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int sizeop)
 #if defined(CONFIG_SOFTMMU)
     memi_idx = addrlo_idx + 1 + (TARGET_LONG_BITS > TCG_TARGET_REG_BITS);
     memi = args[memi_idx];
-    s_bits = sizeop & 3;
 
     addr_reg = tcg_out_tlb_load(s, addrlo_idx, memi, s_bits, args,
                                 offsetof(CPUTLBEntry, addr_read));
 
-    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
+    if (TCG_TARGET_REG_BITS == 32 && s_bits == MO_64) {
         int reg64;
 
         /* bne,pn %[xi]cc, label0 */
@@ -965,7 +978,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int sizeop)
         /* TLB Hit.  */
         /* Load all 64-bits into an O/G register.  */
         reg64 = (datalo < 16 ? datalo : TCG_REG_O0);
-        tcg_out_ldst_rr(s, reg64, addr_reg, TCG_REG_O1, qemu_ld_opc[sizeop]);
+        tcg_out_ldst_rr(s, reg64, addr_reg, TCG_REG_O1, qemu_ld_opc[memop]);
 
         /* Move the two 32-bit pieces into the destination registers.  */
         tcg_out_arithi(s, datahi, reg64, 32, SHIFT_SRLX);
@@ -987,7 +1000,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int sizeop)
         tcg_out_bpcc0(s, COND_E, BPCC_A | BPCC_PT
                       | (TARGET_LONG_BITS == 64 ? BPCC_XCC : BPCC_ICC), 0);
         /* delay slot */
-        tcg_out_ldst_rr(s, datalo, addr_reg, TCG_REG_O1, qemu_ld_opc[sizeop]);
+        tcg_out_ldst_rr(s, datalo, addr_reg, TCG_REG_O1, qemu_ld_opc[memop]);
     }
 
     /* TLB Miss.  */
@@ -1014,29 +1027,29 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int sizeop)
 
     n = tcg_target_call_oarg_regs[0];
     /* datalo = sign_extend(arg0) */
-    switch (sizeop) {
-    case 0 | 4:
+    switch (memop & MO_SSIZE) {
+    case MO_SB:
         /* Recall that SRA sign extends from bit 31 through bit 63.  */
         tcg_out_arithi(s, datalo, n, 24, SHIFT_SLL);
         tcg_out_arithi(s, datalo, datalo, 24, SHIFT_SRA);
         break;
-    case 1 | 4:
+    case MO_SW:
         tcg_out_arithi(s, datalo, n, 16, SHIFT_SLL);
         tcg_out_arithi(s, datalo, datalo, 16, SHIFT_SRA);
         break;
-    case 2 | 4:
+    case MO_SL:
         tcg_out_arithi(s, datalo, n, 0, SHIFT_SRA);
         break;
-    case 3:
+    case MO_Q:
         if (TCG_TARGET_REG_BITS == 32) {
             tcg_out_mov(s, TCG_TYPE_REG, datahi, n);
             tcg_out_mov(s, TCG_TYPE_REG, datalo, n + 1);
             break;
         }
         /* FALLTHRU */
-    case 0:
-    case 1:
-    case 2:
+    case MO_UB:
+    case MO_UW:
+    case MO_UL:
     default:
         /* mov */
         tcg_out_mov(s, TCG_TYPE_REG, datalo, n);
@@ -1051,12 +1064,12 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int sizeop)
         tcg_out_arithi(s, TCG_REG_T1, addr_reg, 0, SHIFT_SRL);
         addr_reg = TCG_REG_T1;
     }
-    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
+    if (TCG_TARGET_REG_BITS == 32 && s_bits == MO_64) {
         int reg64 = (datalo < 16 ? datalo : TCG_REG_O0);
 
         tcg_out_ldst_rr(s, reg64, addr_reg,
                         (GUEST_BASE ? TCG_GUEST_BASE_REG : TCG_REG_G0),
-                        qemu_ld_opc[sizeop]);
+                        qemu_ld_opc[memop]);
 
         tcg_out_arithi(s, datahi, reg64, 32, SHIFT_SRLX);
         if (reg64 != datalo) {
@@ -1065,21 +1078,22 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int sizeop)
     } else {
         tcg_out_ldst_rr(s, datalo, addr_reg,
                         (GUEST_BASE ? TCG_GUEST_BASE_REG : TCG_REG_G0),
-                        qemu_ld_opc[sizeop]);
+                        qemu_ld_opc[memop]);
     }
 #endif /* CONFIG_SOFTMMU */
 }
 
-static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int sizeop)
+static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, TCGMemOp memop)
 {
     int addrlo_idx = 1, datalo, datahi, addr_reg;
+    TCGMemOp s_bits = memop & MO_SIZE;
 #if defined(CONFIG_SOFTMMU)
     int memi_idx, memi, n, datafull;
     uint32_t *label_ptr;
 #endif
 
     datahi = datalo = args[0];
-    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
+    if (TCG_TARGET_REG_BITS == 32 && s_bits == MO_64) {
         datahi = args[1];
         addrlo_idx = 2;
     }
@@ -1088,11 +1102,11 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int sizeop)
     memi_idx = addrlo_idx + 1 + (TARGET_LONG_BITS > TCG_TARGET_REG_BITS);
     memi = args[memi_idx];
 
-    addr_reg = tcg_out_tlb_load(s, addrlo_idx, memi, sizeop, args,
+    addr_reg = tcg_out_tlb_load(s, addrlo_idx, memi, s_bits, args,
                                 offsetof(CPUTLBEntry, addr_write));
 
     datafull = datalo;
-    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
+    if (TCG_TARGET_REG_BITS == 32 && s_bits == MO_64) {
         /* Reconstruct the full 64-bit value.  */
         tcg_out_arithi(s, TCG_REG_T1, datalo, 0, SHIFT_SRL);
         tcg_out_arithi(s, TCG_REG_O2, datahi, 32, SHIFT_SLLX);
@@ -1107,7 +1121,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int sizeop)
     tcg_out_bpcc0(s, COND_E, BPCC_A | BPCC_PT
                   | (TARGET_LONG_BITS == 64 ? BPCC_XCC : BPCC_ICC), 0);
     /* delay slot */
-    tcg_out_ldst_rr(s, datafull, addr_reg, TCG_REG_O1, qemu_st_opc[sizeop]);
+    tcg_out_ldst_rr(s, datafull, addr_reg, TCG_REG_O1, qemu_st_opc[memop]);
 
     /* TLB Miss.  */
 
@@ -1119,13 +1133,13 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int sizeop)
     }
     tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++],
                 args[addrlo_idx]);
-    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
+    if (TCG_TARGET_REG_BITS == 32 && s_bits == MO_64) {
         tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++], datahi);
     }
     tcg_out_mov(s, TCG_TYPE_REG, tcg_target_call_iarg_regs[n++], datalo);
 
     /* qemu_st_helper[s_bits](arg0, arg1, arg2) */
-    tcg_out32(s, CALL | ((((tcg_target_ulong)qemu_st_helpers[sizeop]
+    tcg_out32(s, CALL | ((((tcg_target_ulong)qemu_st_helpers[s_bits]
                            - (tcg_target_ulong)s->code_ptr) >> 2)
                          & 0x3fffffff));
     /* delay slot */
@@ -1139,7 +1153,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int sizeop)
         tcg_out_arithi(s, TCG_REG_T1, addr_reg, 0, SHIFT_SRL);
         addr_reg = TCG_REG_T1;
     }
-    if (TCG_TARGET_REG_BITS == 32 && sizeop == 3) {
+    if (TCG_TARGET_REG_BITS == 32 && s_bits == MO_64) {
         tcg_out_arithi(s, TCG_REG_T1, datalo, 0, SHIFT_SRL);
         tcg_out_arithi(s, TCG_REG_O2, datahi, 32, SHIFT_SLLX);
         tcg_out_arith(s, TCG_REG_O2, TCG_REG_T1, TCG_REG_O2, ARITH_OR);
@@ -1147,7 +1161,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int sizeop)
     }
     tcg_out_ldst_rr(s, datalo, addr_reg,
                     (GUEST_BASE ? TCG_GUEST_BASE_REG : TCG_REG_G0),
-                    qemu_st_opc[sizeop]);
+                    qemu_st_opc[memop]);
 #endif /* CONFIG_SOFTMMU */
 }
 
@@ -1344,42 +1358,42 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         break;
 
     case INDEX_op_qemu_ld8u:
-        tcg_out_qemu_ld(s, args, 0);
+        tcg_out_qemu_ld(s, args, MO_UB);
         break;
     case INDEX_op_qemu_ld8s:
-        tcg_out_qemu_ld(s, args, 0 | 4);
+        tcg_out_qemu_ld(s, args, MO_SB);
         break;
     case INDEX_op_qemu_ld16u:
-        tcg_out_qemu_ld(s, args, 1);
+        tcg_out_qemu_ld(s, args, MO_TEUW);
         break;
     case INDEX_op_qemu_ld16s:
-        tcg_out_qemu_ld(s, args, 1 | 4);
+        tcg_out_qemu_ld(s, args, MO_TESW);
         break;
     case INDEX_op_qemu_ld32:
 #if TCG_TARGET_REG_BITS == 64
     case INDEX_op_qemu_ld32u:
 #endif
-        tcg_out_qemu_ld(s, args, 2);
+        tcg_out_qemu_ld(s, args, MO_TEUL);
         break;
 #if TCG_TARGET_REG_BITS == 64
     case INDEX_op_qemu_ld32s:
-        tcg_out_qemu_ld(s, args, 2 | 4);
+        tcg_out_qemu_ld(s, args, MO_TESL);
         break;
 #endif
     case INDEX_op_qemu_ld64:
-        tcg_out_qemu_ld(s, args, 3);
+        tcg_out_qemu_ld(s, args, MO_TEQ);
         break;
     case INDEX_op_qemu_st8:
-        tcg_out_qemu_st(s, args, 0);
+        tcg_out_qemu_st(s, args, MO_UB);
         break;
     case INDEX_op_qemu_st16:
-        tcg_out_qemu_st(s, args, 1);
+        tcg_out_qemu_st(s, args, MO_TEUW);
         break;
     case INDEX_op_qemu_st32:
-        tcg_out_qemu_st(s, args, 2);
+        tcg_out_qemu_st(s, args, MO_TEUL);
         break;
     case INDEX_op_qemu_st64:
-        tcg_out_qemu_st(s, args, 3);
+        tcg_out_qemu_st(s, args, MO_TEQ);
         break;
 
 #if TCG_TARGET_REG_BITS == 64
-- 
1.8.1.4
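
Editorial sketch, not part of the patch: the tcg-sparc change above replaces
position-indexed opcode arrays with 16-entry tables indexed directly by the
memop value, so slots that no access uses stay zero.  The fragment below
mimics that layout under stated assumptions.  The enum is a hypothetical
stand-in for TCGMemOp (size in bits 0-1, sign = 4, bswap = 8, as the cover
letter describes), the MO_BE/MO_LE mapping assumes a big-endian host, and the
instruction ids are made up.  The masking in demo_ld_insn() is only one
possible way to "get rid of that bswap bit" for byte accesses.

```c
#include <assert.h>

/* Hypothetical stand-in for TCGMemOp; big-endian host assumed. */
typedef enum {
    MO_8 = 0, MO_16 = 1, MO_32 = 2, MO_64 = 3,
    MO_SIZE  = 3,
    MO_SIGN  = 4,
    MO_BSWAP = 8,
    MO_SSIZE = MO_SIZE | MO_SIGN,
    MO_BE = 0,          /* host byte order, no swap needed */
    MO_LE = MO_BSWAP,   /* opposite of host byte order */

    MO_UB   = MO_8,
    MO_SB   = MO_SIGN | MO_8,
    MO_BEUW = MO_BE | MO_16,
    MO_BESW = MO_BE | MO_SIGN | MO_16,
    MO_LEUW = MO_LE | MO_16,
    MO_LESW = MO_LE | MO_SIGN | MO_16
} DemoMemOp;

/* Made-up instruction ids; 0 means "no instruction in this slot". */
typedef enum {
    INSN_NONE = 0, LDUB, LDSB, LDUH, LDSH, LDUH_LE, LDSH_LE
} DemoInsn;

/* Sparse table: entries not named here are implicitly INSN_NONE. */
static const DemoInsn demo_ld_opc[16] = {
    [MO_UB]   = LDUB,
    [MO_SB]   = LDSB,
    [MO_BEUW] = LDUH,
    [MO_BESW] = LDSH,
    [MO_LEUW] = LDUH_LE,
    [MO_LESW] = LDSH_LE,
};

/* Byte accesses have no byte order, so clear the bswap bit before
   indexing; otherwise the lookup would land on an empty slot. */
static DemoInsn demo_ld_insn(unsigned memop)
{
    if ((memop & MO_SIZE) == MO_8) {
        memop &= ~(unsigned)MO_BSWAP;
    }
    return demo_ld_opc[memop & (MO_BSWAP | MO_SSIZE)];
}
```

With this layout, demo_ld_insn(MO_LE | MO_SB) resolves to LDSB even though
the raw slot demo_ld_opc[MO_LE | MO_SB] is empty.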


* [Qemu-devel] [PATCH 11/16] tcg: Add qemu_ld_st_i32/64
From: Richard Henderson @ 2013-09-04 21:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Step two in the transition: add the new ldst opcodes, qemu_ld_i32/i64 and
qemu_st_i32/i64.  Keep the old opcodes around until all backends support
the new ones; every backend defines TCG_TARGET_HAS_new_ldst as 0 for now.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
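
Editorial sketch, not part of the patch: the old and new opcode families
coexist because each family's flags are derived from TCG_TARGET_HAS_new_ldst,
once directly and once negated, so exactly one family is marked present per
backend.  The fragment below illustrates that idea only.  DEMO_IMPL is a
simplified stand-in for the IMPL() macro used in tcg-opc.h, and the flag
value, the enum names, and demo_pick_ld16u() are hypothetical.

```c
#include <assert.h>

/* All names here are hypothetical stand-ins, not the QEMU definitions. */
#define DEMO_OPF_NOT_PRESENT  0x10
#define DEMO_HAS_new_ldst     0     /* the value every backend gets here */

/* Simplified IMPL(): flag the opcode as absent when the feature is off. */
#define DEMO_IMPL(X)  ((X) ? 0 : DEMO_OPF_NOT_PRESENT)

enum {
    DEMO_NEW_LDST_FLAGS = DEMO_IMPL(DEMO_HAS_new_ldst),
    DEMO_OLD_LDST_FLAGS = DEMO_IMPL(!DEMO_HAS_new_ldst)
};

typedef enum { DEMO_OP_QEMU_LD16U, DEMO_OP_QEMU_LD_I32 } DemoOpc;

/* An opcode is usable when its NOT_PRESENT bit is clear. */
static int demo_op_present(int flags)
{
    return (flags & DEMO_OPF_NOT_PRESENT) == 0;
}

/* A generator could pick the opcode family from the same switch. */
static DemoOpc demo_pick_ld16u(void)
{
    return demo_op_present(DEMO_NEW_LDST_FLAGS)
           ? DEMO_OP_QEMU_LD_I32 : DEMO_OP_QEMU_LD16U;
}
```

Flipping DEMO_HAS_new_ldst to 1 swaps which family is present, which is the
per-backend switch the later conversion steps would presumably use.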
 tcg/README               |  43 ++++-----
 tcg/aarch64/tcg-target.h |   2 +
 tcg/arm/tcg-target.h     |   2 +
 tcg/hppa/tcg-target.h    |   2 +
 tcg/i386/tcg-target.h    |   2 +
 tcg/ia64/tcg-target.h    |   2 +
 tcg/mips/tcg-target.h    |   2 +
 tcg/ppc/tcg-target.h     |   2 +
 tcg/ppc64/tcg-target.h   |   2 +
 tcg/s390/tcg-target.h    |   2 +
 tcg/sparc/tcg-target.h   |   2 +
 tcg/tcg-op.h             | 239 ++++++++++++-----------------------------------
 tcg/tcg-opc.h            |  96 ++++++++++++-------
 tcg/tcg.c                | 209 +++++++++++++++++++++++++++++++++++++++++
 tcg/tci/tcg-target.h     |   2 +
 15 files changed, 370 insertions(+), 239 deletions(-)
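
Editorial sketch, not part of the patch: the README text in the diff below
says the flags operand selects the sign, width, and endianness of the access,
and that 64-bit values are split into little-endian ordered register pairs on
a 32-bit host.  The fragment below models both statements on a plain byte
buffer.  The DEMO_* bits are hypothetical; in particular the explicit
big-endian bit is an assumption made to keep the example independent of the
host, whereas the cover letter describes a bswap bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag layout: log2(size) in bits 0-1, then sign and
   byte order.  Not the TCGMemOp encoding. */
enum {
    DEMO_SIZE   = 3,
    DEMO_SIGN   = 4,
    DEMO_BIGEND = 8
};

/* Load 1, 2, 4 or 8 bytes from p as described by memop and return the
   value extended to 64 bits. */
static uint64_t demo_load(const uint8_t *p, unsigned memop)
{
    unsigned nbytes = 1u << (memop & DEMO_SIZE);
    uint64_t val = 0;

    for (unsigned i = 0; i < nbytes; i++) {
        unsigned shift = (memop & DEMO_BIGEND) ? 8 * (nbytes - 1 - i)
                                               : 8 * i;
        val |= (uint64_t)p[i] << shift;
    }
    if ((memop & DEMO_SIGN) && nbytes < 8) {
        /* Sign-extend from the top bit of the loaded width. */
        uint64_t sign = (uint64_t)1 << (8 * nbytes - 1);
        val = (val ^ sign) - sign;
    }
    return val;
}

/* On a 32-bit host a 64-bit value travels as a (low, high) pair. */
static void demo_split_i64(uint64_t val, uint32_t *lo, uint32_t *hi)
{
    *lo = (uint32_t)val;
    *hi = (uint32_t)(val >> 32);
}
```

The register width (_i32 or _i64) is deliberately absent from demo_load():
as the README text puts it, that size applies to t0 only, while the width of
the memory operation comes from flags.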

diff --git a/tcg/README b/tcg/README
index 063aeb9..f178212 100644
--- a/tcg/README
+++ b/tcg/README
@@ -412,30 +412,25 @@ current TB was linked to this TB. Otherwise execute the next
 instructions. Only indices 0 and 1 are valid and tcg_gen_goto_tb may be issued
 at most once with each slot index per TB.
 
-* qemu_ld8u t0, t1, flags
-qemu_ld8s t0, t1, flags
-qemu_ld16u t0, t1, flags
-qemu_ld16s t0, t1, flags
-qemu_ld32 t0, t1, flags
-qemu_ld32u t0, t1, flags
-qemu_ld32s t0, t1, flags
-qemu_ld64 t0, t1, flags
-
-Load data at the QEMU CPU address t1 into t0. t1 has the QEMU CPU address
-type. 'flags' contains the QEMU memory index (selects user or kernel access)
-for example.
-
-Note that "qemu_ld32" implies a 32-bit result, while "qemu_ld32u" and
-"qemu_ld32s" imply a 64-bit result appropriately extended from 32 bits.
-
-* qemu_st8 t0, t1, flags
-qemu_st16 t0, t1, flags
-qemu_st32 t0, t1, flags
-qemu_st64 t0, t1, flags
-
-Store the data t0 at the QEMU CPU Address t1. t1 has the QEMU CPU
-address type. 'flags' contains the QEMU memory index (selects user or
-kernel access) for example.
+* qemu_ld_i32/i64 t0, t1, flags, memidx
+* qemu_st_i32/i64 t0, t1, flags, memidx
+
+Load data at the guest address t1 into t0, or store data in t0 at guest
+address t1.  The _i32/_i64 size applies to the size of the input/output
+register t0 only.  The address t1 is always sized according to the guest,
+and the width of the memory operation is controlled by flags.
+
+Both t0 and t1 may be split into little-endian ordered pairs of registers
+if dealing with 64-bit quantities on a 32-bit host.
+
+The memidx selects the qemu tlb index to use (e.g. user or kernel access).
+The flags are the TCGMemOp bits, selecting the sign, width, and endianness
+of the memory access.
+
+For a 32-bit host, qemu_ld/st_i64 is guaranteed to only be used with a
+64-bit memory access specified in flags.
+
+*********
 
 Note 1: Some shortcuts are defined when the last operand is known to be
 a constant (e.g. addi for add, movi for mov).
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index e80bf78..3723320 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -99,6 +99,8 @@ typedef enum {
 #define TCG_TARGET_HAS_muluh_i64        1
 #define TCG_TARGET_HAS_mulsh_i64        1
 
+#define TCG_TARGET_HAS_new_ldst         0
+
 static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 {
     __builtin___clear_cache((char *)start, (char *)stop);
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 01b9de0..a8311c2 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -86,6 +86,8 @@ extern bool use_idiv_instructions;
 #define TCG_TARGET_HAS_div_i32          use_idiv_instructions
 #define TCG_TARGET_HAS_rem_i32          0
 
+#define TCG_TARGET_HAS_new_ldst         0
+
 extern bool tcg_target_deposit_valid(int ofs, int len);
 #define TCG_TARGET_deposit_i32_valid  tcg_target_deposit_valid
 
diff --git a/tcg/hppa/tcg-target.h b/tcg/hppa/tcg-target.h
index dbb1cc8..bd409a3 100644
--- a/tcg/hppa/tcg-target.h
+++ b/tcg/hppa/tcg-target.h
@@ -105,6 +105,8 @@ typedef enum {
 #define TCG_TARGET_HAS_ext8u_i32        0 /* and rd, rs, 0xff */
 #define TCG_TARGET_HAS_ext16u_i32       0 /* and rd, rs, 0xffff */
 
+#define TCG_TARGET_HAS_new_ldst         0
+
 #define TCG_AREG0 TCG_REG_R17
 
 
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 487a092..47fdb81 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -131,6 +131,8 @@ typedef enum {
 #define TCG_TARGET_HAS_mulsh_i64        0
 #endif
 
+#define TCG_TARGET_HAS_new_ldst         0
+
 #define TCG_TARGET_deposit_i32_valid(ofs, len) \
     (((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \
      ((ofs) == 0 && (len) == 16))
diff --git a/tcg/ia64/tcg-target.h b/tcg/ia64/tcg-target.h
index 8e0ce35..9fe3bf6 100644
--- a/tcg/ia64/tcg-target.h
+++ b/tcg/ia64/tcg-target.h
@@ -153,6 +153,8 @@ typedef enum {
 #define TCG_TARGET_HAS_mulsh_i32        0
 #define TCG_TARGET_HAS_mulsh_i64        0
 
+#define TCG_TARGET_HAS_new_ldst         0
+
 #define TCG_TARGET_deposit_i32_valid(ofs, len) ((len) <= 16)
 #define TCG_TARGET_deposit_i64_valid(ofs, len) ((len) <= 16)
 
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index 2eb266f..8f85d49 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -123,6 +123,8 @@ extern bool use_mips32r2_instructions;
 #define TCG_TARGET_HAS_ext16s_i32       use_mips32r2_instructions
 #define TCG_TARGET_HAS_rot_i32          use_mips32r2_instructions
 
+#define TCG_TARGET_HAS_new_ldst         0
+
 /* optional instructions automatically implemented */
 #define TCG_TARGET_HAS_neg_i32          0 /* sub  rd, zero, rt   */
 #define TCG_TARGET_HAS_ext8u_i32        0 /* andi rt, rs, 0xff   */
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index 758e9b6..2662689 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -101,6 +101,8 @@ typedef enum {
 #define TCG_TARGET_HAS_muluh_i32        0
 #define TCG_TARGET_HAS_mulsh_i32        0
 
+#define TCG_TARGET_HAS_new_ldst         0
+
 #define TCG_AREG0 TCG_REG_R27
 
 #define tcg_qemu_tb_exec(env, tb_ptr) \
diff --git a/tcg/ppc64/tcg-target.h b/tcg/ppc64/tcg-target.h
index 8f490e1..0b75170 100644
--- a/tcg/ppc64/tcg-target.h
+++ b/tcg/ppc64/tcg-target.h
@@ -125,6 +125,8 @@ typedef enum {
 #define TCG_TARGET_HAS_muluh_i64        1
 #define TCG_TARGET_HAS_mulsh_i64        1
 
+#define TCG_TARGET_HAS_new_ldst         0
+
 #define TCG_AREG0 TCG_REG_R27
 
 #define TCG_TARGET_EXTEND_ARGS 1
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index eb691b3..a57aad5 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -100,6 +100,8 @@ typedef enum TCGReg {
 #define TCG_TARGET_HAS_muluh_i64        0
 #define TCG_TARGET_HAS_mulsh_i64        0
 
+#define TCG_TARGET_HAS_new_ldst         0
+
 extern bool tcg_target_deposit_valid(int ofs, int len);
 #define TCG_TARGET_deposit_i32_valid  tcg_target_deposit_valid
 #define TCG_TARGET_deposit_i64_valid  tcg_target_deposit_valid
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index 3b65c8e..2bee75a 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -149,6 +149,8 @@ typedef enum {
 #define TCG_TARGET_HAS_mulsh_i64        0
 #endif
 
+#define TCG_TARGET_HAS_new_ldst         0
+
 #define TCG_AREG0 TCG_REG_I0
 
 static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index bb30a7c..7eabf22 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -137,24 +137,6 @@ static inline void tcg_gen_ldst_op_i64(TCGOpcode opc, TCGv_i64 val,
     *tcg_ctx.gen_opparam_ptr++ = offset;
 }
 
-static inline void tcg_gen_qemu_ldst_op_i64_i32(TCGOpcode opc, TCGv_i64 val,
-                                                TCGv_i32 addr, TCGArg mem_index)
-{
-    *tcg_ctx.gen_opc_ptr++ = opc;
-    *tcg_ctx.gen_opparam_ptr++ = GET_TCGV_I64(val);
-    *tcg_ctx.gen_opparam_ptr++ = GET_TCGV_I32(addr);
-    *tcg_ctx.gen_opparam_ptr++ = mem_index;
-}
-
-static inline void tcg_gen_qemu_ldst_op_i64_i64(TCGOpcode opc, TCGv_i64 val,
-                                                TCGv_i64 addr, TCGArg mem_index)
-{
-    *tcg_ctx.gen_opc_ptr++ = opc;
-    *tcg_ctx.gen_opparam_ptr++ = GET_TCGV_I64(val);
-    *tcg_ctx.gen_opparam_ptr++ = GET_TCGV_I64(addr);
-    *tcg_ctx.gen_opparam_ptr++ = mem_index;
-}
-
 static inline void tcg_gen_op4_i32(TCGOpcode opc, TCGv_i32 arg1, TCGv_i32 arg2,
                                    TCGv_i32 arg3, TCGv_i32 arg4)
 {
@@ -361,6 +343,21 @@ static inline void tcg_gen_op6ii_i64(TCGOpcode opc, TCGv_i64 arg1,
     *tcg_ctx.gen_opparam_ptr++ = arg6;
 }
 
+static inline void tcg_add_param_i32(TCGv_i32 val)
+{
+    *tcg_ctx.gen_opparam_ptr++ = GET_TCGV_I32(val);
+}
+
+static inline void tcg_add_param_i64(TCGv_i64 val)
+{
+#if TCG_TARGET_REG_BITS == 32
+    *tcg_ctx.gen_opparam_ptr++ = GET_TCGV_I32(TCGV_LOW(val));
+    *tcg_ctx.gen_opparam_ptr++ = GET_TCGV_I32(TCGV_HIGH(val));
+#else
+    *tcg_ctx.gen_opparam_ptr++ = GET_TCGV_I64(val);
+#endif
+}
+
 static inline void gen_set_label(int n)
 {
     tcg_gen_op1i(INDEX_op_set_label, n);
@@ -2600,11 +2597,12 @@ static inline void tcg_gen_muls2_i64(TCGv_i64 rl, TCGv_i64 rh,
 #define tcg_global_mem_new tcg_global_mem_new_i32
 #define tcg_temp_local_new() tcg_temp_local_new_i32()
 #define tcg_temp_free tcg_temp_free_i32
-#define tcg_gen_qemu_ldst_op tcg_gen_op3i_i32
-#define tcg_gen_qemu_ldst_op_i64 tcg_gen_qemu_ldst_op_i64_i32
 #define TCGV_UNUSED(x) TCGV_UNUSED_I32(x)
 #define TCGV_IS_UNUSED(x) TCGV_IS_UNUSED_I32(x)
 #define TCGV_EQUAL(a, b) TCGV_EQUAL_I32(a, b)
+#define tcg_add_param_tl tcg_add_param_i32
+#define tcg_gen_qemu_ld_tl tcg_gen_qemu_ld_i32
+#define tcg_gen_qemu_st_tl tcg_gen_qemu_st_i32
 #else
 #define TCGv TCGv_i64
 #define tcg_temp_new() tcg_temp_new_i64()
@@ -2612,11 +2610,12 @@ static inline void tcg_gen_muls2_i64(TCGv_i64 rl, TCGv_i64 rh,
 #define tcg_global_mem_new tcg_global_mem_new_i64
 #define tcg_temp_local_new() tcg_temp_local_new_i64()
 #define tcg_temp_free tcg_temp_free_i64
-#define tcg_gen_qemu_ldst_op tcg_gen_op3i_i64
-#define tcg_gen_qemu_ldst_op_i64 tcg_gen_qemu_ldst_op_i64_i64
 #define TCGV_UNUSED(x) TCGV_UNUSED_I64(x)
 #define TCGV_IS_UNUSED(x) TCGV_IS_UNUSED_I64(x)
 #define TCGV_EQUAL(a, b) TCGV_EQUAL_I64(a, b)
+#define tcg_add_param_tl tcg_add_param_i64
+#define tcg_gen_qemu_ld_tl tcg_gen_qemu_ld_i64
+#define tcg_gen_qemu_st_tl tcg_gen_qemu_st_i64
 #endif
 
 /* debug info: write the PC of the corresponding QEMU CPU instruction */
@@ -2648,197 +2647,67 @@ static inline void tcg_gen_goto_tb(unsigned idx)
     tcg_gen_op1i(INDEX_op_goto_tb, idx);
 }
 
-#if TCG_TARGET_REG_BITS == 32
-static inline void tcg_gen_qemu_ld8u(TCGv ret, TCGv addr, int mem_index)
-{
-#if TARGET_LONG_BITS == 32
-    tcg_gen_op3i_i32(INDEX_op_qemu_ld8u, ret, addr, mem_index);
-#else
-    tcg_gen_op4i_i32(INDEX_op_qemu_ld8u, TCGV_LOW(ret), TCGV_LOW(addr),
-                     TCGV_HIGH(addr), mem_index);
-    tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
-#endif
-}
-
-static inline void tcg_gen_qemu_ld8s(TCGv ret, TCGv addr, int mem_index)
-{
-#if TARGET_LONG_BITS == 32
-    tcg_gen_op3i_i32(INDEX_op_qemu_ld8s, ret, addr, mem_index);
-#else
-    tcg_gen_op4i_i32(INDEX_op_qemu_ld8s, TCGV_LOW(ret), TCGV_LOW(addr),
-                     TCGV_HIGH(addr), mem_index);
-    tcg_gen_sari_i32(TCGV_HIGH(ret), TCGV_LOW(ret), 31);
-#endif
-}
 
-static inline void tcg_gen_qemu_ld16u(TCGv ret, TCGv addr, int mem_index)
-{
-#if TARGET_LONG_BITS == 32
-    tcg_gen_op3i_i32(INDEX_op_qemu_ld16u, ret, addr, mem_index);
-#else
-    tcg_gen_op4i_i32(INDEX_op_qemu_ld16u, TCGV_LOW(ret), TCGV_LOW(addr),
-                     TCGV_HIGH(addr), mem_index);
-    tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
-#endif
-}
-
-static inline void tcg_gen_qemu_ld16s(TCGv ret, TCGv addr, int mem_index)
-{
-#if TARGET_LONG_BITS == 32
-    tcg_gen_op3i_i32(INDEX_op_qemu_ld16s, ret, addr, mem_index);
-#else
-    tcg_gen_op4i_i32(INDEX_op_qemu_ld16s, TCGV_LOW(ret), TCGV_LOW(addr),
-                     TCGV_HIGH(addr), mem_index);
-    tcg_gen_sari_i32(TCGV_HIGH(ret), TCGV_LOW(ret), 31);
-#endif
-}
-
-static inline void tcg_gen_qemu_ld32u(TCGv ret, TCGv addr, int mem_index)
-{
-#if TARGET_LONG_BITS == 32
-    tcg_gen_op3i_i32(INDEX_op_qemu_ld32, ret, addr, mem_index);
-#else
-    tcg_gen_op4i_i32(INDEX_op_qemu_ld32, TCGV_LOW(ret), TCGV_LOW(addr),
-                     TCGV_HIGH(addr), mem_index);
-    tcg_gen_movi_i32(TCGV_HIGH(ret), 0);
-#endif
-}
-
-static inline void tcg_gen_qemu_ld32s(TCGv ret, TCGv addr, int mem_index)
-{
-#if TARGET_LONG_BITS == 32
-    tcg_gen_op3i_i32(INDEX_op_qemu_ld32, ret, addr, mem_index);
-#else
-    tcg_gen_op4i_i32(INDEX_op_qemu_ld32, TCGV_LOW(ret), TCGV_LOW(addr),
-                     TCGV_HIGH(addr), mem_index);
-    tcg_gen_sari_i32(TCGV_HIGH(ret), TCGV_LOW(ret), 31);
-#endif
-}
-
-static inline void tcg_gen_qemu_ld64(TCGv_i64 ret, TCGv addr, int mem_index)
-{
-#if TARGET_LONG_BITS == 32
-    tcg_gen_op4i_i32(INDEX_op_qemu_ld64, TCGV_LOW(ret), TCGV_HIGH(ret), addr, mem_index);
-#else
-    tcg_gen_op5i_i32(INDEX_op_qemu_ld64, TCGV_LOW(ret), TCGV_HIGH(ret),
-                     TCGV_LOW(addr), TCGV_HIGH(addr), mem_index);
-#endif
-}
-
-static inline void tcg_gen_qemu_st8(TCGv arg, TCGv addr, int mem_index)
-{
-#if TARGET_LONG_BITS == 32
-    tcg_gen_op3i_i32(INDEX_op_qemu_st8, arg, addr, mem_index);
-#else
-    tcg_gen_op4i_i32(INDEX_op_qemu_st8, TCGV_LOW(arg), TCGV_LOW(addr),
-                     TCGV_HIGH(addr), mem_index);
-#endif
-}
-
-static inline void tcg_gen_qemu_st16(TCGv arg, TCGv addr, int mem_index)
-{
-#if TARGET_LONG_BITS == 32
-    tcg_gen_op3i_i32(INDEX_op_qemu_st16, arg, addr, mem_index);
-#else
-    tcg_gen_op4i_i32(INDEX_op_qemu_st16, TCGV_LOW(arg), TCGV_LOW(addr),
-                     TCGV_HIGH(addr), mem_index);
-#endif
-}
-
-static inline void tcg_gen_qemu_st32(TCGv arg, TCGv addr, int mem_index)
-{
-#if TARGET_LONG_BITS == 32
-    tcg_gen_op3i_i32(INDEX_op_qemu_st32, arg, addr, mem_index);
-#else
-    tcg_gen_op4i_i32(INDEX_op_qemu_st32, TCGV_LOW(arg), TCGV_LOW(addr),
-                     TCGV_HIGH(addr), mem_index);
-#endif
-}
-
-static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
-{
-#if TARGET_LONG_BITS == 32
-    tcg_gen_op4i_i32(INDEX_op_qemu_st64, TCGV_LOW(arg), TCGV_HIGH(arg), addr,
-                     mem_index);
-#else
-    tcg_gen_op5i_i32(INDEX_op_qemu_st64, TCGV_LOW(arg), TCGV_HIGH(arg),
-                     TCGV_LOW(addr), TCGV_HIGH(addr), mem_index);
-#endif
-}
-
-#define tcg_gen_ld_ptr(R, A, O) tcg_gen_ld_i32(TCGV_PTR_TO_NAT(R), (A), (O))
-#define tcg_gen_discard_ptr(A) tcg_gen_discard_i32(TCGV_PTR_TO_NAT(A))
-
-#else /* TCG_TARGET_REG_BITS == 32 */
+void tcg_gen_qemu_ld_i32(TCGv_i32, TCGv, TCGArg, TCGMemOp);
+void tcg_gen_qemu_st_i32(TCGv_i32, TCGv, TCGArg, TCGMemOp);
+void tcg_gen_qemu_ld_i64(TCGv_i64, TCGv, TCGArg, TCGMemOp);
+void tcg_gen_qemu_st_i64(TCGv_i64, TCGv, TCGArg, TCGMemOp);
 
 static inline void tcg_gen_qemu_ld8u(TCGv ret, TCGv addr, int mem_index)
 {
-    tcg_gen_qemu_ldst_op(INDEX_op_qemu_ld8u, ret, addr, mem_index);
+    tcg_gen_qemu_ld_tl(ret, addr, mem_index, MO_UB);
 }
 
 static inline void tcg_gen_qemu_ld8s(TCGv ret, TCGv addr, int mem_index)
 {
-    tcg_gen_qemu_ldst_op(INDEX_op_qemu_ld8s, ret, addr, mem_index);
+    tcg_gen_qemu_ld_tl(ret, addr, mem_index, MO_SB);
 }
 
 static inline void tcg_gen_qemu_ld16u(TCGv ret, TCGv addr, int mem_index)
 {
-    tcg_gen_qemu_ldst_op(INDEX_op_qemu_ld16u, ret, addr, mem_index);
+    tcg_gen_qemu_ld_tl(ret, addr, mem_index, MO_TEUW);
 }
 
 static inline void tcg_gen_qemu_ld16s(TCGv ret, TCGv addr, int mem_index)
 {
-    tcg_gen_qemu_ldst_op(INDEX_op_qemu_ld16s, ret, addr, mem_index);
+    tcg_gen_qemu_ld_tl(ret, addr, mem_index, MO_TESW);
 }
 
 static inline void tcg_gen_qemu_ld32u(TCGv ret, TCGv addr, int mem_index)
 {
-#if TARGET_LONG_BITS == 32
-    tcg_gen_qemu_ldst_op(INDEX_op_qemu_ld32, ret, addr, mem_index);
-#else
-    tcg_gen_qemu_ldst_op(INDEX_op_qemu_ld32u, ret, addr, mem_index);
-#endif
+    tcg_gen_qemu_ld_tl(ret, addr, mem_index, MO_TEUL);
 }
 
 static inline void tcg_gen_qemu_ld32s(TCGv ret, TCGv addr, int mem_index)
 {
-#if TARGET_LONG_BITS == 32
-    tcg_gen_qemu_ldst_op(INDEX_op_qemu_ld32, ret, addr, mem_index);
-#else
-    tcg_gen_qemu_ldst_op(INDEX_op_qemu_ld32s, ret, addr, mem_index);
-#endif
+    tcg_gen_qemu_ld_tl(ret, addr, mem_index, MO_TESL);
 }
 
 static inline void tcg_gen_qemu_ld64(TCGv_i64 ret, TCGv addr, int mem_index)
 {
-    tcg_gen_qemu_ldst_op_i64(INDEX_op_qemu_ld64, ret, addr, mem_index);
+    tcg_gen_qemu_ld_i64(ret, addr, mem_index, MO_TEQ);
 }
 
 static inline void tcg_gen_qemu_st8(TCGv arg, TCGv addr, int mem_index)
 {
-    tcg_gen_qemu_ldst_op(INDEX_op_qemu_st8, arg, addr, mem_index);
+    tcg_gen_qemu_st_tl(arg, addr, mem_index, MO_UB);
 }
 
 static inline void tcg_gen_qemu_st16(TCGv arg, TCGv addr, int mem_index)
 {
-    tcg_gen_qemu_ldst_op(INDEX_op_qemu_st16, arg, addr, mem_index);
+    tcg_gen_qemu_st_tl(arg, addr, mem_index, MO_TEUW);
 }
 
 static inline void tcg_gen_qemu_st32(TCGv arg, TCGv addr, int mem_index)
 {
-    tcg_gen_qemu_ldst_op(INDEX_op_qemu_st32, arg, addr, mem_index);
+    tcg_gen_qemu_st_tl(arg, addr, mem_index, MO_TEUL);
 }
 
 static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
 {
-    tcg_gen_qemu_ldst_op_i64(INDEX_op_qemu_st64, arg, addr, mem_index);
+    tcg_gen_qemu_st_i64(arg, addr, mem_index, MO_TEQ);
 }
 
-#define tcg_gen_ld_ptr(R, A, O) tcg_gen_ld_i64(TCGV_PTR_TO_NAT(R), (A), (O))
-#define tcg_gen_discard_ptr(A) tcg_gen_discard_i64(TCGV_PTR_TO_NAT(A))
-
-#endif /* TCG_TARGET_REG_BITS != 32 */
-
 #if TARGET_LONG_BITS == 64
 #define tcg_gen_movi_tl tcg_gen_movi_i64
 #define tcg_gen_mov_tl tcg_gen_mov_i64
@@ -2997,17 +2866,25 @@ static inline void tcg_gen_qemu_st64(TCGv_i64 arg, TCGv addr, int mem_index)
 #endif
 
 #if TCG_TARGET_REG_BITS == 32
-#define tcg_gen_add_ptr(R, A, B) tcg_gen_add_i32(TCGV_PTR_TO_NAT(R), \
-                                               TCGV_PTR_TO_NAT(A), \
-                                               TCGV_PTR_TO_NAT(B))
-#define tcg_gen_addi_ptr(R, A, B) tcg_gen_addi_i32(TCGV_PTR_TO_NAT(R), \
-                                                 TCGV_PTR_TO_NAT(A), (B))
-#define tcg_gen_ext_i32_ptr(R, A) tcg_gen_mov_i32(TCGV_PTR_TO_NAT(R), (A))
-#else /* TCG_TARGET_REG_BITS == 32 */
-#define tcg_gen_add_ptr(R, A, B) tcg_gen_add_i64(TCGV_PTR_TO_NAT(R), \
-                                               TCGV_PTR_TO_NAT(A), \
-                                               TCGV_PTR_TO_NAT(B))
-#define tcg_gen_addi_ptr(R, A, B) tcg_gen_addi_i64(TCGV_PTR_TO_NAT(R),   \
-                                                 TCGV_PTR_TO_NAT(A), (B))
-#define tcg_gen_ext_i32_ptr(R, A) tcg_gen_ext_i32_i64(TCGV_PTR_TO_NAT(R), (A))
-#endif /* TCG_TARGET_REG_BITS != 32 */
+# define tcg_gen_ld_ptr(R, A, O) \
+    tcg_gen_ld_i32(TCGV_PTR_TO_NAT(R), (A), (O))
+# define tcg_gen_discard_ptr(A) \
+    tcg_gen_discard_i32(TCGV_PTR_TO_NAT(A))
+# define tcg_gen_add_ptr(R, A, B) \
+    tcg_gen_add_i32(TCGV_PTR_TO_NAT(R), TCGV_PTR_TO_NAT(A), TCGV_PTR_TO_NAT(B))
+# define tcg_gen_addi_ptr(R, A, B) \
+    tcg_gen_addi_i32(TCGV_PTR_TO_NAT(R), TCGV_PTR_TO_NAT(A), (B))
+# define tcg_gen_ext_i32_ptr(R, A) \
+    tcg_gen_mov_i32(TCGV_PTR_TO_NAT(R), (A))
+#else
+# define tcg_gen_ld_ptr(R, A, O) \
+    tcg_gen_ld_i64(TCGV_PTR_TO_NAT(R), (A), (O))
+# define tcg_gen_discard_ptr(A) \
+    tcg_gen_discard_i64(TCGV_PTR_TO_NAT(A))
+# define tcg_gen_add_ptr(R, A, B) \
+    tcg_gen_add_i64(TCGV_PTR_TO_NAT(R), TCGV_PTR_TO_NAT(A), TCGV_PTR_TO_NAT(B))
+# define tcg_gen_addi_ptr(R, A, B) \
+    tcg_gen_addi_i64(TCGV_PTR_TO_NAT(R), TCGV_PTR_TO_NAT(A), (B))
+# define tcg_gen_ext_i32_ptr(R, A) \
+    tcg_gen_ext_i32_i64(TCGV_PTR_TO_NAT(R), (A))
+#endif /* TCG_TARGET_REG_BITS == 32 */
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index a75c29d..d71707d 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -180,79 +180,107 @@ DEF(debug_insn_start, 0, 0, 1, TCG_OPF_NOT_PRESENT)
 #endif
 DEF(exit_tb, 0, 0, 1, TCG_OPF_BB_END)
 DEF(goto_tb, 0, 0, 1, TCG_OPF_BB_END)
-/* Note: even if TARGET_LONG_BITS is not defined, the INDEX_op
-   constants must be defined */
+
+#define IMPL_NEW_LDST \
+    (TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS \
+     | IMPL(TCG_TARGET_HAS_new_ldst))
+
+#if TARGET_LONG_BITS <= TCG_TARGET_REG_BITS
+DEF(qemu_ld_i32, 1, 1, 2, IMPL_NEW_LDST)
+DEF(qemu_st_i32, 0, 2, 2, IMPL_NEW_LDST)
+# if TCG_TARGET_REG_BITS == 64
+DEF(qemu_ld_i64, 1, 1, 2, IMPL_NEW_LDST | TCG_OPF_64BIT)
+DEF(qemu_st_i64, 0, 2, 2, IMPL_NEW_LDST | TCG_OPF_64BIT)
+# else
+DEF(qemu_ld_i64, 2, 1, 2, IMPL_NEW_LDST | TCG_OPF_64BIT)
+DEF(qemu_st_i64, 0, 3, 2, IMPL_NEW_LDST | TCG_OPF_64BIT)
+# endif
+#else
+DEF(qemu_ld_i32, 1, 2, 2, IMPL_NEW_LDST)
+DEF(qemu_st_i32, 0, 3, 2, IMPL_NEW_LDST)
+DEF(qemu_ld_i64, 2, 2, 2, IMPL_NEW_LDST | TCG_OPF_64BIT)
+DEF(qemu_st_i64, 0, 4, 2, IMPL_NEW_LDST | TCG_OPF_64BIT)
+#endif
+
+#undef IMPL_NEW_LDST
+
+#define IMPL_OLD_LDST \
+    (TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS \
+     | IMPL(!TCG_TARGET_HAS_new_ldst))
+
 #if TCG_TARGET_REG_BITS == 32
 #if TARGET_LONG_BITS == 32
-DEF(qemu_ld8u, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF(qemu_ld8u, 1, 1, 1, IMPL_OLD_LDST)
 #else
-DEF(qemu_ld8u, 1, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF(qemu_ld8u, 1, 2, 1, IMPL_OLD_LDST)
 #endif
 #if TARGET_LONG_BITS == 32
-DEF(qemu_ld8s, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF(qemu_ld8s, 1, 1, 1, IMPL_OLD_LDST)
 #else
-DEF(qemu_ld8s, 1, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF(qemu_ld8s, 1, 2, 1, IMPL_OLD_LDST)
 #endif
 #if TARGET_LONG_BITS == 32
-DEF(qemu_ld16u, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF(qemu_ld16u, 1, 1, 1, IMPL_OLD_LDST)
 #else
-DEF(qemu_ld16u, 1, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF(qemu_ld16u, 1, 2, 1, IMPL_OLD_LDST)
 #endif
 #if TARGET_LONG_BITS == 32
-DEF(qemu_ld16s, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF(qemu_ld16s, 1, 1, 1, IMPL_OLD_LDST)
 #else
-DEF(qemu_ld16s, 1, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF(qemu_ld16s, 1, 2, 1, IMPL_OLD_LDST)
 #endif
 #if TARGET_LONG_BITS == 32
-DEF(qemu_ld32, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF(qemu_ld32, 1, 1, 1, IMPL_OLD_LDST)
 #else
-DEF(qemu_ld32, 1, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF(qemu_ld32, 1, 2, 1, IMPL_OLD_LDST)
 #endif
 #if TARGET_LONG_BITS == 32
-DEF(qemu_ld64, 2, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF(qemu_ld64, 2, 1, 1, IMPL_OLD_LDST | TCG_OPF_64BIT)
 #else
-DEF(qemu_ld64, 2, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF(qemu_ld64, 2, 2, 1, IMPL_OLD_LDST | TCG_OPF_64BIT)
 #endif
 
 #if TARGET_LONG_BITS == 32
-DEF(qemu_st8, 0, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF(qemu_st8, 0, 2, 1, IMPL_OLD_LDST)
 #else
-DEF(qemu_st8, 0, 3, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF(qemu_st8, 0, 3, 1, IMPL_OLD_LDST)
 #endif
 #if TARGET_LONG_BITS == 32
-DEF(qemu_st16, 0, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF(qemu_st16, 0, 2, 1, IMPL_OLD_LDST)
 #else
-DEF(qemu_st16, 0, 3, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF(qemu_st16, 0, 3, 1, IMPL_OLD_LDST)
 #endif
 #if TARGET_LONG_BITS == 32
-DEF(qemu_st32, 0, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF(qemu_st32, 0, 2, 1, IMPL_OLD_LDST)
 #else
-DEF(qemu_st32, 0, 3, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF(qemu_st32, 0, 3, 1, IMPL_OLD_LDST)
 #endif
 #if TARGET_LONG_BITS == 32
-DEF(qemu_st64, 0, 3, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF(qemu_st64, 0, 3, 1, IMPL_OLD_LDST | TCG_OPF_64BIT)
 #else
-DEF(qemu_st64, 0, 4, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF(qemu_st64, 0, 4, 1, IMPL_OLD_LDST | TCG_OPF_64BIT)
 #endif
 
 #else /* TCG_TARGET_REG_BITS == 32 */
 
-DEF(qemu_ld8u, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
-DEF(qemu_ld8s, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
-DEF(qemu_ld16u, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
-DEF(qemu_ld16s, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
-DEF(qemu_ld32, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
-DEF(qemu_ld32u, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
-DEF(qemu_ld32s, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
-DEF(qemu_ld64, 1, 1, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF(qemu_ld8u, 1, 1, 1, IMPL_OLD_LDST | TCG_OPF_64BIT)
+DEF(qemu_ld8s, 1, 1, 1, IMPL_OLD_LDST | TCG_OPF_64BIT)
+DEF(qemu_ld16u, 1, 1, 1, IMPL_OLD_LDST | TCG_OPF_64BIT)
+DEF(qemu_ld16s, 1, 1, 1, IMPL_OLD_LDST | TCG_OPF_64BIT)
+DEF(qemu_ld32, 1, 1, 1, IMPL_OLD_LDST | TCG_OPF_64BIT)
+DEF(qemu_ld32u, 1, 1, 1, IMPL_OLD_LDST | TCG_OPF_64BIT)
+DEF(qemu_ld32s, 1, 1, 1, IMPL_OLD_LDST | TCG_OPF_64BIT)
+DEF(qemu_ld64, 1, 1, 1, IMPL_OLD_LDST | TCG_OPF_64BIT)
 
-DEF(qemu_st8, 0, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
-DEF(qemu_st16, 0, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
-DEF(qemu_st32, 0, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
-DEF(qemu_st64, 0, 2, 1, TCG_OPF_CALL_CLOBBER | TCG_OPF_SIDE_EFFECTS)
+DEF(qemu_st8, 0, 2, 1, IMPL_OLD_LDST | TCG_OPF_64BIT)
+DEF(qemu_st16, 0, 2, 1, IMPL_OLD_LDST | TCG_OPF_64BIT)
+DEF(qemu_st32, 0, 2, 1, IMPL_OLD_LDST | TCG_OPF_64BIT)
+DEF(qemu_st64, 0, 2, 1, IMPL_OLD_LDST | TCG_OPF_64BIT)
 
 #endif /* TCG_TARGET_REG_BITS != 32 */
 
+#undef IMPL_OLD_LDST
+
 #undef IMPL
 #undef IMPL64
 #undef DEF
diff --git a/tcg/tcg.c b/tcg/tcg.c
index f11b231..99a27e5 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -795,6 +795,188 @@ void tcg_gen_shifti_i64(TCGv_i64 ret, TCGv_i64 arg1,
 }
 #endif
 
+static inline TCGMemOp tcg_canonicalize_memop(TCGMemOp op, bool is64, bool st)
+{
+    switch (op & MO_SIZE) {
+    case MO_8:
+        op &= ~MO_BSWAP;
+        break;
+    case MO_16:
+        break;
+    case MO_32:
+        if (!is64) {
+            op &= ~MO_SIGN;
+        }
+        break;
+    case MO_64:
+        if (!is64) {
+            tcg_abort();
+        }
+        break;
+    }
+    if (st) {
+        op &= ~MO_SIGN;
+    }
+    return op;
+}
+
+static const TCGOpcode old_ld_opc[8] = {
+    [MO_UB] = INDEX_op_qemu_ld8u,
+    [MO_SB] = INDEX_op_qemu_ld8s,
+    [MO_UW] = INDEX_op_qemu_ld16u,
+    [MO_SW] = INDEX_op_qemu_ld16s,
+#if TCG_TARGET_REG_BITS == 32
+    [MO_UL] = INDEX_op_qemu_ld32,
+    [MO_SL] = INDEX_op_qemu_ld32,
+#else
+    [MO_UL] = INDEX_op_qemu_ld32u,
+    [MO_SL] = INDEX_op_qemu_ld32s,
+#endif
+    [MO_Q]  = INDEX_op_qemu_ld64,
+};
+
+static const TCGOpcode old_st_opc[4] = {
+    [MO_UB] = INDEX_op_qemu_st8,
+    [MO_UW] = INDEX_op_qemu_st16,
+    [MO_UL] = INDEX_op_qemu_st32,
+    [MO_Q]  = INDEX_op_qemu_st64,
+};
+
+void tcg_gen_qemu_ld_i32(TCGv_i32 val, TCGv addr, TCGArg idx, TCGMemOp memop)
+{
+    memop = tcg_canonicalize_memop(memop, 0, 0);
+
+    if (TCG_TARGET_HAS_new_ldst) {
+        *tcg_ctx.gen_opc_ptr++ = INDEX_op_qemu_ld_i32;
+        tcg_add_param_i32(val);
+        tcg_add_param_tl(addr);
+        *tcg_ctx.gen_opparam_ptr++ = memop;
+        *tcg_ctx.gen_opparam_ptr++ = idx;
+        return;
+    }
+
+    /* The old opcodes only support target-endian memory operations.  */
+    assert((memop & MO_BSWAP) == MO_TE || (memop & MO_SIZE) == MO_8);
+    assert(old_ld_opc[memop & MO_SSIZE] != 0);
+
+    if (TCG_TARGET_REG_BITS == 32) {
+        *tcg_ctx.gen_opc_ptr++ = old_ld_opc[memop & MO_SSIZE];
+        tcg_add_param_i32(val);
+        tcg_add_param_tl(addr);
+        *tcg_ctx.gen_opparam_ptr++ = idx;
+    } else {
+        TCGv_i64 val64 = tcg_temp_new_i64();
+
+        *tcg_ctx.gen_opc_ptr++ = old_ld_opc[memop & MO_SSIZE];
+        tcg_add_param_i64(val64);
+        tcg_add_param_tl(addr);
+        *tcg_ctx.gen_opparam_ptr++ = idx;
+
+        tcg_gen_trunc_i64_i32(val, val64);
+        tcg_temp_free_i64(val64);
+    }
+}
+
+void tcg_gen_qemu_st_i32(TCGv_i32 val, TCGv addr, TCGArg idx, TCGMemOp memop)
+{
+    memop = tcg_canonicalize_memop(memop, 0, 1);
+
+    if (TCG_TARGET_HAS_new_ldst) {
+        *tcg_ctx.gen_opc_ptr++ = INDEX_op_qemu_st_i32;
+        tcg_add_param_i32(val);
+        tcg_add_param_tl(addr);
+        *tcg_ctx.gen_opparam_ptr++ = memop;
+        *tcg_ctx.gen_opparam_ptr++ = idx;
+        return;
+    }
+
+    /* The old opcodes only support target-endian memory operations.  */
+    assert((memop & MO_BSWAP) == MO_TE || (memop & MO_SIZE) == MO_8);
+    assert(old_st_opc[memop & MO_SIZE] != 0);
+
+    if (TCG_TARGET_REG_BITS == 32) {
+        *tcg_ctx.gen_opc_ptr++ = old_st_opc[memop & MO_SIZE];
+        tcg_add_param_i32(val);
+        tcg_add_param_tl(addr);
+        *tcg_ctx.gen_opparam_ptr++ = idx;
+    } else {
+        TCGv_i64 val64 = tcg_temp_new_i64();
+
+        tcg_gen_extu_i32_i64(val64, val);
+
+        *tcg_ctx.gen_opc_ptr++ = old_st_opc[memop & MO_SIZE];
+        tcg_add_param_i64(val64);
+        tcg_add_param_tl(addr);
+        *tcg_ctx.gen_opparam_ptr++ = idx;
+
+        tcg_temp_free_i64(val64);
+    }
+}
+
+void tcg_gen_qemu_ld_i64(TCGv_i64 val, TCGv addr, TCGArg idx, TCGMemOp memop)
+{
+    memop = tcg_canonicalize_memop(memop, 1, 0);
+
+#if TCG_TARGET_REG_BITS == 32
+    if ((memop & MO_SIZE) < MO_64) {
+        tcg_gen_qemu_ld_i32(TCGV_LOW(val), addr, idx, memop);
+        if (memop & MO_SIGN) {
+            tcg_gen_sari_i32(TCGV_HIGH(val), TCGV_LOW(val), 31);
+        } else {
+            tcg_gen_movi_i32(TCGV_HIGH(val), 0);
+        }
+        return;
+    }
+#endif
+
+    if (TCG_TARGET_HAS_new_ldst) {
+        *tcg_ctx.gen_opc_ptr++ = INDEX_op_qemu_ld_i64;
+        tcg_add_param_i64(val);
+        tcg_add_param_tl(addr);
+        *tcg_ctx.gen_opparam_ptr++ = memop;
+        *tcg_ctx.gen_opparam_ptr++ = idx;
+        return;
+    }
+
+    /* The old opcodes only support target-endian memory operations.  */
+    assert((memop & MO_BSWAP) == MO_TE || (memop & MO_SIZE) == MO_8);
+    assert(old_ld_opc[memop & MO_SSIZE] != 0);
+
+    *tcg_ctx.gen_opc_ptr++ = old_ld_opc[memop & MO_SSIZE];
+    tcg_add_param_i64(val);
+    tcg_add_param_tl(addr);
+    *tcg_ctx.gen_opparam_ptr++ = idx;
+}
+
+void tcg_gen_qemu_st_i64(TCGv_i64 val, TCGv addr, TCGArg idx, TCGMemOp memop)
+{
+    memop = tcg_canonicalize_memop(memop, 1, 1);
+
+#if TCG_TARGET_REG_BITS == 32
+    if ((memop & MO_SIZE) < MO_64) {
+        tcg_gen_qemu_st_i32(TCGV_LOW(val), addr, idx, memop);
+        return;
+    }
+#endif
+
+    if (TCG_TARGET_HAS_new_ldst) {
+        *tcg_ctx.gen_opc_ptr++ = INDEX_op_qemu_st_i64;
+        tcg_add_param_i64(val);
+        tcg_add_param_tl(addr);
+        *tcg_ctx.gen_opparam_ptr++ = memop;
+        *tcg_ctx.gen_opparam_ptr++ = idx;
+        return;
+    }
+
+    /* The old opcodes only support target-endian memory operations.  */
+    assert((memop & MO_BSWAP) == MO_TE || (memop & MO_SIZE) == MO_8);
+    assert(old_st_opc[memop & MO_SIZE] != 0);
+
+    *tcg_ctx.gen_opc_ptr++ = old_st_opc[memop & MO_SIZE];
+    tcg_add_param_i64(val);
+    tcg_add_param_tl(addr);
+    *tcg_ctx.gen_opparam_ptr++ = idx;
+}
 
 static void tcg_reg_alloc_start(TCGContext *s)
 {
@@ -910,6 +1092,22 @@ static const char * const cond_name[] =
     [TCG_COND_GTU] = "gtu"
 };
 
+static const char * const ldst_name[] =
+{
+    [MO_UB]   = "ub",
+    [MO_SB]   = "sb",
+    [MO_LEUW] = "leuw",
+    [MO_LESW] = "lesw",
+    [MO_LEUL] = "leul",
+    [MO_LESL] = "lesl",
+    [MO_LEQ]  = "leq",
+    [MO_BEUW] = "beuw",
+    [MO_BESW] = "besw",
+    [MO_BEUL] = "beul",
+    [MO_BESL] = "besl",
+    [MO_BEQ]  = "beq",
+};
+
 void tcg_dump_ops(TCGContext *s)
 {
     const uint16_t *opc_ptr;
@@ -1038,6 +1236,17 @@ void tcg_dump_ops(TCGContext *s)
                 }
                 i = 1;
                 break;
+            case INDEX_op_qemu_ld_i32:
+            case INDEX_op_qemu_st_i32:
+            case INDEX_op_qemu_ld_i64:
+            case INDEX_op_qemu_st_i64:
+                if (args[k] < ARRAY_SIZE(ldst_name) && ldst_name[args[k]]) {
+                    qemu_log(",%s", ldst_name[args[k++]]);
+                } else {
+                    qemu_log(",$0x%" TCG_PRIlx, args[k++]);
+                }
+                i = 1;
+                break;
             default:
                 i = 0;
                 break;
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 1dc069c..bd16035 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -122,6 +122,8 @@
 #define TCG_TARGET_HAS_mulsh_i64        0
 #endif /* TCG_TARGET_REG_BITS == 64 */
 
+#define TCG_TARGET_HAS_new_ldst         0
+
 /* Number of registers available.
    For 32 bit hosts, we need more than 8 registers (call arguments). */
 /* #define TCG_TARGET_NB_REGS 8 */
-- 
1.8.1.4
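
For readers following the series, here is a minimal standalone sketch of
what tcg_canonicalize_memop does to a memop before an opcode is emitted.
It is not the QEMU code itself: the enum values are an assumption that
mirrors the TCGMemOp layout from patch 01/16 (size in bits 0-1, sign in
bit 2, MO_BSWAP = 8), and canonicalize_memop is a hypothetical name used
only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit layout, mirroring TCGMemOp from patch 01/16:
   size in bits 0-1, sign in bit 2, byte swap in bit 3.  */
enum {
    MO_8 = 0, MO_16 = 1, MO_32 = 2, MO_64 = 3,
    MO_SIZE = 3,
    MO_SIGN = 4,
    MO_BSWAP = 8,
};

/* Hypothetical standalone copy of the canonicalization logic.  */
static int canonicalize_memop(int op, bool is64, bool st)
{
    switch (op & MO_SIZE) {
    case MO_8:
        /* A single byte has no byte order to swap.  */
        op &= ~MO_BSWAP;
        break;
    case MO_16:
        break;
    case MO_32:
        /* A 32-bit value fills a 32-bit result; sign is meaningless.  */
        if (!is64) {
            op &= ~MO_SIGN;
        }
        break;
    case MO_64:
        /* A 64-bit access cannot target a 32-bit value.  */
        assert(is64);
        break;
    }
    /* Stores never extend, so the sign bit is dropped.  */
    if (st) {
        op &= ~MO_SIGN;
    }
    return op;
}
```

The point of the normalization is that equivalent requests collapse to
one encoding, so the backends and the old-opcode lookup tables see fewer
distinct cases.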


* [Qemu-devel] [PATCH 12/16] exec: Add both big- and little-endian memory helpers
  2013-09-04 21:04 [Qemu-devel] [PATCH 00/16] Streamlining endian handling in TCG Richard Henderson
                   ` (10 preceding siblings ...)
  2013-09-04 21:05 ` [Qemu-devel] [PATCH 11/16] tcg: Add qemu_ld_st_i32/64 Richard Henderson
@ 2013-09-04 21:05 ` Richard Henderson
  2013-09-04 21:05 ` [Qemu-devel] [PATCH 13/16] tcg-i386: Tidy softmmu routines Richard Henderson
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2013-09-04 21:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Step three in the transition: add helpers that are not tied to the
target's default endianness.  They are to be used when the guest
performs a memory operation with non-default endianness.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 include/exec/softmmu_template.h | 286 +++++++++++++++++++++++++++++++++++-----
 tcg/tcg.h                       |  65 ++++++---
 2 files changed, 300 insertions(+), 51 deletions(-)
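
As an illustration of the slow path for page-spanning loads, the sketch
below reproduces the two combine formulas from this patch on a plain
byte array.  It assumes a 32-bit access and uses hypothetical helpers
(load_le32, load_be32, unaligned_le32, unaligned_be32) in place of the
templated softmmu routines; it does not model the TLB or IO paths.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical aligned loads; stand-ins for the recursive helper calls.  */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Unaligned little-endian load built from two aligned loads.  */
static uint32_t unaligned_le32(const uint8_t *mem, unsigned addr)
{
    unsigned addr1 = addr & ~3u;
    unsigned shift = (addr & 3) * 8;
    uint32_t res1, res2;

    /* Only reached for unaligned addresses, as in the patch.  */
    assert(shift != 0);
    res1 = load_le32(mem + addr1);
    res2 = load_le32(mem + addr1 + 4);
    /* Little-endian combine.  */
    return (res1 >> shift) | (res2 << (32 - shift));
}

/* Unaligned big-endian load built from two aligned loads.  */
static uint32_t unaligned_be32(const uint8_t *mem, unsigned addr)
{
    unsigned addr1 = addr & ~3u;
    unsigned shift = (addr & 3) * 8;
    uint32_t res1, res2;

    assert(shift != 0);
    res1 = load_be32(mem + addr1);
    res2 = load_be32(mem + addr1 + 4);
    /* Big-endian combine.  */
    return (res1 << shift) | (res2 >> (32 - shift));
}
```

With the bytes 00 01 02 03 04 05 06 07 in memory and addr = 1, the
little-endian routine yields 0x04030201 and the big-endian routine
yields 0x01020304.  Splitting the helper in two is what lets each
variant hard-code its combine instead of selecting it with
TARGET_WORDS_BIGENDIAN.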

diff --git a/include/exec/softmmu_template.h b/include/exec/softmmu_template.h
index 5edac51..c6a5440 100644
--- a/include/exec/softmmu_template.h
+++ b/include/exec/softmmu_template.h
@@ -70,6 +70,48 @@
 #define ADDR_READ addr_read
 #endif
 
+#if DATA_SIZE == 8
+# define BSWAP(X)  bswap64(X)
+#elif DATA_SIZE == 4
+# define BSWAP(X)  bswap32(X)
+#elif DATA_SIZE == 2
+# define BSWAP(X)  bswap16(X)
+#else
+# define BSWAP(X)  (X)
+#endif
+
+#ifdef TARGET_WORDS_BIGENDIAN
+# define TGT_BE(X)  (X)
+# define TGT_LE(X)  BSWAP(X)
+#else
+# define TGT_BE(X)  BSWAP(X)
+# define TGT_LE(X)  (X)
+#endif
+
+#if DATA_SIZE == 1
+# define helper_le_ld_name  glue(glue(helper_ret_ld, USUFFIX), MMUSUFFIX)
+# define helper_be_ld_name  helper_le_ld_name
+# define helper_le_lds_name glue(glue(helper_ret_ld, SSUFFIX), MMUSUFFIX)
+# define helper_be_lds_name helper_le_lds_name
+# define helper_le_st_name  glue(glue(helper_ret_st, SUFFIX), MMUSUFFIX)
+# define helper_be_st_name  helper_le_st_name
+#else
+# define helper_le_ld_name  glue(glue(helper_le_ld, USUFFIX), MMUSUFFIX)
+# define helper_be_ld_name  glue(glue(helper_be_ld, USUFFIX), MMUSUFFIX)
+# define helper_le_lds_name glue(glue(helper_le_ld, SSUFFIX), MMUSUFFIX)
+# define helper_be_lds_name glue(glue(helper_be_ld, SSUFFIX), MMUSUFFIX)
+# define helper_le_st_name  glue(glue(helper_le_st, SUFFIX), MMUSUFFIX)
+# define helper_be_st_name  glue(glue(helper_be_st, SUFFIX), MMUSUFFIX)
+#endif
+
+#ifdef TARGET_WORDS_BIGENDIAN
+# define helper_te_ld_name  helper_be_ld_name
+# define helper_te_st_name  helper_be_st_name
+#else
+# define helper_te_ld_name  helper_le_ld_name
+# define helper_te_st_name  helper_le_st_name
+#endif
+
 static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
                                               hwaddr physaddr,
                                               target_ulong addr,
@@ -89,18 +131,16 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
     return val;
 }
 
-/* handle all cases except unaligned access which span two pages */
 #ifdef SOFTMMU_CODE_ACCESS
-static
+static __attribute__((unused))
 #endif
-WORD_TYPE
-glue(glue(helper_ret_ld, USUFFIX), MMUSUFFIX)(CPUArchState *env,
-                                              target_ulong addr, int mmu_idx,
-                                              uintptr_t retaddr)
+WORD_TYPE helper_le_ld_name(CPUArchState *env, target_ulong addr, int mmu_idx,
+                            uintptr_t retaddr)
 {
     int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
     target_ulong tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
     uintptr_t haddr;
+    DATA_TYPE res;
 
     /* Adjust the given return address.  */
     retaddr -= GETPC_ADJ;
@@ -124,7 +164,12 @@ glue(glue(helper_ret_ld, USUFFIX), MMUSUFFIX)(CPUArchState *env,
             goto do_unaligned_access;
         }
         ioaddr = env->iotlb[mmu_idx][index];
-        return glue(io_read, SUFFIX)(env, ioaddr, addr, retaddr);
+
+        /* ??? Note that the io helpers always read data in the target
+           byte ordering.  We should push the LE/BE request down into io.  */
+        res = glue(io_read, SUFFIX)(env, ioaddr, addr, retaddr);
+        res = TGT_LE(res);
+        return res;
     }
 
     /* Handle slow unaligned access (it spans two pages or IO).  */
@@ -132,7 +177,7 @@ glue(glue(helper_ret_ld, USUFFIX), MMUSUFFIX)(CPUArchState *env,
         && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
                     >= TARGET_PAGE_SIZE)) {
         target_ulong addr1, addr2;
-        DATA_TYPE res1, res2, res;
+        DATA_TYPE res1, res2;
         unsigned shift;
     do_unaligned_access:
 #ifdef ALIGNED_ONLY
@@ -142,16 +187,94 @@ glue(glue(helper_ret_ld, USUFFIX), MMUSUFFIX)(CPUArchState *env,
         addr2 = addr1 + DATA_SIZE;
         /* Note the adjustment at the beginning of the function.
            Undo that for the recursion.  */
-        res1 = glue(glue(helper_ret_ld, USUFFIX), MMUSUFFIX)
-            (env, addr1, mmu_idx, retaddr + GETPC_ADJ);
-        res2 = glue(glue(helper_ret_ld, USUFFIX), MMUSUFFIX)
-            (env, addr2, mmu_idx, retaddr + GETPC_ADJ);
+        res1 = helper_le_ld_name(env, addr1, mmu_idx, retaddr + GETPC_ADJ);
+        res2 = helper_le_ld_name(env, addr2, mmu_idx, retaddr + GETPC_ADJ);
         shift = (addr & (DATA_SIZE - 1)) * 8;
-#ifdef TARGET_WORDS_BIGENDIAN
-        res = (res1 << shift) | (res2 >> ((DATA_SIZE * 8) - shift));
-#else
+
+        /* Little-endian combine.  */
         res = (res1 >> shift) | (res2 << ((DATA_SIZE * 8) - shift));
+        return res;
+    }
+
+    /* Handle aligned access or unaligned access in the same page.  */
+#ifdef ALIGNED_ONLY
+    if ((addr & (DATA_SIZE - 1)) != 0) {
+        do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
+    }
+#endif
+
+    haddr = addr + env->tlb_table[mmu_idx][index].addend;
+#if DATA_SIZE == 1
+    res = glue(glue(ld, LSUFFIX), _p)((uint8_t *)haddr);
+#else
+    res = glue(glue(ld, LSUFFIX), _le_p)((uint8_t *)haddr);
+#endif
+    return res;
+}
+
+#if DATA_SIZE > 1
+#ifdef SOFTMMU_CODE_ACCESS
+static __attribute__((unused))
+#endif
+WORD_TYPE helper_be_ld_name(CPUArchState *env, target_ulong addr, int mmu_idx,
+                            uintptr_t retaddr)
+{
+    int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+    target_ulong tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
+    uintptr_t haddr;
+    DATA_TYPE res;
+
+    /* Adjust the given return address.  */
+    retaddr -= GETPC_ADJ;
+
+    /* If the TLB entry is for a different page, reload and try again.  */
+    if ((addr & TARGET_PAGE_MASK)
+         != (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
+#ifdef ALIGNED_ONLY
+        if ((addr & (DATA_SIZE - 1)) != 0) {
+            do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
+        }
+#endif
+        tlb_fill(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
+        tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
+    }
+
+    /* Handle an IO access.  */
+    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
+        hwaddr ioaddr;
+        if ((addr & (DATA_SIZE - 1)) != 0) {
+            goto do_unaligned_access;
+        }
+        ioaddr = env->iotlb[mmu_idx][index];
+
+        /* ??? Note that the io helpers always read data in the target
+           byte ordering.  We should push the LE/BE request down into io.  */
+        res = glue(io_read, SUFFIX)(env, ioaddr, addr, retaddr);
+        res = TGT_BE(res);
+        return res;
+    }
+
+    /* Handle slow unaligned access (it spans two pages or IO).  */
+    if (DATA_SIZE > 1
+        && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
+                    >= TARGET_PAGE_SIZE)) {
+        target_ulong addr1, addr2;
+        DATA_TYPE res1, res2;
+        unsigned shift;
+    do_unaligned_access:
+#ifdef ALIGNED_ONLY
+        do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
 #endif
+        addr1 = addr & ~(DATA_SIZE - 1);
+        addr2 = addr1 + DATA_SIZE;
+        /* Note the adjustment at the beginning of the function.
+           Undo that for the recursion.  */
+        res1 = helper_be_ld_name(env, addr1, mmu_idx, retaddr + GETPC_ADJ);
+        res2 = helper_be_ld_name(env, addr2, mmu_idx, retaddr + GETPC_ADJ);
+        shift = (addr & (DATA_SIZE - 1)) * 8;
+
+        /* Big-endian combine.  */
+        res = (res1 << shift) | (res2 >> ((DATA_SIZE * 8) - shift));
         return res;
     }
 
@@ -163,16 +286,16 @@ glue(glue(helper_ret_ld, USUFFIX), MMUSUFFIX)(CPUArchState *env,
 #endif
 
     haddr = addr + env->tlb_table[mmu_idx][index].addend;
-    /* Note that ldl_raw is defined with type "int".  */
-    return (DATA_TYPE) glue(glue(ld, LSUFFIX), _raw)((uint8_t *)haddr);
+    res = glue(glue(ld, LSUFFIX), _be_p)((uint8_t *)haddr);
+    return res;
 }
+#endif /* DATA_SIZE > 1 */
 
 DATA_TYPE
 glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
                                          int mmu_idx)
 {
-    return glue(glue(helper_ret_ld, USUFFIX), MMUSUFFIX)(env, addr, mmu_idx,
-                                                         GETRA());
+    return helper_te_ld_name(env, addr, mmu_idx, GETRA());
 }
 
 #ifndef SOFTMMU_CODE_ACCESS
@@ -180,14 +303,19 @@ glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
 /* Provide signed versions of the load routines as well.  We can of course
    avoid this for 64-bit data, or for 32-bit data on 32-bit host.  */
 #if DATA_SIZE * 8 < TCG_TARGET_REG_BITS
-WORD_TYPE
-glue(glue(helper_ret_ld, SSUFFIX), MMUSUFFIX)(CPUArchState *env,
-                                              target_ulong addr, int mmu_idx,
-                                              uintptr_t retaddr)
+WORD_TYPE helper_le_lds_name(CPUArchState *env, target_ulong addr,
+                             int mmu_idx, uintptr_t retaddr)
+{
+    return (SDATA_TYPE)helper_le_ld_name(env, addr, mmu_idx, retaddr);
+}
+
+# if DATA_SIZE > 1
+WORD_TYPE helper_be_lds_name(CPUArchState *env, target_ulong addr,
+                             int mmu_idx, uintptr_t retaddr)
 {
-    return (SDATA_TYPE) glue(glue(helper_ret_ld, USUFFIX), MMUSUFFIX)
-        (env, addr, mmu_idx, retaddr);
+    return (SDATA_TYPE)helper_be_ld_name(env, addr, mmu_idx, retaddr);
 }
+# endif
 #endif
 
 static inline void glue(io_write, SUFFIX)(CPUArchState *env,
@@ -208,10 +336,8 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
     io_mem_write(mr, physaddr, val, 1 << SHIFT);
 }
 
-void
-glue(glue(helper_ret_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
-                                             target_ulong addr, DATA_TYPE val,
-                                             int mmu_idx, uintptr_t retaddr)
+void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
+                       int mmu_idx, uintptr_t retaddr)
 {
     int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
     target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
@@ -239,6 +365,10 @@ glue(glue(helper_ret_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
             goto do_unaligned_access;
         }
         ioaddr = env->iotlb[mmu_idx][index];
+
+        /* ??? Note that the io helpers always write data in the target
+           byte ordering.  We should push the LE/BE request down into io.  */
+        val = TGT_LE(val);
         glue(io_write, SUFFIX)(env, ioaddr, val, addr, retaddr);
         return;
     }
@@ -256,11 +386,84 @@ glue(glue(helper_ret_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
         /* Note: relies on the fact that tlb_fill() does not remove the
          * previous page from the TLB cache.  */
         for (i = DATA_SIZE - 1; i >= 0; i--) {
-#ifdef TARGET_WORDS_BIGENDIAN
-            uint8_t val8 = val >> (((DATA_SIZE - 1) * 8) - (i * 8));
-#else
+            /* Little-endian extract.  */
             uint8_t val8 = val >> (i * 8);
+            /* Note the adjustment at the beginning of the function.
+               Undo that for the recursion.  */
+            glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
+                                            mmu_idx, retaddr + GETPC_ADJ);
+        }
+        return;
+    }
+
+    /* Handle aligned access or unaligned access in the same page.  */
+#ifdef ALIGNED_ONLY
+    if ((addr & (DATA_SIZE - 1)) != 0) {
+        do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
+    }
+#endif
+
+    haddr = addr + env->tlb_table[mmu_idx][index].addend;
+#if DATA_SIZE == 1
+    glue(glue(st, SUFFIX), _p)((uint8_t *)haddr, val);
+#else
+    glue(glue(st, SUFFIX), _le_p)((uint8_t *)haddr, val);
 #endif
+}
+
+#if DATA_SIZE > 1
+void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
+                       int mmu_idx, uintptr_t retaddr)
+{
+    int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+    target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
+    uintptr_t haddr;
+
+    /* Adjust the given return address.  */
+    retaddr -= GETPC_ADJ;
+
+    /* If the TLB entry is for a different page, reload and try again.  */
+    if ((addr & TARGET_PAGE_MASK)
+        != (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
+#ifdef ALIGNED_ONLY
+        if ((addr & (DATA_SIZE - 1)) != 0) {
+            do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
+        }
+#endif
+        tlb_fill(env, addr, 1, mmu_idx, retaddr);
+        tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
+    }
+
+    /* Handle an IO access.  */
+    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
+        hwaddr ioaddr;
+        if ((addr & (DATA_SIZE - 1)) != 0) {
+            goto do_unaligned_access;
+        }
+        ioaddr = env->iotlb[mmu_idx][index];
+
+        /* ??? Note that the io helpers always write data in the target
+           byte ordering.  We should push the LE/BE request down into io.  */
+        val = TGT_BE(val);
+        glue(io_write, SUFFIX)(env, ioaddr, val, addr, retaddr);
+        return;
+    }
+
+    /* Handle slow unaligned access (it spans two pages or IO).  */
+    if (DATA_SIZE > 1
+        && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
+                     >= TARGET_PAGE_SIZE)) {
+        int i;
+    do_unaligned_access:
+#ifdef ALIGNED_ONLY
+        do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
+#endif
+        /* XXX: not efficient, but simple */
+        /* Note: relies on the fact that tlb_fill() does not remove the
+         * previous page from the TLB cache.  */
+        for (i = DATA_SIZE - 1; i >= 0; i--) {
+            /* Big-endian extract.  */
+            uint8_t val8 = val >> (((DATA_SIZE - 1) * 8) - (i * 8));
             /* Note the adjustment at the beginning of the function.
                Undo that for the recursion.  */
             glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
@@ -277,15 +480,15 @@ glue(glue(helper_ret_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
 #endif
 
     haddr = addr + env->tlb_table[mmu_idx][index].addend;
-    glue(glue(st, SUFFIX), _raw)((uint8_t *)haddr, val);
+    glue(glue(st, SUFFIX), _be_p)((uint8_t *)haddr, val);
 }
+#endif /* DATA_SIZE > 1 */
 
 void
 glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
                                          DATA_TYPE val, int mmu_idx)
 {
-    glue(glue(helper_ret_st, SUFFIX), MMUSUFFIX)(env, addr, val, mmu_idx,
-                                                 GETRA());
+    helper_te_st_name(env, addr, val, mmu_idx, GETRA());
 }
 
 #endif /* !defined(SOFTMMU_CODE_ACCESS) */
@@ -301,3 +504,16 @@ glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
 #undef SDATA_TYPE
 #undef USUFFIX
 #undef SSUFFIX
+#undef BSWAP
+#undef TGT_BE
+#undef TGT_LE
+#undef CPU_BE
+#undef CPU_LE
+#undef helper_le_ld_name
+#undef helper_be_ld_name
+#undef helper_le_lds_name
+#undef helper_be_lds_name
+#undef helper_le_st_name
+#undef helper_be_st_name
+#undef helper_te_ld_name
+#undef helper_te_st_name
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 91dcd92..60e858c 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -812,29 +812,62 @@ void tcg_out_tb_finalize(TCGContext *s);
 /* Value zero-extended to tcg register size.  */
 tcg_target_ulong helper_ret_ldub_mmu(CPUArchState *env, target_ulong addr,
                                      int mmu_idx, uintptr_t retaddr);
-tcg_target_ulong helper_ret_lduw_mmu(CPUArchState *env, target_ulong addr,
-                                     int mmu_idx, uintptr_t retaddr);
-tcg_target_ulong helper_ret_ldul_mmu(CPUArchState *env, target_ulong addr,
-                                     int mmu_idx, uintptr_t retaddr);
-uint64_t helper_ret_ldq_mmu(CPUArchState *env, target_ulong addr,
-                            int mmu_idx, uintptr_t retaddr);
+tcg_target_ulong helper_le_lduw_mmu(CPUArchState *env, target_ulong addr,
+                                    int mmu_idx, uintptr_t retaddr);
+tcg_target_ulong helper_le_ldul_mmu(CPUArchState *env, target_ulong addr,
+                                    int mmu_idx, uintptr_t retaddr);
+uint64_t helper_le_ldq_mmu(CPUArchState *env, target_ulong addr,
+                           int mmu_idx, uintptr_t retaddr);
+tcg_target_ulong helper_be_lduw_mmu(CPUArchState *env, target_ulong addr,
+                                    int mmu_idx, uintptr_t retaddr);
+tcg_target_ulong helper_be_ldul_mmu(CPUArchState *env, target_ulong addr,
+                                    int mmu_idx, uintptr_t retaddr);
+uint64_t helper_be_ldq_mmu(CPUArchState *env, target_ulong addr,
+                           int mmu_idx, uintptr_t retaddr);
 
 /* Value sign-extended to tcg register size.  */
 tcg_target_ulong helper_ret_ldsb_mmu(CPUArchState *env, target_ulong addr,
                                      int mmu_idx, uintptr_t retaddr);
-tcg_target_ulong helper_ret_ldsw_mmu(CPUArchState *env, target_ulong addr,
-                                     int mmu_idx, uintptr_t retaddr);
-tcg_target_ulong helper_ret_ldsl_mmu(CPUArchState *env, target_ulong addr,
-                                     int mmu_idx, uintptr_t retaddr);
+tcg_target_ulong helper_le_ldsw_mmu(CPUArchState *env, target_ulong addr,
+                                    int mmu_idx, uintptr_t retaddr);
+tcg_target_ulong helper_le_ldsl_mmu(CPUArchState *env, target_ulong addr,
+                                    int mmu_idx, uintptr_t retaddr);
+tcg_target_ulong helper_be_ldsw_mmu(CPUArchState *env, target_ulong addr,
+                                    int mmu_idx, uintptr_t retaddr);
+tcg_target_ulong helper_be_ldsl_mmu(CPUArchState *env, target_ulong addr,
+                                    int mmu_idx, uintptr_t retaddr);
 
 void helper_ret_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
                         int mmu_idx, uintptr_t retaddr);
-void helper_ret_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
-                        int mmu_idx, uintptr_t retaddr);
-void helper_ret_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
-                        int mmu_idx, uintptr_t retaddr);
-void helper_ret_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
-                        int mmu_idx, uintptr_t retaddr);
+void helper_le_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
+                       int mmu_idx, uintptr_t retaddr);
+void helper_le_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
+                       int mmu_idx, uintptr_t retaddr);
+void helper_le_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
+                       int mmu_idx, uintptr_t retaddr);
+void helper_be_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
+                       int mmu_idx, uintptr_t retaddr);
+void helper_be_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
+                       int mmu_idx, uintptr_t retaddr);
+void helper_be_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
+                       int mmu_idx, uintptr_t retaddr);
+
+/* Temporary aliases until backends are converted.  */
+#ifdef TARGET_WORDS_BIGENDIAN
+# define helper_ret_lduw_mmu  helper_be_lduw_mmu
+# define helper_ret_ldul_mmu  helper_be_ldul_mmu
+# define helper_ret_ldq_mmu   helper_be_ldq_mmu
+# define helper_ret_stw_mmu   helper_be_stw_mmu
+# define helper_ret_stl_mmu   helper_be_stl_mmu
+# define helper_ret_stq_mmu   helper_be_stq_mmu
+#else
+# define helper_ret_lduw_mmu  helper_le_lduw_mmu
+# define helper_ret_ldul_mmu  helper_le_ldul_mmu
+# define helper_ret_ldq_mmu   helper_le_ldq_mmu
+# define helper_ret_stw_mmu   helper_le_stw_mmu
+# define helper_ret_stl_mmu   helper_le_stl_mmu
+# define helper_ret_stq_mmu   helper_le_stq_mmu
+#endif
 
 uint8_t helper_ldb_mmu(CPUArchState *env, target_ulong addr, int mmu_idx);
 uint16_t helper_ldw_mmu(CPUArchState *env, target_ulong addr, int mmu_idx);
-- 
1.8.1.4


* [Qemu-devel] [PATCH 13/16] tcg-i386: Tidy softmmu routines
  2013-09-04 21:04 [Qemu-devel] [PATCH 00/16] Streamlining endian handling in TCG Richard Henderson
                   ` (11 preceding siblings ...)
  2013-09-04 21:05 ` [Qemu-devel] [PATCH 12/16] exec: Add both big- and little-endian memory helpers Richard Henderson
@ 2013-09-04 21:05 ` Richard Henderson
  2013-09-04 21:05 ` [Qemu-devel] [PATCH 14/16] tcg-i386: Remove "cb" output restriction from qemu_st8 for i386 Richard Henderson
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2013-09-04 21:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Pass two TCGReg to tcg_out_tlb_load, rather than idx+args.

Move ldst_optimization routines just below tcg_out_tlb_load to avoid
the need for forward declarations.

Use TCGReg enum in preference to int where appropriate.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/i386/tcg-target.c | 509 +++++++++++++++++++++++---------------------------
 1 file changed, 234 insertions(+), 275 deletions(-)

diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index ba24ec9..89fe121 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -1041,22 +1041,10 @@ static const void * const qemu_st_helpers[4] = {
     helper_ret_stq_mmu,
 };
 
-static void add_qemu_ldst_label(TCGContext *s,
-                                int is_ld,
-                                int opc,
-                                int data_reg,
-                                int data_reg2,
-                                int addrlo_reg,
-                                int addrhi_reg,
-                                int mem_index,
-                                uint8_t *raddr,
-                                uint8_t **label_ptr);
-
 /* Perform the TLB load and compare.
 
    Inputs:
-   ADDRLO_IDX contains the index into ARGS of the low part of the
-   address; the high part of the address is at ADDR_LOW_IDX+1.
+   ADDRLO and ADDRHI contain the low and high part of the address.
 
    MEM_INDEX and S_BITS are the memory context and log2 size of the load.
 
@@ -1074,14 +1062,12 @@ static void add_qemu_ldst_label(TCGContext *s,
 
    First argument register is clobbered.  */
 
-static inline void tcg_out_tlb_load(TCGContext *s, int addrlo_idx,
+static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
                                     int mem_index, TCGMemOp s_bits,
-                                    const TCGArg *args,
                                     uint8_t **label_ptr, int which)
 {
-    const int addrlo = args[addrlo_idx];
-    const int r0 = TCG_REG_L0;
-    const int r1 = TCG_REG_L1;
+    const TCGReg r0 = TCG_REG_L0;
+    const TCGReg r1 = TCG_REG_L1;
     TCGType ttype = TCG_TYPE_I32;
     TCGType htype = TCG_TYPE_I32;
     int trexw = 0, hrexw = 0;
@@ -1130,7 +1116,7 @@ static inline void tcg_out_tlb_load(TCGContext *s, int addrlo_idx,
 
     if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
         /* cmp 4(r0), addrhi */
-        tcg_out_modrm_offset(s, OPC_CMP_GvEv, args[addrlo_idx+1], r0, 4);
+        tcg_out_modrm_offset(s, OPC_CMP_GvEv, addrhi, r0, 4);
 
         /* jne slow_path */
         tcg_out_opc(s, OPC_JCC_long + JCC_JNE, 0, 0, 0);
@@ -1144,6 +1130,209 @@ static inline void tcg_out_tlb_load(TCGContext *s, int addrlo_idx,
     tcg_out_modrm_offset(s, OPC_ADD_GvEv + hrexw, r1, r0,
                          offsetof(CPUTLBEntry, addend) - which);
 }
+
+/*
+ * Record the context of a call to the out of line helper code for the slow path
+ * for a load or store, so that we can later generate the correct helper code
+ */
+static void add_qemu_ldst_label(TCGContext *s, int is_ld, TCGMemOp opc,
+                                TCGReg datalo, TCGReg datahi,
+                                TCGReg addrlo, TCGReg addrhi,
+                                int mem_index, uint8_t *raddr,
+                                uint8_t **label_ptr)
+{
+    int idx;
+    TCGLabelQemuLdst *label;
+
+    if (s->nb_qemu_ldst_labels >= TCG_MAX_QEMU_LDST) {
+        tcg_abort();
+    }
+
+    idx = s->nb_qemu_ldst_labels++;
+    label = (TCGLabelQemuLdst *)&s->qemu_ldst_labels[idx];
+    label->is_ld = is_ld;
+    label->opc = opc;
+    label->datalo_reg = datalo;
+    label->datahi_reg = datahi;
+    label->addrlo_reg = addrlo;
+    label->addrhi_reg = addrhi;
+    label->mem_index = mem_index;
+    label->raddr = raddr;
+    label->label_ptr[0] = label_ptr[0];
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        label->label_ptr[1] = label_ptr[1];
+    }
+}
+
+/*
+ * Generate code for the slow path for a load at the end of block
+ */
+static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
+{
+    TCGMemOp opc = l->opc;
+    TCGMemOp s_bits = opc & MO_SIZE;
+    TCGReg data_reg;
+    uint8_t **label_ptr = &l->label_ptr[0];
+
+    /* resolve label address */
+    *(uint32_t *)label_ptr[0] = (uint32_t)(s->code_ptr - label_ptr[0] - 4);
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        *(uint32_t *)label_ptr[1] = (uint32_t)(s->code_ptr - label_ptr[1] - 4);
+    }
+
+    if (TCG_TARGET_REG_BITS == 32) {
+        int ofs = 0;
+
+        tcg_out_st(s, TCG_TYPE_PTR, TCG_AREG0, TCG_REG_ESP, ofs);
+        ofs += 4;
+
+        tcg_out_st(s, TCG_TYPE_I32, l->addrlo_reg, TCG_REG_ESP, ofs);
+        ofs += 4;
+
+        if (TARGET_LONG_BITS == 64) {
+            tcg_out_st(s, TCG_TYPE_I32, l->addrhi_reg, TCG_REG_ESP, ofs);
+            ofs += 4;
+        }
+
+        tcg_out_sti(s, TCG_TYPE_I32, TCG_REG_ESP, ofs, l->mem_index);
+        ofs += 4;
+
+        tcg_out_sti(s, TCG_TYPE_I32, TCG_REG_ESP, ofs, (uintptr_t)l->raddr);
+    } else {
+        tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
+        /* The second argument is already loaded with addrlo.  */
+        tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[2],
+                     l->mem_index);
+        tcg_out_movi(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[3],
+                     (uintptr_t)l->raddr);
+    }
+
+    tcg_out_calli(s, (uintptr_t)qemu_ld_helpers[s_bits]);
+
+    data_reg = l->datalo_reg;
+    switch (opc & MO_SSIZE) {
+    case MO_SB:
+        tcg_out_ext8s(s, data_reg, TCG_REG_EAX, P_REXW);
+        break;
+    case MO_SW:
+        tcg_out_ext16s(s, data_reg, TCG_REG_EAX, P_REXW);
+        break;
+#if TCG_TARGET_REG_BITS == 64
+    case MO_SL:
+        tcg_out_ext32s(s, data_reg, TCG_REG_EAX);
+        break;
+#endif
+    case MO_UB:
+    case MO_UW:
+        /* Note that the helpers have zero-extended to tcg_target_long.  */
+    case MO_UL:
+        tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
+        break;
+    case MO_Q:
+        if (TCG_TARGET_REG_BITS == 64) {
+            tcg_out_mov(s, TCG_TYPE_I64, data_reg, TCG_REG_RAX);
+        } else if (data_reg == TCG_REG_EDX) {
+            /* xchg %edx, %eax */
+            tcg_out_opc(s, OPC_XCHG_ax_r32 + TCG_REG_EDX, 0, 0, 0);
+            tcg_out_mov(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_EAX);
+        } else {
+            tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
+            tcg_out_mov(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_EDX);
+        }
+        break;
+    default:
+        tcg_abort();
+    }
+
+    /* Jump to the code corresponding to next IR of qemu_ld */
+    tcg_out_jmp(s, (uintptr_t)l->raddr);
+}
+
+/*
+ * Generate code for the slow path for a store at the end of block
+ */
+static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
+{
+    TCGMemOp opc = l->opc;
+    TCGMemOp s_bits = opc & MO_SIZE;
+    uint8_t **label_ptr = &l->label_ptr[0];
+    TCGReg retaddr;
+
+    /* resolve label address */
+    *(uint32_t *)label_ptr[0] = (uint32_t)(s->code_ptr - label_ptr[0] - 4);
+    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
+        *(uint32_t *)label_ptr[1] = (uint32_t)(s->code_ptr - label_ptr[1] - 4);
+    }
+
+    if (TCG_TARGET_REG_BITS == 32) {
+        int ofs = 0;
+
+        tcg_out_st(s, TCG_TYPE_PTR, TCG_AREG0, TCG_REG_ESP, ofs);
+        ofs += 4;
+
+        tcg_out_st(s, TCG_TYPE_I32, l->addrlo_reg, TCG_REG_ESP, ofs);
+        ofs += 4;
+
+        if (TARGET_LONG_BITS == 64) {
+            tcg_out_st(s, TCG_TYPE_I32, l->addrhi_reg, TCG_REG_ESP, ofs);
+            ofs += 4;
+        }
+
+        tcg_out_st(s, TCG_TYPE_I32, l->datalo_reg, TCG_REG_ESP, ofs);
+        ofs += 4;
+
+        if (s_bits == MO_64) {
+            tcg_out_st(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_ESP, ofs);
+            ofs += 4;
+        }
+
+        tcg_out_sti(s, TCG_TYPE_I32, TCG_REG_ESP, ofs, l->mem_index);
+        ofs += 4;
+
+        retaddr = TCG_REG_EAX;
+        tcg_out_movi(s, TCG_TYPE_I32, retaddr, (uintptr_t)l->raddr);
+        tcg_out_st(s, TCG_TYPE_I32, retaddr, TCG_REG_ESP, ofs);
+    } else {
+        tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
+        /* The second argument is already loaded with addrlo.  */
+        tcg_out_mov(s, (s_bits == MO_64 ? TCG_TYPE_I64 : TCG_TYPE_I32),
+                    tcg_target_call_iarg_regs[2], l->datalo_reg);
+        tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[3],
+                     l->mem_index);
+
+        if (ARRAY_SIZE(tcg_target_call_iarg_regs) > 4) {
+            retaddr = tcg_target_call_iarg_regs[4];
+            tcg_out_movi(s, TCG_TYPE_PTR, retaddr, (uintptr_t)l->raddr);
+        } else {
+            retaddr = TCG_REG_RAX;
+            tcg_out_movi(s, TCG_TYPE_PTR, retaddr, (uintptr_t)l->raddr);
+            tcg_out_st(s, TCG_TYPE_PTR, retaddr, TCG_REG_ESP, 0);
+        }
+    }
+
+    /* "Tail call" to the helper, with the return address back inline.  */
+    tcg_out_push(s, retaddr);
+    tcg_out_jmp(s, (uintptr_t)qemu_st_helpers[s_bits]);
+}
+
+/*
+ * Generate TB finalization at the end of block
+ */
+void tcg_out_tb_finalize(TCGContext *s)
+{
+    int i;
+    TCGLabelQemuLdst *label;
+
+    /* qemu_ld/st slow paths */
+    for (i = 0; i < s->nb_qemu_ldst_labels; i++) {
+        label = (TCGLabelQemuLdst *)&s->qemu_ldst_labels[i];
+        if (label->is_ld) {
+            tcg_out_qemu_ld_slow_path(s, label);
+        } else {
+            tcg_out_qemu_st_slow_path(s, label);
+        }
+    }
+}
 #elif defined(__x86_64__) && defined(__linux__)
 # include <asm/prctl.h>
 # include <sys/prctl.h>
@@ -1248,46 +1437,36 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
    common. */
 static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, TCGMemOp opc)
 {
-    int data_reg, data_reg2 = 0;
-    int addrlo_idx;
+    TCGReg datalo, datahi, addrlo;
 #if defined(CONFIG_SOFTMMU)
+    TCGReg addrhi;
     int mem_index;
     TCGMemOp s_bits;
     uint8_t *label_ptr[2];
 #endif
 
-    data_reg = args[0];
-    addrlo_idx = 1;
-    if (TCG_TARGET_REG_BITS == 32 && (opc & MO_SIZE) == MO_64) {
-        data_reg2 = args[1];
-        addrlo_idx = 2;
-    }
+    datalo = *args++;
+    datahi = (TCG_TARGET_REG_BITS == 32 && opc == 3 ? *args++ : 0);
+    addrlo = *args++;
 
 #if defined(CONFIG_SOFTMMU)
-    mem_index = args[addrlo_idx + 1 + (TARGET_LONG_BITS > TCG_TARGET_REG_BITS)];
+    addrhi = (TARGET_LONG_BITS > TCG_TARGET_REG_BITS ? *args++ : 0);
+    mem_index = *args++;
     s_bits = opc & MO_SIZE;
 
-    tcg_out_tlb_load(s, addrlo_idx, mem_index, s_bits, args,
+    tcg_out_tlb_load(s, addrlo, addrhi, mem_index, s_bits,
                      label_ptr, offsetof(CPUTLBEntry, addr_read));
 
     /* TLB Hit.  */
-    tcg_out_qemu_ld_direct(s, data_reg, data_reg2, TCG_REG_L1, 0, 0, opc);
+    tcg_out_qemu_ld_direct(s, datalo, datahi, TCG_REG_L1, 0, 0, opc);
 
     /* Record the current context of a load into ldst label */
-    add_qemu_ldst_label(s,
-                        1,
-                        opc,
-                        data_reg,
-                        data_reg2,
-                        args[addrlo_idx],
-                        args[addrlo_idx + 1],
-                        mem_index,
-                        s->code_ptr,
-                        label_ptr);
+    add_qemu_ldst_label(s, 1, opc, datalo, datahi, addrlo, addrhi,
+                        mem_index, s->code_ptr, label_ptr);
 #else
     {
         int32_t offset = GUEST_BASE;
-        int base = args[addrlo_idx];
+        TCGReg base = addrlo;
         int seg = 0;
 
         /* ??? We assume all operations have left us with register contents
@@ -1305,7 +1484,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, TCGMemOp opc)
             offset = 0;
         }
 
-        tcg_out_qemu_ld_direct(s, data_reg, data_reg2, base, offset, seg, opc);
+        tcg_out_qemu_ld_direct(s, datalo, datahi, base, offset, seg, opc);
     }
 #endif
 }
@@ -1372,46 +1551,36 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
 
 static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, TCGMemOp opc)
 {
-    int data_reg, data_reg2 = 0;
-    int addrlo_idx;
+    TCGReg datalo, datahi, addrlo;
 #if defined(CONFIG_SOFTMMU)
+    TCGReg addrhi;
     int mem_index;
     TCGMemOp s_bits;
     uint8_t *label_ptr[2];
 #endif
 
-    data_reg = args[0];
-    addrlo_idx = 1;
-    if (TCG_TARGET_REG_BITS == 32 && opc == 3) {
-        data_reg2 = args[1];
-        addrlo_idx = 2;
-    }
+    datalo = *args++;
+    datahi = (TCG_TARGET_REG_BITS == 32 && opc == 3 ? *args++ : 0);
+    addrlo = *args++;
 
 #if defined(CONFIG_SOFTMMU)
-    mem_index = args[addrlo_idx + 1 + (TARGET_LONG_BITS > TCG_TARGET_REG_BITS)];
+    addrhi = (TARGET_LONG_BITS > TCG_TARGET_REG_BITS ? *args++ : 0);
+    mem_index = *args++;
     s_bits = opc & MO_SIZE;
 
-    tcg_out_tlb_load(s, addrlo_idx, mem_index, s_bits, args,
+    tcg_out_tlb_load(s, addrlo, addrhi, mem_index, s_bits,
                      label_ptr, offsetof(CPUTLBEntry, addr_write));
 
     /* TLB Hit.  */
-    tcg_out_qemu_st_direct(s, data_reg, data_reg2, TCG_REG_L1, 0, 0, opc);
+    tcg_out_qemu_st_direct(s, datalo, datahi, TCG_REG_L1, 0, 0, opc);
 
     /* Record the current context of a store into ldst label */
-    add_qemu_ldst_label(s,
-                        0,
-                        opc,
-                        data_reg,
-                        data_reg2,
-                        args[addrlo_idx],
-                        args[addrlo_idx + 1],
-                        mem_index,
-                        s->code_ptr,
-                        label_ptr);
+    add_qemu_ldst_label(s, 0, opc, datalo, datahi, addrlo, addrhi,
+                        mem_index, s->code_ptr, label_ptr);
 #else
     {
         int32_t offset = GUEST_BASE;
-        int base = args[addrlo_idx];
+        TCGReg base = addrlo;
         int seg = 0;
 
         /* ??? We assume all operations have left us with register contents
@@ -1429,221 +1598,11 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, TCGMemOp opc)
             offset = 0;
         }
 
-        tcg_out_qemu_st_direct(s, data_reg, data_reg2, base, offset, seg, opc);
+        tcg_out_qemu_st_direct(s, datalo, datahi, base, offset, seg, opc);
     }
 #endif
 }
 
-#if defined(CONFIG_SOFTMMU)
-/*
- * Record the context of a call to the out of line helper code for the slow path
- * for a load or store, so that we can later generate the correct helper code
- */
-static void add_qemu_ldst_label(TCGContext *s,
-                                int is_ld,
-                                int opc,
-                                int data_reg,
-                                int data_reg2,
-                                int addrlo_reg,
-                                int addrhi_reg,
-                                int mem_index,
-                                uint8_t *raddr,
-                                uint8_t **label_ptr)
-{
-    int idx;
-    TCGLabelQemuLdst *label;
-
-    if (s->nb_qemu_ldst_labels >= TCG_MAX_QEMU_LDST) {
-        tcg_abort();
-    }
-
-    idx = s->nb_qemu_ldst_labels++;
-    label = (TCGLabelQemuLdst *)&s->qemu_ldst_labels[idx];
-    label->is_ld = is_ld;
-    label->opc = opc;
-    label->datalo_reg = data_reg;
-    label->datahi_reg = data_reg2;
-    label->addrlo_reg = addrlo_reg;
-    label->addrhi_reg = addrhi_reg;
-    label->mem_index = mem_index;
-    label->raddr = raddr;
-    label->label_ptr[0] = label_ptr[0];
-    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-        label->label_ptr[1] = label_ptr[1];
-    }
-}
-
-/*
- * Generate code for the slow path for a load at the end of block
- */
-static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
-{
-    TCGMemOp opc = l->opc;
-    TCGMemOp s_bits = opc & MO_SIZE;
-    TCGReg data_reg;
-    uint8_t **label_ptr = &l->label_ptr[0];
-
-    /* resolve label address */
-    *(uint32_t *)label_ptr[0] = (uint32_t)(s->code_ptr - label_ptr[0] - 4);
-    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-        *(uint32_t *)label_ptr[1] = (uint32_t)(s->code_ptr - label_ptr[1] - 4);
-    }
-
-    if (TCG_TARGET_REG_BITS == 32) {
-        int ofs = 0;
-
-        tcg_out_st(s, TCG_TYPE_PTR, TCG_AREG0, TCG_REG_ESP, ofs);
-        ofs += 4;
-
-        tcg_out_st(s, TCG_TYPE_I32, l->addrlo_reg, TCG_REG_ESP, ofs);
-        ofs += 4;
-
-        if (TARGET_LONG_BITS == 64) {
-            tcg_out_st(s, TCG_TYPE_I32, l->addrhi_reg, TCG_REG_ESP, ofs);
-            ofs += 4;
-        }
-
-        tcg_out_sti(s, TCG_TYPE_I32, TCG_REG_ESP, ofs, l->mem_index);
-        ofs += 4;
-
-        tcg_out_sti(s, TCG_TYPE_I32, TCG_REG_ESP, ofs, (uintptr_t)l->raddr);
-    } else {
-        tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
-        /* The second argument is already loaded with addrlo.  */
-        tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[2],
-                     l->mem_index);
-        tcg_out_movi(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[3],
-                     (uintptr_t)l->raddr);
-    }
-
-    tcg_out_calli(s, (uintptr_t)qemu_ld_helpers[s_bits]);
-
-    data_reg = l->datalo_reg;
-    switch (opc & MO_SSIZE) {
-    case MO_SB:
-        tcg_out_ext8s(s, data_reg, TCG_REG_EAX, P_REXW);
-        break;
-    case MO_SW:
-        tcg_out_ext16s(s, data_reg, TCG_REG_EAX, P_REXW);
-        break;
-#if TCG_TARGET_REG_BITS == 64
-    case MO_SL:
-        tcg_out_ext32s(s, data_reg, TCG_REG_EAX);
-        break;
-#endif
-    case MO_UB:
-    case MO_UW:
-        /* Note that the helpers have zero-extended to tcg_target_long.  */
-    case MO_UL:
-        tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
-        break;
-    case MO_Q:
-        if (TCG_TARGET_REG_BITS == 64) {
-            tcg_out_mov(s, TCG_TYPE_I64, data_reg, TCG_REG_RAX);
-        } else if (data_reg == TCG_REG_EDX) {
-            /* xchg %edx, %eax */
-            tcg_out_opc(s, OPC_XCHG_ax_r32 + TCG_REG_EDX, 0, 0, 0);
-            tcg_out_mov(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_EAX);
-        } else {
-            tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
-            tcg_out_mov(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_EDX);
-        }
-        break;
-    default:
-        tcg_abort();
-    }
-
-    /* Jump to the code corresponding to next IR of qemu_st */
-    tcg_out_jmp(s, (uintptr_t)l->raddr);
-}
-
-/*
- * Generate code for the slow path for a store at the end of block
- */
-static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
-{
-    TCGMemOp opc = l->opc;
-    TCGMemOp s_bits = opc & MO_SIZE;
-    uint8_t **label_ptr = &l->label_ptr[0];
-    TCGReg retaddr;
-
-    /* resolve label address */
-    *(uint32_t *)label_ptr[0] = (uint32_t)(s->code_ptr - label_ptr[0] - 4);
-    if (TARGET_LONG_BITS > TCG_TARGET_REG_BITS) {
-        *(uint32_t *)label_ptr[1] = (uint32_t)(s->code_ptr - label_ptr[1] - 4);
-    }
-
-    if (TCG_TARGET_REG_BITS == 32) {
-        int ofs = 0;
-
-        tcg_out_st(s, TCG_TYPE_PTR, TCG_AREG0, TCG_REG_ESP, ofs);
-        ofs += 4;
-
-        tcg_out_st(s, TCG_TYPE_I32, l->addrlo_reg, TCG_REG_ESP, ofs);
-        ofs += 4;
-
-        if (TARGET_LONG_BITS == 64) {
-            tcg_out_st(s, TCG_TYPE_I32, l->addrhi_reg, TCG_REG_ESP, ofs);
-            ofs += 4;
-        }
-
-        tcg_out_st(s, TCG_TYPE_I32, l->datalo_reg, TCG_REG_ESP, ofs);
-        ofs += 4;
-
-        if (s_bits == MO_64) {
-            tcg_out_st(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_ESP, ofs);
-            ofs += 4;
-        }
-
-        tcg_out_sti(s, TCG_TYPE_I32, TCG_REG_ESP, ofs, l->mem_index);
-        ofs += 4;
-
-        retaddr = TCG_REG_EAX;
-        tcg_out_movi(s, TCG_TYPE_I32, retaddr, (uintptr_t)l->raddr);
-        tcg_out_st(s, TCG_TYPE_I32, retaddr, TCG_REG_ESP, ofs);
-    } else {
-        tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
-        /* The second argument is already loaded with addrlo.  */
-        tcg_out_mov(s, (s_bits == MO_64 ? TCG_TYPE_I64 : TCG_TYPE_I32),
-                    tcg_target_call_iarg_regs[2], l->datalo_reg);
-        tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[3],
-                     l->mem_index);
-
-        if (ARRAY_SIZE(tcg_target_call_iarg_regs) > 4) {
-            retaddr = tcg_target_call_iarg_regs[4];
-            tcg_out_movi(s, TCG_TYPE_PTR, retaddr, (uintptr_t)l->raddr);
-        } else {
-            retaddr = TCG_REG_RAX;
-            tcg_out_movi(s, TCG_TYPE_PTR, retaddr, (uintptr_t)l->raddr);
-            tcg_out_st(s, TCG_TYPE_PTR, retaddr, TCG_REG_ESP, 0);
-        }
-    }
-
-    /* "Tail call" to the helper, with the return address back inline.  */
-    tcg_out_push(s, retaddr);
-    tcg_out_jmp(s, (uintptr_t)qemu_st_helpers[s_bits]);
-}
-
-/*
- * Generate TB finalization at the end of block
- */
-void tcg_out_tb_finalize(TCGContext *s)
-{
-    int i;
-    TCGLabelQemuLdst *label;
-
-    /* qemu_ld/st slow paths */
-    for (i = 0; i < s->nb_qemu_ldst_labels; i++) {
-        label = (TCGLabelQemuLdst *)&s->qemu_ldst_labels[i];
-        if (label->is_ld) {
-            tcg_out_qemu_ld_slow_path(s, label);
-        } else {
-            tcg_out_qemu_st_slow_path(s, label);
-        }
-    }
-}
-#endif  /* CONFIG_SOFTMMU */
-
 static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                               const TCGArg *args, const int *const_args)
 {
-- 
1.8.1.4


* [Qemu-devel] [PATCH 14/16] tcg-i386: Remove "cb" output restriction from qemu_st8 for i386
  2013-09-04 21:04 [Qemu-devel] [PATCH 00/16] Streamlining endian handling in TCG Richard Henderson
                   ` (12 preceding siblings ...)
  2013-09-04 21:05 ` [Qemu-devel] [PATCH 13/16] tcg-i386: Tidy softmmu routines Richard Henderson
@ 2013-09-04 21:05 ` Richard Henderson
  2013-09-04 21:05 ` [Qemu-devel] [PATCH 15/16] tcg-i386: Support new ldst opcodes Richard Henderson
  2013-09-04 21:05 ` [Qemu-devel] [PATCH 16/16] target-ppc: Convert to " Richard Henderson
  15 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2013-09-04 21:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Once we form a combined qemu_st_i32 opcode, we won't be able to
have separate constraints based on size.  This one is fairly easy
to work around, since eax is available as a scratch register.

When storing variable data, this tends to merely exchange one mov
for another.  E.g.

-:  mov    %esi,%ecx
...
-:  mov    %cl,(%edx)
+:  mov    %esi,%eax
+:  mov    %al,(%edx)

Where we do have a regression is when storing constant data, in which
we may load the constant into edi, when only ecx/ebx ought to be used.

The proper way to recover this regression is to allow constants as
arguments to qemu_st_i32, so that we never load the constant data into
a register at all, much less the wrong register.  TBD.
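
The workaround amounts to the check sketched below.  This is a
standalone illustration, not the patch itself: the REG_* names and
byte_store_source are hypothetical, and the register numbers simply
follow the x86 encoding, in which only registers 0-3 have an
addressable low byte in 32-bit mode.

```c
#include <assert.h>

/* x86 register encodings.  In 32-bit mode a byte operand encoded as
   4-7 means ah/ch/dh/bh, so esp/ebp/esi/edi have no low-byte form.  */
enum { REG_EAX, REG_ECX, REG_EDX, REG_EBX,
       REG_ESP, REG_EBP, REG_ESI, REG_EDI };

/* Return the register a byte store should read from.  *needs_mov is
   set to 1 when the data must first be copied into the scratch
   register, and to 0 when datalo can be used directly.  */
static int byte_store_source(int host_bits, int datalo, int scratch,
                             int *needs_mov)
{
    /* The scratch register must itself be byte-addressable.  */
    assert(scratch < 4);

    if (host_bits == 32 && datalo >= 4) {
        *needs_mov = 1;
        return scratch;
    }
    *needs_mov = 0;
    return datalo;
}
```

For the example above, data in esi on a 32-bit host is redirected
through eax at the cost of one mov, while data already in one of
eax/ecx/edx/ebx is stored directly.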

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/i386/tcg-target.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index 89fe121..a3bf885 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -1503,6 +1503,12 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
 
     switch (memop & MO_SIZE) {
     case MO_8:
+        /* In 32-bit mode, 8-bit stores can only happen from [abcd]x.
+           Use the scratch register if necessary.  */
+        if (TCG_TARGET_REG_BITS == 32 && datalo >= 4) {
+            tcg_out_mov(s, TCG_TYPE_I32, scratch, datalo);
+            datalo = scratch;
+        }
         tcg_out_modrm_offset(s, OPC_MOVB_EvGv + P_REXB_R + seg,
                              datalo, base, ofs);
         break;
@@ -2108,7 +2114,7 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_qemu_ld32, { "r", "L" } },
     { INDEX_op_qemu_ld64, { "r", "r", "L" } },
 
-    { INDEX_op_qemu_st8, { "cb", "L" } },
+    { INDEX_op_qemu_st8, { "L", "L" } },
     { INDEX_op_qemu_st16, { "L", "L" } },
     { INDEX_op_qemu_st32, { "L", "L" } },
     { INDEX_op_qemu_st64, { "L", "L", "L" } },
@@ -2120,7 +2126,7 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_qemu_ld32, { "r", "L", "L" } },
     { INDEX_op_qemu_ld64, { "r", "r", "L", "L" } },
 
-    { INDEX_op_qemu_st8, { "cb", "L", "L" } },
+    { INDEX_op_qemu_st8, { "L", "L", "L" } },
     { INDEX_op_qemu_st16, { "L", "L", "L" } },
     { INDEX_op_qemu_st32, { "L", "L", "L" } },
     { INDEX_op_qemu_st64, { "L", "L", "L", "L" } },
-- 
1.8.1.4


* [Qemu-devel] [PATCH 15/16] tcg-i386: Support new ldst opcodes
  2013-09-04 21:04 [Qemu-devel] [PATCH 00/16] Streamlining endian handling in TCG Richard Henderson
                   ` (13 preceding siblings ...)
  2013-09-04 21:05 ` [Qemu-devel] [PATCH 14/16] tcg-i386: Remove "cb" output restriction from qemu_st8 for i386 Richard Henderson
@ 2013-09-04 21:05 ` Richard Henderson
  2013-09-04 21:05 ` [Qemu-devel] [PATCH 16/16] target-ppc: Convert to " Richard Henderson
  15 siblings, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2013-09-04 21:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

No support for helpers with non-default endianness yet,
but good enough to test the opcodes.
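
A minimal sketch of the helper lookup follows, under stated
assumptions: the size values 0-3 and BSWAP=8 are as described in the
cover letter, MO_SIGN=4 is assumed, the host is assumed little-endian
(so big-endian means BSWAP is set), and the table holds placeholder
strings rather than the real helper functions.

```c
#include <assert.h>
#include <stddef.h>

enum {
    MO_8 = 0, MO_16 = 1, MO_32 = 2, MO_64 = 3,  /* log2 of the size */
    MO_SIGN  = 4,       /* assumed value of the sign-extension bit */
    MO_BSWAP = 8,       /* byte-swap bit, per the cover letter */
};

/* Indexed by size plus the bswap bit.  Slots that are not named are
   NULL, e.g. a byte access with the bswap bit set.  */
static const char *const ld_helper_names[16] = {
    [MO_8]             = "ldub",
    [MO_16]            = "le_lduw",
    [MO_32]            = "le_ldul",
    [MO_64]            = "le_ldq",
    [MO_16 | MO_BSWAP] = "be_lduw",
    [MO_32 | MO_BSWAP] = "be_ldul",
    [MO_64 | MO_BSWAP] = "be_ldq",
};

/* The helpers return zero-extended data, so the sign bit is masked
   off for the lookup; sign extension happens after the call.  */
static const char *ld_helper_for(unsigned memop)
{
    unsigned idx = memop & ~(unsigned)MO_SIGN;

    assert(idx < 16);
    return ld_helper_names[idx];
}
```

Signed and unsigned loads of the same size and endianness therefore
share one slot, which is why the load path masks with ~MO_SIGN while
the store path can index with the memop directly.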

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/i386/tcg-target.c | 139 ++++++++++++++++++--------------------------------
 tcg/i386/tcg-target.h |   2 +-
 2 files changed, 51 insertions(+), 90 deletions(-)

diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index a3bf885..17ba13d 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -1024,21 +1024,27 @@ static void tcg_out_jmp(TCGContext *s, uintptr_t dest)
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     int mmu_idx, uintptr_t ra)
  */
-static const void * const qemu_ld_helpers[4] = {
-    helper_ret_ldub_mmu,
-    helper_ret_lduw_mmu,
-    helper_ret_ldul_mmu,
-    helper_ret_ldq_mmu,
+static const void * const qemu_ld_helpers[16] = {
+    [MO_UB]   = helper_ret_ldub_mmu,
+    [MO_LEUW] = helper_le_lduw_mmu,
+    [MO_LEUL] = helper_le_ldul_mmu,
+    [MO_LEQ]  = helper_le_ldq_mmu,
+    [MO_BEUW] = helper_be_lduw_mmu,
+    [MO_BEUL] = helper_be_ldul_mmu,
+    [MO_BEQ]  = helper_be_ldq_mmu,
 };
 
 /* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
  *                                     uintxx_t val, int mmu_idx, uintptr_t ra)
  */
-static const void * const qemu_st_helpers[4] = {
-    helper_ret_stb_mmu,
-    helper_ret_stw_mmu,
-    helper_ret_stl_mmu,
-    helper_ret_stq_mmu,
+static const void * const qemu_st_helpers[16] = {
+    [MO_UB]   = helper_ret_stb_mmu,
+    [MO_LEUW] = helper_le_stw_mmu,
+    [MO_LEUL] = helper_le_stl_mmu,
+    [MO_LEQ]  = helper_le_stq_mmu,
+    [MO_BEUW] = helper_be_stw_mmu,
+    [MO_BEUL] = helper_be_stl_mmu,
+    [MO_BEQ]  = helper_be_stq_mmu,
 };
 
 /* Perform the TLB load and compare.
@@ -1170,7 +1176,6 @@ static void add_qemu_ldst_label(TCGContext *s, int is_ld, TCGMemOp opc,
 static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 {
     TCGMemOp opc = l->opc;
-    TCGMemOp s_bits = opc & MO_SIZE;
     TCGReg data_reg;
     uint8_t **label_ptr = &l->label_ptr[0];
 
@@ -1207,7 +1212,7 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
                      (uintptr_t)l->raddr);
     }
 
-    tcg_out_calli(s, (uintptr_t)qemu_ld_helpers[s_bits]);
+    tcg_out_calli(s, (uintptr_t)qemu_ld_helpers[opc & ~MO_SIGN]);
 
     data_reg = l->datalo_reg;
     switch (opc & MO_SSIZE) {
@@ -1312,7 +1317,7 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 
     /* "Tail call" to the helper, with the return address back inline.  */
     tcg_out_push(s, retaddr);
-    tcg_out_jmp(s, (uintptr_t)qemu_st_helpers[s_bits]);
+    tcg_out_jmp(s, (uintptr_t)qemu_st_helpers[opc]);
 }
 
 /*
@@ -1435,22 +1440,24 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
 /* XXX: qemu_ld and qemu_st could be modified to clobber only EDX and
    EAX. It will be useful once fixed registers globals are less
    common. */
-static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, TCGMemOp opc)
+static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
 {
     TCGReg datalo, datahi, addrlo;
+    TCGReg addrhi __attribute__((unused));
+    TCGMemOp opc;
 #if defined(CONFIG_SOFTMMU)
-    TCGReg addrhi;
     int mem_index;
     TCGMemOp s_bits;
     uint8_t *label_ptr[2];
 #endif
 
     datalo = *args++;
-    datahi = (TCG_TARGET_REG_BITS == 32 && opc == 3 ? *args++ : 0);
+    datahi = (TCG_TARGET_REG_BITS == 32 && is64 ? *args++ : 0);
     addrlo = *args++;
+    addrhi = (TARGET_LONG_BITS > TCG_TARGET_REG_BITS ? *args++ : 0);
+    opc = *args++;
 
 #if defined(CONFIG_SOFTMMU)
-    addrhi = (TARGET_LONG_BITS > TCG_TARGET_REG_BITS ? *args++ : 0);
     mem_index = *args++;
     s_bits = opc & MO_SIZE;
 
@@ -1555,22 +1562,24 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
     }
 }
 
-static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, TCGMemOp opc)
+static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is64)
 {
     TCGReg datalo, datahi, addrlo;
+    TCGReg addrhi __attribute__((unused));
+    TCGMemOp opc;
 #if defined(CONFIG_SOFTMMU)
-    TCGReg addrhi;
     int mem_index;
     TCGMemOp s_bits;
     uint8_t *label_ptr[2];
 #endif
 
     datalo = *args++;
-    datahi = (TCG_TARGET_REG_BITS == 32 && opc == 3 ? *args++ : 0);
+    datahi = (TCG_TARGET_REG_BITS == 32 && is64 ? *args++ : 0);
     addrlo = *args++;
+    addrhi = (TARGET_LONG_BITS > TCG_TARGET_REG_BITS ? *args++ : 0);
+    opc = *args++;
 
 #if defined(CONFIG_SOFTMMU)
-    addrhi = (TARGET_LONG_BITS > TCG_TARGET_REG_BITS ? *args++ : 0);
     mem_index = *args++;
     s_bits = opc & MO_SIZE;
 
@@ -1834,39 +1843,17 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_ext16u(s, args[0], args[1]);
         break;
 
-    case INDEX_op_qemu_ld8u:
-        tcg_out_qemu_ld(s, args, MO_UB);
-        break;
-    case INDEX_op_qemu_ld8s:
-        tcg_out_qemu_ld(s, args, MO_SB);
+    case INDEX_op_qemu_ld_i32:
+        tcg_out_qemu_ld(s, args, 0);
         break;
-    case INDEX_op_qemu_ld16u:
-        tcg_out_qemu_ld(s, args, MO_TEUW);
+    case INDEX_op_qemu_ld_i64:
+        tcg_out_qemu_ld(s, args, 1);
         break;
-    case INDEX_op_qemu_ld16s:
-        tcg_out_qemu_ld(s, args, MO_TESW);
+    case INDEX_op_qemu_st_i32:
+        tcg_out_qemu_st(s, args, 0);
         break;
-#if TCG_TARGET_REG_BITS == 64
-    case INDEX_op_qemu_ld32u:
-#endif
-    case INDEX_op_qemu_ld32:
-        tcg_out_qemu_ld(s, args, MO_TEUL);
-        break;
-    case INDEX_op_qemu_ld64:
-        tcg_out_qemu_ld(s, args, MO_TEQ);
-        break;
-
-    case INDEX_op_qemu_st8:
-        tcg_out_qemu_st(s, args, MO_UB);
-        break;
-    case INDEX_op_qemu_st16:
-        tcg_out_qemu_st(s, args, MO_TEUW);
-        break;
-    case INDEX_op_qemu_st32:
-        tcg_out_qemu_st(s, args, MO_TEUL);
-        break;
-    case INDEX_op_qemu_st64:
-        tcg_out_qemu_st(s, args, MO_TEQ);
+    case INDEX_op_qemu_st_i64:
+        tcg_out_qemu_st(s, args, 1);
         break;
 
     OP_32_64(mulu2):
@@ -1926,9 +1913,6 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
             tcg_out_st(s, TCG_TYPE_I64, args[0], args[1], args[2]);
         }
         break;
-    case INDEX_op_qemu_ld32s:
-        tcg_out_qemu_ld(s, args, MO_TESL);
-        break;
 
     case INDEX_op_brcond_i64:
         tcg_out_brcond64(s, args[2], args[0], args[1], const_args[1],
@@ -2093,43 +2077,20 @@ static const TCGTargetOpDef x86_op_defs[] = {
 #endif
 
 #if TCG_TARGET_REG_BITS == 64
-    { INDEX_op_qemu_ld8u, { "r", "L" } },
-    { INDEX_op_qemu_ld8s, { "r", "L" } },
-    { INDEX_op_qemu_ld16u, { "r", "L" } },
-    { INDEX_op_qemu_ld16s, { "r", "L" } },
-    { INDEX_op_qemu_ld32, { "r", "L" } },
-    { INDEX_op_qemu_ld32u, { "r", "L" } },
-    { INDEX_op_qemu_ld32s, { "r", "L" } },
-    { INDEX_op_qemu_ld64, { "r", "L" } },
-
-    { INDEX_op_qemu_st8, { "L", "L" } },
-    { INDEX_op_qemu_st16, { "L", "L" } },
-    { INDEX_op_qemu_st32, { "L", "L" } },
-    { INDEX_op_qemu_st64, { "L", "L" } },
+    { INDEX_op_qemu_ld_i32, { "r", "L" } },
+    { INDEX_op_qemu_st_i32, { "L", "L" } },
+    { INDEX_op_qemu_ld_i64, { "r", "L" } },
+    { INDEX_op_qemu_st_i64, { "L", "L" } },
 #elif TARGET_LONG_BITS <= TCG_TARGET_REG_BITS
-    { INDEX_op_qemu_ld8u, { "r", "L" } },
-    { INDEX_op_qemu_ld8s, { "r", "L" } },
-    { INDEX_op_qemu_ld16u, { "r", "L" } },
-    { INDEX_op_qemu_ld16s, { "r", "L" } },
-    { INDEX_op_qemu_ld32, { "r", "L" } },
-    { INDEX_op_qemu_ld64, { "r", "r", "L" } },
-
-    { INDEX_op_qemu_st8, { "L", "L" } },
-    { INDEX_op_qemu_st16, { "L", "L" } },
-    { INDEX_op_qemu_st32, { "L", "L" } },
-    { INDEX_op_qemu_st64, { "L", "L", "L" } },
+    { INDEX_op_qemu_ld_i32, { "r", "L" } },
+    { INDEX_op_qemu_st_i32, { "L", "L" } },
+    { INDEX_op_qemu_ld_i64, { "r", "r", "L" } },
+    { INDEX_op_qemu_st_i64, { "L", "L", "L" } },
 #else
-    { INDEX_op_qemu_ld8u, { "r", "L", "L" } },
-    { INDEX_op_qemu_ld8s, { "r", "L", "L" } },
-    { INDEX_op_qemu_ld16u, { "r", "L", "L" } },
-    { INDEX_op_qemu_ld16s, { "r", "L", "L" } },
-    { INDEX_op_qemu_ld32, { "r", "L", "L" } },
-    { INDEX_op_qemu_ld64, { "r", "r", "L", "L" } },
-
-    { INDEX_op_qemu_st8, { "L", "L", "L" } },
-    { INDEX_op_qemu_st16, { "L", "L", "L" } },
-    { INDEX_op_qemu_st32, { "L", "L", "L" } },
-    { INDEX_op_qemu_st64, { "L", "L", "L", "L" } },
+    { INDEX_op_qemu_ld_i32, { "r", "L", "L" } },
+    { INDEX_op_qemu_st_i32, { "L", "L", "L" } },
+    { INDEX_op_qemu_ld_i64, { "r", "r", "L", "L" } },
+    { INDEX_op_qemu_st_i64, { "L", "L", "L", "L" } },
 #endif
     { -1 },
 };
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 47fdb81..bcf8ac4 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -131,7 +131,7 @@ typedef enum {
 #define TCG_TARGET_HAS_mulsh_i64        0
 #endif
 
-#define TCG_TARGET_HAS_new_ldst         0
+#define TCG_TARGET_HAS_new_ldst         1
 
 #define TCG_TARGET_deposit_i32_valid(ofs, len) \
     (((ofs) == 0 && (len) == 8) || ((ofs) == 8 && (len) == 8) || \
-- 
1.8.1.4


* [Qemu-devel] [PATCH 16/16] target-ppc: Convert to new ldst opcodes
  2013-09-04 21:04 [Qemu-devel] [PATCH 00/16] Streamlining endian handling in TCG Richard Henderson
                   ` (14 preceding siblings ...)
  2013-09-04 21:05 ` [Qemu-devel] [PATCH 15/16] tcg-i386: Support new ldst opcodes Richard Henderson
@ 2013-09-04 21:05 ` Richard Henderson
  2013-09-05  9:08   ` [Qemu-devel] [Qemu-ppc] " Alexander Graf
  15 siblings, 1 reply; 22+ messages in thread
From: Richard Henderson @ 2013-09-04 21:05 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, aurelien

This lets us change "le_mode" to "end_mode", fold away nearly all
of the tests for the current cpu endianness, and remove all of the
explicitly generated bswap opcodes.

Cc: qemu-ppc@nongnu.org
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-ppc/translate.c | 147 +++++++++++++------------------------------------
 1 file changed, 39 insertions(+), 108 deletions(-)
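
Editorial note, not part of the patch: the byte-reversed accessors below
rely on XOR-ing the swap bit into the current endianness. A minimal
standalone sketch of that idea follows. The DEMO_* names are hypothetical
stand-ins rather than the real TCGMemOp enumerators; only BSWAP = 8 is
taken from the cover letter, and the LE/BE values assume a little-endian
host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the TCGMemOp endianness bits.  Only
 * BSWAP = 8 comes from the cover letter; the LE/BE values assume a
 * little-endian host, where LE needs no swap and BE does. */
enum {
    DEMO_MO_BSWAP = 8,
    DEMO_MO_LE = 0,
    DEMO_MO_BE = DEMO_MO_BSWAP,
};

static uint32_t demo_bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Load 32 bits the way a little-endian host would, then apply the
 * swap that the memop asks for. */
uint32_t demo_ld32(const uint8_t *p, int memop)
{
    uint32_t v;

    assert(p != NULL);
    v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
        ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (memop & DEMO_MO_BSWAP) ? demo_bswap32(v) : v;
}

/* lwz-like: follow the current guest endianness. */
uint32_t demo_guest_ld32(const uint8_t *p, int end_mode)
{
    return demo_ld32(p, end_mode);
}

/* lwbrx-like: flip the swap bit to get the opposite byte order. */
uint32_t demo_guest_ld32_rev(const uint8_t *p, int end_mode)
{
    return demo_ld32(p, end_mode ^ DEMO_MO_BSWAP);
}
```

With the bytes 11 22 33 44 in memory, the normal load in big-endian mode
and the reversed load in little-endian mode should both give 0x11223344,
which is why no explicit bswap opcode is needed in either case.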

diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 2da7bc7..b56ab87 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -192,7 +192,7 @@ typedef struct DisasContext {
     int mem_idx;
     int access_type;
     /* Translation flags */
-    int le_mode;
+    TCGMemOp end_mode;
 #if defined(TARGET_PPC64)
     int sf_mode;
     int has_cfar;
@@ -2514,99 +2514,57 @@ static inline void gen_check_align(DisasContext *ctx, TCGv EA, int mask)
 /***                             Integer load                              ***/
 static inline void gen_qemu_ld8u(DisasContext *ctx, TCGv arg1, TCGv arg2)
 {
-    tcg_gen_qemu_ld8u(arg1, arg2, ctx->mem_idx);
+    tcg_gen_qemu_ld_tl(arg1, arg2, ctx->mem_idx, MO_UB);
 }
 
 static inline void gen_qemu_ld8s(DisasContext *ctx, TCGv arg1, TCGv arg2)
 {
-    tcg_gen_qemu_ld8s(arg1, arg2, ctx->mem_idx);
+    tcg_gen_qemu_ld_tl(arg1, arg2, ctx->mem_idx, MO_SB);
 }
 
 static inline void gen_qemu_ld16u(DisasContext *ctx, TCGv arg1, TCGv arg2)
 {
-    tcg_gen_qemu_ld16u(arg1, arg2, ctx->mem_idx);
-    if (unlikely(ctx->le_mode)) {
-        tcg_gen_bswap16_tl(arg1, arg1);
-    }
+    tcg_gen_qemu_ld_tl(arg1, arg2, ctx->mem_idx, MO_UW | ctx->end_mode);
 }
 
 static inline void gen_qemu_ld16s(DisasContext *ctx, TCGv arg1, TCGv arg2)
 {
-    if (unlikely(ctx->le_mode)) {
-        tcg_gen_qemu_ld16u(arg1, arg2, ctx->mem_idx);
-        tcg_gen_bswap16_tl(arg1, arg1);
-        tcg_gen_ext16s_tl(arg1, arg1);
-    } else {
-        tcg_gen_qemu_ld16s(arg1, arg2, ctx->mem_idx);
-    }
+    tcg_gen_qemu_ld_tl(arg1, arg2, ctx->mem_idx, MO_SW | ctx->end_mode);
 }
 
 static inline void gen_qemu_ld32u(DisasContext *ctx, TCGv arg1, TCGv arg2)
 {
-    tcg_gen_qemu_ld32u(arg1, arg2, ctx->mem_idx);
-    if (unlikely(ctx->le_mode)) {
-        tcg_gen_bswap32_tl(arg1, arg1);
-    }
+    tcg_gen_qemu_ld_tl(arg1, arg2, ctx->mem_idx, MO_UL | ctx->end_mode);
 }
 
 static inline void gen_qemu_ld32s(DisasContext *ctx, TCGv arg1, TCGv arg2)
 {
-    if (unlikely(ctx->le_mode)) {
-        tcg_gen_qemu_ld32u(arg1, arg2, ctx->mem_idx);
-        tcg_gen_bswap32_tl(arg1, arg1);
-        tcg_gen_ext32s_tl(arg1, arg1);
-    } else
-        tcg_gen_qemu_ld32s(arg1, arg2, ctx->mem_idx);
+    tcg_gen_qemu_ld_tl(arg1, arg2, ctx->mem_idx, MO_SL | ctx->end_mode);
 }
 
 static inline void gen_qemu_ld64(DisasContext *ctx, TCGv_i64 arg1, TCGv arg2)
 {
-    tcg_gen_qemu_ld64(arg1, arg2, ctx->mem_idx);
-    if (unlikely(ctx->le_mode)) {
-        tcg_gen_bswap64_i64(arg1, arg1);
-    }
+    tcg_gen_qemu_ld_i64(arg1, arg2, ctx->mem_idx, MO_Q | ctx->end_mode);
 }
 
 static inline void gen_qemu_st8(DisasContext *ctx, TCGv arg1, TCGv arg2)
 {
-    tcg_gen_qemu_st8(arg1, arg2, ctx->mem_idx);
+    tcg_gen_qemu_st_tl(arg1, arg2, ctx->mem_idx, MO_UB);
 }
 
 static inline void gen_qemu_st16(DisasContext *ctx, TCGv arg1, TCGv arg2)
 {
-    if (unlikely(ctx->le_mode)) {
-        TCGv t0 = tcg_temp_new();
-        tcg_gen_ext16u_tl(t0, arg1);
-        tcg_gen_bswap16_tl(t0, t0);
-        tcg_gen_qemu_st16(t0, arg2, ctx->mem_idx);
-        tcg_temp_free(t0);
-    } else {
-        tcg_gen_qemu_st16(arg1, arg2, ctx->mem_idx);
-    }
+    tcg_gen_qemu_st_tl(arg1, arg2, ctx->mem_idx, MO_UW | ctx->end_mode);
 }
 
 static inline void gen_qemu_st32(DisasContext *ctx, TCGv arg1, TCGv arg2)
 {
-    if (unlikely(ctx->le_mode)) {
-        TCGv t0 = tcg_temp_new();
-        tcg_gen_ext32u_tl(t0, arg1);
-        tcg_gen_bswap32_tl(t0, t0);
-        tcg_gen_qemu_st32(t0, arg2, ctx->mem_idx);
-        tcg_temp_free(t0);
-    } else {
-        tcg_gen_qemu_st32(arg1, arg2, ctx->mem_idx);
-    }
+    tcg_gen_qemu_st_tl(arg1, arg2, ctx->mem_idx, MO_UL | ctx->end_mode);
 }
 
 static inline void gen_qemu_st64(DisasContext *ctx, TCGv_i64 arg1, TCGv arg2)
 {
-    if (unlikely(ctx->le_mode)) {
-        TCGv_i64 t0 = tcg_temp_new_i64();
-        tcg_gen_bswap64_i64(t0, arg1);
-        tcg_gen_qemu_st64(t0, arg2, ctx->mem_idx);
-        tcg_temp_free_i64(t0);
-    } else
-        tcg_gen_qemu_st64(arg1, arg2, ctx->mem_idx);
+    tcg_gen_qemu_st_i64(arg1, arg2, ctx->mem_idx, MO_Q | ctx->end_mode);
 }
 
 #define GEN_LD(name, ldop, opc, type)                                         \
@@ -2739,7 +2697,7 @@ static void gen_lq(DisasContext *ctx)
         gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
         return;
     }
-    if (unlikely(ctx->le_mode)) {
+    if (unlikely(ctx->end_mode == MO_LE)) {
         /* Little-endian mode is not handled */
         gen_exception_err(ctx, POWERPC_EXCP_ALIGN, POWERPC_EXCP_ALIGN_LE);
         return;
@@ -2850,7 +2808,7 @@ static void gen_std(DisasContext *ctx)
             gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
             return;
         }
-        if (unlikely(ctx->le_mode)) {
+        if (unlikely(ctx->end_mode == MO_LE)) {
             /* Little-endian mode is not handled */
             gen_exception_err(ctx, POWERPC_EXCP_ALIGN, POWERPC_EXCP_ALIGN_LE);
             return;
@@ -2885,20 +2843,16 @@ static void gen_std(DisasContext *ctx)
 /* lhbrx */
 static inline void gen_qemu_ld16ur(DisasContext *ctx, TCGv arg1, TCGv arg2)
 {
-    tcg_gen_qemu_ld16u(arg1, arg2, ctx->mem_idx);
-    if (likely(!ctx->le_mode)) {
-        tcg_gen_bswap16_tl(arg1, arg1);
-    }
+    tcg_gen_qemu_ld_tl(arg1, arg2, ctx->mem_idx,
+                       MO_UW | (ctx->end_mode ^ MO_BSWAP));
 }
 GEN_LDX(lhbr, ld16ur, 0x16, 0x18, PPC_INTEGER);
 
 /* lwbrx */
 static inline void gen_qemu_ld32ur(DisasContext *ctx, TCGv arg1, TCGv arg2)
 {
-    tcg_gen_qemu_ld32u(arg1, arg2, ctx->mem_idx);
-    if (likely(!ctx->le_mode)) {
-        tcg_gen_bswap32_tl(arg1, arg1);
-    }
+    tcg_gen_qemu_ld_tl(arg1, arg2, ctx->mem_idx,
+                       MO_UL | (ctx->end_mode ^ MO_BSWAP));
 }
 GEN_LDX(lwbr, ld32ur, 0x16, 0x10, PPC_INTEGER);
 
@@ -2906,10 +2860,8 @@ GEN_LDX(lwbr, ld32ur, 0x16, 0x10, PPC_INTEGER);
 /* ldbrx */
 static inline void gen_qemu_ld64ur(DisasContext *ctx, TCGv arg1, TCGv arg2)
 {
-    tcg_gen_qemu_ld64(arg1, arg2, ctx->mem_idx);
-    if (likely(!ctx->le_mode)) {
-        tcg_gen_bswap64_tl(arg1, arg1);
-    }
+    tcg_gen_qemu_ld_i64(arg1, arg2, ctx->mem_idx,
+                        MO_Q | (ctx->end_mode ^ MO_BSWAP));
 }
 GEN_LDX_E(ldbr, ld64ur, 0x14, 0x10, PPC_NONE, PPC2_DBRX);
 #endif  /* TARGET_PPC64 */
@@ -2917,30 +2869,16 @@ GEN_LDX_E(ldbr, ld64ur, 0x14, 0x10, PPC_NONE, PPC2_DBRX);
 /* sthbrx */
 static inline void gen_qemu_st16r(DisasContext *ctx, TCGv arg1, TCGv arg2)
 {
-    if (likely(!ctx->le_mode)) {
-        TCGv t0 = tcg_temp_new();
-        tcg_gen_ext16u_tl(t0, arg1);
-        tcg_gen_bswap16_tl(t0, t0);
-        tcg_gen_qemu_st16(t0, arg2, ctx->mem_idx);
-        tcg_temp_free(t0);
-    } else {
-        tcg_gen_qemu_st16(arg1, arg2, ctx->mem_idx);
-    }
+    tcg_gen_qemu_st_tl(arg1, arg2, ctx->mem_idx,
+                       MO_UW | (ctx->end_mode ^ MO_BSWAP));
 }
 GEN_STX(sthbr, st16r, 0x16, 0x1C, PPC_INTEGER);
 
 /* stwbrx */
 static inline void gen_qemu_st32r(DisasContext *ctx, TCGv arg1, TCGv arg2)
 {
-    if (likely(!ctx->le_mode)) {
-        TCGv t0 = tcg_temp_new();
-        tcg_gen_ext32u_tl(t0, arg1);
-        tcg_gen_bswap32_tl(t0, t0);
-        tcg_gen_qemu_st32(t0, arg2, ctx->mem_idx);
-        tcg_temp_free(t0);
-    } else {
-        tcg_gen_qemu_st32(arg1, arg2, ctx->mem_idx);
-    }
+    tcg_gen_qemu_st_tl(arg1, arg2, ctx->mem_idx,
+                       MO_UL | (ctx->end_mode ^ MO_BSWAP));
 }
 GEN_STX(stwbr, st32r, 0x16, 0x14, PPC_INTEGER);
 
@@ -2948,14 +2886,8 @@ GEN_STX(stwbr, st32r, 0x16, 0x14, PPC_INTEGER);
 /* stdbrx */
 static inline void gen_qemu_st64r(DisasContext *ctx, TCGv arg1, TCGv arg2)
 {
-    if (likely(!ctx->le_mode)) {
-        TCGv t0 = tcg_temp_new();
-        tcg_gen_bswap64_tl(t0, arg1);
-        tcg_gen_qemu_st64(t0, arg2, ctx->mem_idx);
-        tcg_temp_free(t0);
-    } else {
-        tcg_gen_qemu_st64(arg1, arg2, ctx->mem_idx);
-    }
+    tcg_gen_qemu_st_i64(arg1, arg2, ctx->mem_idx,
+                        MO_Q | (ctx->end_mode ^ MO_BSWAP));
 }
 GEN_STX_E(stdbr, st64r, 0x14, 0x14, PPC_NONE, PPC2_DBRX);
 #endif  /* TARGET_PPC64 */
@@ -3327,7 +3259,7 @@ static void gen_lfdp(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
     gen_addr_imm_index(ctx, EA, 0);                                           \
-    if (unlikely(ctx->le_mode)) {
+    if (unlikely(ctx->end_mode == MO_LE)) {
         gen_qemu_ld64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
         tcg_gen_addi_tl(EA, EA, 8);
         gen_qemu_ld64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
@@ -3350,7 +3282,7 @@ static void gen_lfdpx(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
-    if (unlikely(ctx->le_mode)) {
+    if (unlikely(ctx->end_mode == MO_LE)) {
         gen_qemu_ld64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
         tcg_gen_addi_tl(EA, EA, 8);
         gen_qemu_ld64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
@@ -3485,7 +3417,7 @@ static void gen_stfdp(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
     gen_addr_imm_index(ctx, EA, 0);                                           \
-    if (unlikely(ctx->le_mode)) {
+    if (unlikely(ctx->end_mode == MO_LE)) {
         gen_qemu_st64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
         tcg_gen_addi_tl(EA, EA, 8);
         gen_qemu_st64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
@@ -3508,7 +3440,7 @@ static void gen_stfdpx(DisasContext *ctx)
     gen_set_access_type(ctx, ACCESS_FLOAT);
     EA = tcg_temp_new();
     gen_addr_reg_index(ctx, EA);
-    if (unlikely(ctx->le_mode)) {
+    if (unlikely(ctx->end_mode == MO_LE)) {
         gen_qemu_st64(ctx, cpu_fpr[rD(ctx->opcode) + 1], EA);
         tcg_gen_addi_tl(EA, EA, 8);
         gen_qemu_st64(ctx, cpu_fpr[rD(ctx->opcode)], EA);
@@ -6453,7 +6385,7 @@ static void glue(gen_, name)(DisasContext *ctx)
     EA = tcg_temp_new();                                                      \
     gen_addr_reg_index(ctx, EA);                                              \
     tcg_gen_andi_tl(EA, EA, ~0xf);                                            \
-    if (ctx->le_mode) {                                                       \
+    if (ctx->end_mode == MO_LE) {                                             \
         gen_qemu_ld64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                    \
         tcg_gen_addi_tl(EA, EA, 8);                                           \
         gen_qemu_ld64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                    \
@@ -6477,7 +6409,7 @@ static void gen_st##name(DisasContext *ctx)                                   \
     EA = tcg_temp_new();                                                      \
     gen_addr_reg_index(ctx, EA);                                              \
     tcg_gen_andi_tl(EA, EA, ~0xf);                                            \
-    if (ctx->le_mode) {                                                       \
+    if (ctx->end_mode == MO_LE) {                                             \
         gen_qemu_st64(ctx, cpu_avrl[rD(ctx->opcode)], EA);                    \
         tcg_gen_addi_tl(EA, EA, 8);                                           \
         gen_qemu_st64(ctx, cpu_avrh[rD(ctx->opcode)], EA);                    \
@@ -9751,7 +9683,7 @@ static inline void gen_intermediate_code_internal(PowerPCCPU *cpu,
     ctx.insns_flags = env->insns_flags;
     ctx.insns_flags2 = env->insns_flags2;
     ctx.access_type = -1;
-    ctx.le_mode = env->hflags & (1 << MSR_LE) ? 1 : 0;
+    ctx.end_mode = (env->hflags & (1 << MSR_LE) ? MO_LE : MO_BE);
 #if defined(TARGET_PPC64)
     ctx.sf_mode = msr_is_64bit(env, env->msr);
     ctx.has_cfar = !!(env->flags & POWERPC_FLAG_CFAR);
@@ -9811,14 +9743,13 @@ static inline void gen_intermediate_code_internal(PowerPCCPU *cpu,
                   ctx.nip, ctx.mem_idx, (int)msr_ir);
         if (num_insns + 1 == max_insns && (tb->cflags & CF_LAST_IO))
             gen_io_start();
-        if (unlikely(ctx.le_mode)) {
-            ctx.opcode = bswap32(cpu_ldl_code(env, ctx.nip));
-        } else {
-            ctx.opcode = cpu_ldl_code(env, ctx.nip);
+        ctx.opcode = cpu_ldl_code(env, ctx.nip);
+        if (unlikely(ctx.end_mode == MO_LE)) {
+            ctx.opcode = bswap32(ctx.opcode);
         }
         LOG_DISAS("translate opcode %08x (%02x %02x %02x) (%s)\n",
-                    ctx.opcode, opc1(ctx.opcode), opc2(ctx.opcode),
-                    opc3(ctx.opcode), ctx.le_mode ? "little" : "big");
+                  ctx.opcode, opc1(ctx.opcode), opc2(ctx.opcode),
+                  opc3(ctx.opcode), ctx.end_mode == MO_LE ? "little" : "big");
         if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP | CPU_LOG_TB_OP_OPT))) {
             tcg_gen_debug_insn_start(ctx.nip);
         }
@@ -9910,7 +9841,7 @@ static inline void gen_intermediate_code_internal(PowerPCCPU *cpu,
     if (qemu_loglevel_mask(CPU_LOG_TB_IN_ASM)) {
         int flags;
         flags = env->bfd_mach;
-        flags |= ctx.le_mode << 16;
+        flags |= (ctx.end_mode == MO_LE) << 16;
         qemu_log("IN: %s\n", lookup_symbol(pc_start));
         log_target_disas(env, pc_start, ctx.nip - pc_start, flags);
         qemu_log("\n");
-- 
1.8.1.4


* Re: [Qemu-devel] [Qemu-ppc] [PATCH 16/16] target-ppc: Convert to new ldst opcodes
  2013-09-04 21:05 ` [Qemu-devel] [PATCH 16/16] target-ppc: Convert to " Richard Henderson
@ 2013-09-05  9:08   ` Alexander Graf
  2013-09-05 11:40     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 22+ messages in thread
From: Alexander Graf @ 2013-09-05  9:08 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-ppc@nongnu.org list:PowerPC,
	qemu-devel@nongnu.org qemu-devel


On 04.09.2013, at 23:05, Richard Henderson wrote:

> This lets us change "le_mode" to "end_mode", fold away nearly all
> of the tests for the current cpu endianness, and remove all of the
> explicitly generated bswap opcodes.
> 
> Cc: qemu-ppc@nongnu.org
> Signed-off-by: Richard Henderson <rth@twiddle.net>

No complaints from me, apart from the usual "LE mode isn't necessarily what you think it is on PPC" one. But the code would be as broken as before, IIUC.

Ben, you had some insight into how LE mode works on different PPC flavors. Could you please make sure we're not walking in the wrong direction here?


Alex


* Re: [Qemu-devel] [Qemu-ppc] [PATCH 16/16] target-ppc: Convert to new ldst opcodes
  2013-09-05  9:08   ` [Qemu-devel] [Qemu-ppc] " Alexander Graf
@ 2013-09-05 11:40     ` Benjamin Herrenschmidt
  2013-09-05 12:59       ` Alexander Graf
  2013-09-05 15:35       ` Richard Henderson
  0 siblings, 2 replies; 22+ messages in thread
From: Benjamin Herrenschmidt @ 2013-09-05 11:40 UTC (permalink / raw)
  To: Alexander Graf
  Cc: qemu-ppc@nongnu.org list:PowerPC,
	qemu-devel@nongnu.org qemu-devel, Richard Henderson

On Thu, 2013-09-05 at 11:08 +0200, Alexander Graf wrote:
> On 04.09.2013, at 23:05, Richard Henderson wrote:
> 
> > This lets us change "le_mode" to "end_mode", fold away nearly all
> > of the tests for the current cpu endianness, and remove all of the
> > explicitly generated bswap opcodes.

Only nit: I find "end_mode" a very confusing identifier :-) "end"
usually means something else! Why not endian_mode?
> > 
> > Cc: qemu-ppc@nongnu.org
> > Signed-off-by: Richard Henderson <rth@twiddle.net>
> 
> No complaints from me, apart from the usual "LE mode isn't necessarily what you think it is on PPC" one. But the code would be as broken as before, IIUC.
> 
> Ben, you had some insight into how LE mode works on different PPC flavors. Could you please make sure we're not walking in the wrong direction here?

I haven't seen the patch itself for some reason (and I'm about to go off
for a few days). I think the early-day PowerPC endian mode can be safely
ignored. I don't even remember the details myself, but I think it induced
some changes to the byte lane ordering on the bus and thus required the
host bridge to be adjusted.

The embedded PPCs simply have a per-page E bit in the TLB controlling
the endianness of accesses through the translation. The endianness is
"clean" in that case, and the bus doesn't flip around, so it's akin to
what P7 does but with a finer granularity.

Cheers,
Ben.


* Re: [Qemu-devel] [Qemu-ppc] [PATCH 16/16] target-ppc: Convert to new ldst opcodes
  2013-09-05 11:40     ` Benjamin Herrenschmidt
@ 2013-09-05 12:59       ` Alexander Graf
  2013-09-05 13:37         ` Benjamin Herrenschmidt
  2013-09-05 15:35       ` Richard Henderson
  1 sibling, 1 reply; 22+ messages in thread
From: Alexander Graf @ 2013-09-05 12:59 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: qemu-ppc@nongnu.org list:PowerPC,
	qemu-devel@nongnu.org qemu-devel, Richard Henderson


On 05.09.2013, at 13:40, Benjamin Herrenschmidt wrote:

> On Thu, 2013-09-05 at 11:08 +0200, Alexander Graf wrote:
>> On 04.09.2013, at 23:05, Richard Henderson wrote:
>> 
>>> This lets us change "le_mode" to "end_mode", fold away nearly all
>>> of the tests for the current cpu endianness, and remove all of the
>>> explicitly generated bswap opcodes.
> 
> Only nit: I find "end_mode" a very confusing identifier :-) "end"
> usually means something else! Why not endian_mode?
>>> 
>>> Cc: qemu-ppc@nongnu.org
>>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>> 
>> No complaints from me, apart from the usual "LE mode isn't necessarily what you think it is on PPC" one. But the code would be as broken as before, IIUC.
>> 
>> Ben, you had some insight into how LE mode works on different PPC flavors. Could you please make sure we're not walking in the wrong direction here?
> 
> I haven't seen the patch itself for some reason (and I'm about to go off
> for a few days). I think the early-day PowerPC endian mode can be safely
> ignored. I don't even remember the details myself, but I think it induced
> some changes to the byte lane ordering on the bus and thus required the
> host bridge to be adjusted.
> 
> The embedded PPCs simply have a per-page E bit in the TLB controlling
> the endianness of accesses through the translation. The endianness is
> "clean" in that case, and the bus doesn't flip around, so it's akin to
> what P7 does but with a finer granularity.

So on P7, basically everything that goes out from registers is byte-swapped, including any RAM access and MMIO? I think that's basically what the current little-endian mode implements (though it might miss a few places, like FPU or Altivec, but I'd consider those bugs).


Alex


* Re: [Qemu-devel] [Qemu-ppc] [PATCH 16/16] target-ppc: Convert to new ldst opcodes
  2013-09-05 12:59       ` Alexander Graf
@ 2013-09-05 13:37         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 22+ messages in thread
From: Benjamin Herrenschmidt @ 2013-09-05 13:37 UTC (permalink / raw)
  To: Alexander Graf
  Cc: qemu-ppc@nongnu.org list:PowerPC,
	qemu-devel@nongnu.org qemu-devel, Richard Henderson

On Thu, 2013-09-05 at 14:59 +0200, Alexander Graf wrote:

> > The embedded PPCs simply have a per-page E bit in the TLB controlling
> > the endianness of accesses through the translation. The endianness is
> > "clean" in that case, and the bus doesn't flip around, so it's akin to
> > what P7 does but with a finer granularity.
> 
> So on P7, basically everything that goes out from registers is
> byte-swapped, including any RAM access and MMIO? I think that's
> basically what the current little-endian mode implements (though it
> might miss a few places, like FPU or Altivec, but I'd consider those
> bugs).

Yes. There are some oddities with VSX though (it does PDP endian iirc).

Cheers,
Ben.


* Re: [Qemu-devel] [Qemu-ppc] [PATCH 16/16] target-ppc: Convert to new ldst opcodes
  2013-09-05 11:40     ` Benjamin Herrenschmidt
  2013-09-05 12:59       ` Alexander Graf
@ 2013-09-05 15:35       ` Richard Henderson
  1 sibling, 0 replies; 22+ messages in thread
From: Richard Henderson @ 2013-09-05 15:35 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: qemu-ppc@nongnu.org list:PowerPC, Alexander Graf,
	qemu-devel@nongnu.org qemu-devel

On 09/05/2013 04:40 AM, Benjamin Herrenschmidt wrote:
> Only nit: I find "end_mode" a very confusing identifier :-) "end"
> usually means something else! Why not endian_mode?

80 column wrapping.  A poor excuse, I know...

> I haven't seen the patch itself for some reason (and I'm about to go off
> for a few days). I think the early-day PowerPC endian mode can be safely
> ignored. I don't even remember the details myself, but I think it induced
> some changes to the byte lane ordering on the bus and thus required the
> host bridge to be adjusted.

This sounds like the BE32 mode on early ARMs, now discontinued, afaik.

> The embedded PPCs simply have a per-page E bit in the TLB controlling
> the endianness of accesses through the translation. The endianness is
> "clean" in that case, and the bus doesn't flip around, so it's akin to
> what P7 does but with a finer granularity.

This would be significantly harder (and slower) to emulate.  It would
require playing games at the cputlb level, and really has nothing to do
with the code generation from tcg at all.
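
Editorial note: to illustrate why a per-page E bit would belong in the
TLB path rather than in the translator, here is a rough standalone
sketch. DemoTLBEntry and demo_page_ld32 are hypothetical names for
illustration only; this is not QEMU's cputlb code. The point is that the
byte order is chosen per access from the page's entry at run time, so it
cannot be folded into the memop when the instruction is translated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u

/* Hypothetical TLB entry; not QEMU's CPUTLBEntry. */
typedef struct DemoTLBEntry {
    const uint8_t *host;   /* host memory backing the guest page */
    bool little_endian;    /* models the per-page E bit */
} DemoTLBEntry;

/* 32-bit load whose byte order is decided by the page being hit,
 * at access time, instead of by a translation-time flag. */
uint32_t demo_page_ld32(const DemoTLBEntry *e, uint32_t offset)
{
    const uint8_t *p;

    assert(e != NULL && e->host != NULL);
    assert(offset <= DEMO_PAGE_SIZE - 4);
    p = e->host + offset;

    if (e->little_endian) {
        return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
               ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    }
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

The same load instruction can then see either byte order depending on
which page it touches, which is where the extra run-time cost would
presumably come from.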


r~


Thread overview: 22+ messages
2013-09-04 21:04 [Qemu-devel] [PATCH 00/16] Streamlining endian handling in TCG Richard Henderson
2013-09-04 21:04 ` [Qemu-devel] [PATCH 01/16] tcg: Add TCGMemOp Richard Henderson
2013-09-04 21:04 ` [Qemu-devel] [PATCH 02/16] tcg-i386: Use TCGMemOp within qemu_ldst routines Richard Henderson
2013-09-04 21:04 ` [Qemu-devel] [PATCH 03/16] tcg-aarch64: " Richard Henderson
2013-09-04 21:04 ` [Qemu-devel] [PATCH 04/16] tcg-arm: " Richard Henderson
2013-09-04 21:04 ` [Qemu-devel] [PATCH 05/16] tcg-s390: " Richard Henderson
2013-09-04 21:04 ` [Qemu-devel] [PATCH 06/16] tcg-ppc: " Richard Henderson
2013-09-04 21:04 ` [Qemu-devel] [PATCH 07/16] tcg-ppc64: " Richard Henderson
2013-09-04 21:04 ` [Qemu-devel] [PATCH 08/16] tcg-hppa: " Richard Henderson
2013-09-04 21:04 ` [Qemu-devel] [PATCH 09/16] tcg-mips: " Richard Henderson
2013-09-04 21:04 ` [Qemu-devel] [PATCH 10/16] tcg-sparc: " Richard Henderson
2013-09-04 21:05 ` [Qemu-devel] [PATCH 11/16] tcg: Add qemu_ld_st_i32/64 Richard Henderson
2013-09-04 21:05 ` [Qemu-devel] [PATCH 12/16] exec: Add both big- and little-endian memory helpers Richard Henderson
2013-09-04 21:05 ` [Qemu-devel] [PATCH 13/16] tcg-i386: Tidy softmmu routines Richard Henderson
2013-09-04 21:05 ` [Qemu-devel] [PATCH 14/16] tcg-i386: Remove "cb" output restriction from qemu_st8 for i386 Richard Henderson
2013-09-04 21:05 ` [Qemu-devel] [PATCH 15/16] tcg-i386: Support new ldst opcodes Richard Henderson
2013-09-04 21:05 ` [Qemu-devel] [PATCH 16/16] target-ppc: Convert to " Richard Henderson
2013-09-05  9:08   ` [Qemu-devel] [Qemu-ppc] " Alexander Graf
2013-09-05 11:40     ` Benjamin Herrenschmidt
2013-09-05 12:59       ` Alexander Graf
2013-09-05 13:37         ` Benjamin Herrenschmidt
2013-09-05 15:35       ` Richard Henderson
