[Qemu-devel] [PATCH v2 0/7] target-mips improvements

* [Qemu-devel] [PATCH v2 0/7] target-mips improvements
@ 2012-09-17 21:35 Richard Henderson
  2012-09-17 21:35 ` [Qemu-devel] [PATCH 1/7] target-mips: Set opn in gen_ldst_multiple Richard Henderson
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Richard Henderson @ 2012-09-17 21:35 UTC
  To: qemu-devel; +Cc: aurelien

The thread that Aurelien replied to was from March.  Going back to
revive that patch I found I'd done some further work in April, which
I may never have got around to posting.

The first three patches fix compilation errors when MIPS_DEBUG_DISAS
is defined.

The next three patches change the MIPS target to use TCG registers
for the FPU.  While they help the quality of the generated code for
LMI, they are not required.

The final patch implements LMI.  Except for the existence of the
gen_load/store_fpr_pair functions introduced in patch 6, the final
patch is independent of the transition to TCG registers.

I've addressed the issues raised by Aurelien in his v1 review.

r~

Richard Henderson (7):
  target-mips: Set opn in gen_ldst_multiple.
  target-mips: Fix MIPS_DEBUG.
  target-mips: Always evaluate debugging macro arguments
  target-mips: Pass DisasContext to fpr32 load/store routines
  target-mips: Use TCG registers for the FPU.
  target-mips: Add accessors for the two 32-bit halves of a 64-bit FPR
  target-mips: Implement Loongson Multimedia Instructions

 target-mips/Makefile.objs |   2 +-
 target-mips/helper.h      |  59 +++
 target-mips/lmi_helper.c  | 744 +++++++++++++++++++++++++++++++++++
 target-mips/translate.c   | 962 +++++++++++++++++++++++++++++++++-------------
 4 files changed, 1492 insertions(+), 275 deletions(-)
 create mode 100644 target-mips/lmi_helper.c

-- 
1.7.11.4

* [Qemu-devel] [PATCH 1/7] target-mips: Set opn in gen_ldst_multiple.
  2012-09-17 21:35 [Qemu-devel] [PATCH v2 0/7] target-mips improvements Richard Henderson
@ 2012-09-17 21:35 ` Richard Henderson
  2012-09-18 16:38   ` Aurelien Jarno
  2012-09-17 21:35 ` [Qemu-devel] [PATCH 2/7] target-mips: Fix MIPS_DEBUG Richard Henderson
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2012-09-17 21:35 UTC
  To: qemu-devel; +Cc: aurelien

Used by MIPS_DEBUG, when enabled.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-mips/translate.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index 52eeb2b..50153a9 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -9855,6 +9855,7 @@ static void gen_andi16 (CPUMIPSState *env, DisasContext *ctx)
 static void gen_ldst_multiple (DisasContext *ctx, uint32_t opc, int reglist,
                                int base, int16_t offset)
 {
+    const char *opn = "ldst_multiple";
     TCGv t0, t1;
     TCGv_i32 t2;
 
@@ -9874,19 +9875,24 @@ static void gen_ldst_multiple (DisasContext *ctx, uint32_t opc, int reglist,
     switch (opc) {
     case LWM32:
         gen_helper_lwm(cpu_env, t0, t1, t2);
+        opn = "lwm";
         break;
     case SWM32:
         gen_helper_swm(cpu_env, t0, t1, t2);
+        opn = "swm";
         break;
 #ifdef TARGET_MIPS64
     case LDM:
         gen_helper_ldm(cpu_env, t0, t1, t2);
+        opn = "ldm";
         break;
     case SDM:
         gen_helper_sdm(cpu_env, t0, t1, t2);
+        opn = "sdm";
         break;
 #endif
     }
+    (void)opn;
     MIPS_DEBUG("%s, %x, %d(%s)", opn, reglist, offset, regnames[base]);
     tcg_temp_free(t0);
     tcg_temp_free(t1);
-- 
1.7.11.4

* Re: [Qemu-devel] [PATCH 1/7] target-mips: Set opn in gen_ldst_multiple.
  2012-09-17 21:35 ` [Qemu-devel] [PATCH 1/7] target-mips: Set opn in gen_ldst_multiple Richard Henderson
@ 2012-09-18 16:38   ` Aurelien Jarno
  0 siblings, 0 replies; 14+ messages in thread
From: Aurelien Jarno @ 2012-09-18 16:38 UTC
  To: Richard Henderson; +Cc: qemu-devel

On Mon, Sep 17, 2012 at 02:35:07PM -0700, Richard Henderson wrote:
> Used by MIPS_DEBUG, when enabled.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target-mips/translate.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/target-mips/translate.c b/target-mips/translate.c
> index 52eeb2b..50153a9 100644
> --- a/target-mips/translate.c
> +++ b/target-mips/translate.c
> @@ -9855,6 +9855,7 @@ static void gen_andi16 (CPUMIPSState *env, DisasContext *ctx)
>  static void gen_ldst_multiple (DisasContext *ctx, uint32_t opc, int reglist,
>                                 int base, int16_t offset)
>  {
> +    const char *opn = "ldst_multiple";
>      TCGv t0, t1;
>      TCGv_i32 t2;
>  
> @@ -9874,19 +9875,24 @@ static void gen_ldst_multiple (DisasContext *ctx, uint32_t opc, int reglist,
>      switch (opc) {
>      case LWM32:
>          gen_helper_lwm(cpu_env, t0, t1, t2);
> +        opn = "lwm";
>          break;
>      case SWM32:
>          gen_helper_swm(cpu_env, t0, t1, t2);
> +        opn = "swm";
>          break;
>  #ifdef TARGET_MIPS64
>      case LDM:
>          gen_helper_ldm(cpu_env, t0, t1, t2);
> +        opn = "ldm";
>          break;
>      case SDM:
>          gen_helper_sdm(cpu_env, t0, t1, t2);
> +        opn = "sdm";
>          break;
>  #endif
>      }
> +    (void)opn;
>      MIPS_DEBUG("%s, %x, %d(%s)", opn, reglist, offset, regnames[base]);
>      tcg_temp_free(t0);
>      tcg_temp_free(t1);
> -- 
> 1.7.11.4
> 

Looks fine to me.

Acked-by: Aurelien Jarno <aurelien@aurel32.net>

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

* [Qemu-devel] [PATCH 2/7] target-mips: Fix MIPS_DEBUG.
  2012-09-17 21:35 [Qemu-devel] [PATCH v2 0/7] target-mips improvements Richard Henderson
  2012-09-17 21:35 ` [Qemu-devel] [PATCH 1/7] target-mips: Set opn in gen_ldst_multiple Richard Henderson
@ 2012-09-17 21:35 ` Richard Henderson
  2012-09-18 16:38   ` Aurelien Jarno
  2012-09-17 21:35 ` [Qemu-devel] [PATCH 3/7] target-mips: Always evaluate debugging macro arguments Richard Henderson
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2012-09-17 21:35 UTC
  To: qemu-devel; +Cc: aurelien

The macro uses the DisasContext.  Pass it around as needed.
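
To illustrate why the extra parameter is needed, here is a minimal
standalone sketch.  It assumes a hypothetical DemoContext structure and
DEMO_DEBUG macro (neither is QEMU code) and writes into a buffer
instead of the QEMU log.  The macro body names "ctx" directly, so any
function that uses it must have a variable called ctx in scope.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for DisasContext; only the fields the macro reads. */
typedef struct DemoContext {
    unsigned pc;
    unsigned opcode;
} DemoContext;

/* The demo writes here instead of a log file so the result can be checked. */
static char demo_buf[64];

/*
 * Like MIPS_DEBUG, this macro refers to "ctx" without taking it as an
 * argument.  It only compiles inside a function where ctx is visible.
 */
#define DEMO_DEBUG(fmt, ...)                                        \
    (void)snprintf(demo_buf, sizeof(demo_buf), "%08x: %08x " fmt,   \
                   ctx->pc, ctx->opcode, __VA_ARGS__)

/* Without the ctx parameter, the DEMO_DEBUG line would not compile. */
static void demo_gen_logic(DemoContext *ctx, int rd, int rs)
{
    DEMO_DEBUG("%s r%d, r%d", "and", rd, rs);
}
```

Dropping the ctx parameter from demo_gen_logic makes the macro
expansion refer to an undeclared identifier, which is the kind of
build failure this patch removes when MIPS_DEBUG_DISAS is defined.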

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-mips/translate.c | 74 +++++++++++++++++++++++++------------------------
 1 file changed, 38 insertions(+), 36 deletions(-)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index 50153a9..f93b444 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -1431,7 +1431,8 @@ static void gen_arith_imm (CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
 }
 
 /* Logic with immediate operand */
-static void gen_logic_imm (CPUMIPSState *env, uint32_t opc, int rt, int rs, int16_t imm)
+static void gen_logic_imm(CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
+                          int rt, int rs, int16_t imm)
 {
     target_ulong uimm;
     const char *opn = "imm logic";
@@ -1474,7 +1475,8 @@ static void gen_logic_imm (CPUMIPSState *env, uint32_t opc, int rt, int rs, int1
 }
 
 /* Set on less than with immediate operand */
-static void gen_slt_imm (CPUMIPSState *env, uint32_t opc, int rt, int rs, int16_t imm)
+static void gen_slt_imm(CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
+                        int rt, int rs, int16_t imm)
 {
     target_ulong uimm = (target_long)imm; /* Sign extend to 32/64 bits */
     const char *opn = "imm arith";
@@ -1775,7 +1777,8 @@ static void gen_arith (CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
 }
 
 /* Conditional move */
-static void gen_cond_move (CPUMIPSState *env, uint32_t opc, int rd, int rs, int rt)
+static void gen_cond_move(CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
+                          int rd, int rs, int rt)
 {
     const char *opn = "cond move";
     int l1;
@@ -1813,7 +1816,8 @@ static void gen_cond_move (CPUMIPSState *env, uint32_t opc, int rd, int rs, int
 }
 
 /* Logic */
-static void gen_logic (CPUMIPSState *env, uint32_t opc, int rd, int rs, int rt)
+static void gen_logic(CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
+                      int rd, int rs, int rt)
 {
     const char *opn = "logic";
 
@@ -1874,7 +1878,8 @@ static void gen_logic (CPUMIPSState *env, uint32_t opc, int rd, int rs, int rt)
 }
 
 /* Set on lower than */
-static void gen_slt (CPUMIPSState *env, uint32_t opc, int rd, int rs, int rt)
+static void gen_slt(CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
+                    int rd, int rs, int rt)
 {
     const char *opn = "slt";
     TCGv t0, t1;
@@ -8778,10 +8783,10 @@ static int decode_extended_mips16_opc (CPUMIPSState *env, DisasContext *ctx,
         gen_arith_imm(env, ctx, OPC_ADDIU, rx, rx, imm);
         break;
     case M16_OPC_SLTI:
-        gen_slt_imm(env, OPC_SLTI, 24, rx, imm);
+        gen_slt_imm(env, ctx, OPC_SLTI, 24, rx, imm);
         break;
     case M16_OPC_SLTIU:
-        gen_slt_imm(env, OPC_SLTIU, 24, rx, imm);
+        gen_slt_imm(env, ctx, OPC_SLTIU, 24, rx, imm);
         break;
     case M16_OPC_I8:
         switch (funct) {
@@ -8992,15 +8997,13 @@ static int decode_mips16_opc (CPUMIPSState *env, DisasContext *ctx,
     case M16_OPC_SLTI:
         {
             int16_t imm = (uint8_t) ctx->opcode;
-
-            gen_slt_imm(env, OPC_SLTI, 24, rx, imm);
+            gen_slt_imm(env, ctx, OPC_SLTI, 24, rx, imm);
         }
         break;
     case M16_OPC_SLTIU:
         {
             int16_t imm = (uint8_t) ctx->opcode;
-
-            gen_slt_imm(env, OPC_SLTIU, 24, rx, imm);
+            gen_slt_imm(env, ctx, OPC_SLTIU, 24, rx, imm);
         }
         break;
     case M16_OPC_I8:
@@ -9075,8 +9078,7 @@ static int decode_mips16_opc (CPUMIPSState *env, DisasContext *ctx,
     case M16_OPC_CMPI:
         {
             int16_t imm = (uint8_t) ctx->opcode;
-
-            gen_logic_imm(env, OPC_XORI, 24, rx, imm);
+            gen_logic_imm(env, ctx, OPC_XORI, 24, rx, imm);
         }
         break;
 #if defined(TARGET_MIPS64)
@@ -9188,10 +9190,10 @@ static int decode_mips16_opc (CPUMIPSState *env, DisasContext *ctx,
             }
             break;
         case RR_SLT:
-            gen_slt(env, OPC_SLT, 24, rx, ry);
+            gen_slt(env, ctx, OPC_SLT, 24, rx, ry);
             break;
         case RR_SLTU:
-            gen_slt(env, OPC_SLTU, 24, rx, ry);
+            gen_slt(env, ctx, OPC_SLTU, 24, rx, ry);
             break;
         case RR_BREAK:
             generate_exception(ctx, EXCP_BREAK);
@@ -9212,22 +9214,22 @@ static int decode_mips16_opc (CPUMIPSState *env, DisasContext *ctx,
             break;
 #endif
         case RR_CMP:
-            gen_logic(env, OPC_XOR, 24, rx, ry);
+            gen_logic(env, ctx, OPC_XOR, 24, rx, ry);
             break;
         case RR_NEG:
             gen_arith(env, ctx, OPC_SUBU, rx, 0, ry);
             break;
         case RR_AND:
-            gen_logic(env, OPC_AND, rx, rx, ry);
+            gen_logic(env, ctx, OPC_AND, rx, rx, ry);
             break;
         case RR_OR:
-            gen_logic(env, OPC_OR, rx, rx, ry);
+            gen_logic(env, ctx, OPC_OR, rx, rx, ry);
             break;
         case RR_XOR:
-            gen_logic(env, OPC_XOR, rx, rx, ry);
+            gen_logic(env, ctx, OPC_XOR, rx, rx, ry);
             break;
         case RR_NOT:
-            gen_logic(env, OPC_NOR, rx, ry, 0);
+            gen_logic(env, ctx, OPC_NOR, rx, ry, 0);
             break;
         case RR_MFHI:
             gen_HILO(ctx, OPC_MFHI, rx);
@@ -9849,7 +9851,7 @@ static void gen_andi16 (CPUMIPSState *env, DisasContext *ctx)
     int rs = mmreg(uMIPS_RS(ctx->opcode));
     int encoded = ZIMM(ctx->opcode, 0, 4);
 
-    gen_logic_imm(env, OPC_ANDI, rd, rs, decoded_imm[encoded]);
+    gen_logic_imm(env, ctx, OPC_ANDI, rd, rs, decoded_imm[encoded]);
 }
 
 static void gen_ldst_multiple (DisasContext *ctx, uint32_t opc, int reglist,
@@ -9911,25 +9913,25 @@ static void gen_pool16c_insn (CPUMIPSState *env, DisasContext *ctx, int *is_bran
     case NOT16 + 1:
     case NOT16 + 2:
     case NOT16 + 3:
-        gen_logic(env, OPC_NOR, rd, rs, 0);
+        gen_logic(env, ctx, OPC_NOR, rd, rs, 0);
         break;
     case XOR16 + 0:
     case XOR16 + 1:
     case XOR16 + 2:
     case XOR16 + 3:
-        gen_logic(env, OPC_XOR, rd, rd, rs);
+        gen_logic(env, ctx, OPC_XOR, rd, rd, rs);
         break;
     case AND16 + 0:
     case AND16 + 1:
     case AND16 + 2:
     case AND16 + 3:
-        gen_logic(env, OPC_AND, rd, rd, rs);
+        gen_logic(env, ctx, OPC_AND, rd, rd, rs);
         break;
     case OR16 + 0:
     case OR16 + 1:
     case OR16 + 2:
     case OR16 + 3:
-        gen_logic(env, OPC_OR, rd, rd, rs);
+        gen_logic(env, ctx, OPC_OR, rd, rd, rs);
         break;
     case LWM16 + 0:
     case LWM16 + 1:
@@ -10743,7 +10745,7 @@ static void decode_micromips32_opc (CPUMIPSState *env, DisasContext *ctx,
             case XOR32:
                 mips32_op = OPC_XOR;
             do_logic:
-                gen_logic(env, mips32_op, rd, rs, rt);
+                gen_logic(env, ctx, mips32_op, rd, rs, rt);
                 break;
                 /* Set less than */
             case SLT:
@@ -10752,7 +10754,7 @@ static void decode_micromips32_opc (CPUMIPSState *env, DisasContext *ctx,
             case SLTU:
                 mips32_op = OPC_SLTU;
             do_slt:
-                gen_slt(env, mips32_op, rd, rs, rt);
+                gen_slt(env, ctx, mips32_op, rd, rs, rt);
                 break;
             default:
                 goto pool32a_invalid;
@@ -10768,7 +10770,7 @@ static void decode_micromips32_opc (CPUMIPSState *env, DisasContext *ctx,
             case MOVZ:
                 mips32_op = OPC_MOVZ;
             do_cmov:
-                gen_cond_move(env, mips32_op, rd, rs, rt);
+                gen_cond_move(env, ctx, mips32_op, rd, rs, rt);
                 break;
             case LWXS:
                 gen_ldxs(ctx, rs, rt, rd);
@@ -11181,7 +11183,7 @@ static void decode_micromips32_opc (CPUMIPSState *env, DisasContext *ctx,
                target. */
             break;
         case LUI:
-            gen_logic_imm(env, OPC_LUI, rs, -1, imm);
+            gen_logic_imm(env, ctx, OPC_LUI, rs, -1, imm);
             break;
         case SYNCI:
             break;
@@ -11300,7 +11302,7 @@ static void decode_micromips32_opc (CPUMIPSState *env, DisasContext *ctx,
     case ANDI32:
         mips32_op = OPC_ANDI;
     do_logici:
-        gen_logic_imm(env, mips32_op, rt, rs, imm);
+        gen_logic_imm(env, ctx, mips32_op, rt, rs, imm);
         break;
 
         /* Set less than immediate */
@@ -11310,7 +11312,7 @@ static void decode_micromips32_opc (CPUMIPSState *env, DisasContext *ctx,
     case SLTIU32:
         mips32_op = OPC_SLTIU;
     do_slti:
-        gen_slt_imm(env, mips32_op, rt, rs, imm);
+        gen_slt_imm(env, ctx, mips32_op, rt, rs, imm);
         break;
     case JALX32:
         offset = (int32_t)(ctx->opcode & 0x3FFFFFF) << 2;
@@ -11787,7 +11789,7 @@ static void decode_opc (CPUMIPSState *env, DisasContext *ctx, int *is_branch)
         case OPC_MOVZ:
             check_insn(env, ctx, ISA_MIPS4 | ISA_MIPS32 |
                                  INSN_LOONGSON2E | INSN_LOONGSON2F);
-            gen_cond_move(env, op1, rd, rs, rt);
+            gen_cond_move(env, ctx, op1, rd, rs, rt);
             break;
         case OPC_ADD ... OPC_SUBU:
             gen_arith(env, ctx, op1, rd, rs, rt);
@@ -11814,13 +11816,13 @@ static void decode_opc (CPUMIPSState *env, DisasContext *ctx, int *is_branch)
             break;
         case OPC_SLT:          /* Set on less than */
         case OPC_SLTU:
-            gen_slt(env, op1, rd, rs, rt);
+            gen_slt(env, ctx, op1, rd, rs, rt);
             break;
         case OPC_AND:          /* Logic*/
         case OPC_OR:
         case OPC_NOR:
         case OPC_XOR:
-            gen_logic(env, op1, rd, rs, rt);
+            gen_logic(env, ctx, op1, rd, rs, rt);
             break;
         case OPC_MULT ... OPC_DIVU:
             if (sa) {
@@ -12221,13 +12223,13 @@ static void decode_opc (CPUMIPSState *env, DisasContext *ctx, int *is_branch)
          break;
     case OPC_SLTI: /* Set on less than with immediate opcode */
     case OPC_SLTIU:
-         gen_slt_imm(env, op, rt, rs, imm);
+         gen_slt_imm(env, ctx, op, rt, rs, imm);
          break;
     case OPC_ANDI: /* Arithmetic with immediate opcode */
     case OPC_LUI:
     case OPC_ORI:
     case OPC_XORI:
-         gen_logic_imm(env, op, rt, rs, imm);
+         gen_logic_imm(env, ctx, op, rt, rs, imm);
          break;
     case OPC_J ... OPC_JAL: /* Jump */
          offset = (int32_t)(ctx->opcode & 0x3FFFFFF) << 2;
-- 
1.7.11.4

* Re: [Qemu-devel] [PATCH 2/7] target-mips: Fix MIPS_DEBUG.
  2012-09-17 21:35 ` [Qemu-devel] [PATCH 2/7] target-mips: Fix MIPS_DEBUG Richard Henderson
@ 2012-09-18 16:38   ` Aurelien Jarno
  0 siblings, 0 replies; 14+ messages in thread
From: Aurelien Jarno @ 2012-09-18 16:38 UTC
  To: Richard Henderson; +Cc: qemu-devel

On Mon, Sep 17, 2012 at 02:35:08PM -0700, Richard Henderson wrote:
> The macro uses the DisasContext.  Pass it around as needed.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target-mips/translate.c | 74 +++++++++++++++++++++++++------------------------
>  1 file changed, 38 insertions(+), 36 deletions(-)
> 
> diff --git a/target-mips/translate.c b/target-mips/translate.c
> index 50153a9..f93b444 100644
> --- a/target-mips/translate.c
> +++ b/target-mips/translate.c
> @@ -1431,7 +1431,8 @@ static void gen_arith_imm (CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
>  }
>  
>  /* Logic with immediate operand */
> -static void gen_logic_imm (CPUMIPSState *env, uint32_t opc, int rt, int rs, int16_t imm)
> +static void gen_logic_imm(CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
> +                          int rt, int rs, int16_t imm)
>  {
>      target_ulong uimm;
>      const char *opn = "imm logic";
> @@ -1474,7 +1475,8 @@ static void gen_logic_imm (CPUMIPSState *env, uint32_t opc, int rt, int rs, int1
>  }
>  
>  /* Set on less than with immediate operand */
> -static void gen_slt_imm (CPUMIPSState *env, uint32_t opc, int rt, int rs, int16_t imm)
> +static void gen_slt_imm(CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
> +                        int rt, int rs, int16_t imm)
>  {
>      target_ulong uimm = (target_long)imm; /* Sign extend to 32/64 bits */
>      const char *opn = "imm arith";
> @@ -1775,7 +1777,8 @@ static void gen_arith (CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
>  }
>  
>  /* Conditional move */
> -static void gen_cond_move (CPUMIPSState *env, uint32_t opc, int rd, int rs, int rt)
> +static void gen_cond_move(CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
> +                          int rd, int rs, int rt)
>  {
>      const char *opn = "cond move";
>      int l1;
> @@ -1813,7 +1816,8 @@ static void gen_cond_move (CPUMIPSState *env, uint32_t opc, int rd, int rs, int
>  }
>  
>  /* Logic */
> -static void gen_logic (CPUMIPSState *env, uint32_t opc, int rd, int rs, int rt)
> +static void gen_logic(CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
> +                      int rd, int rs, int rt)
>  {
>      const char *opn = "logic";
>  
> @@ -1874,7 +1878,8 @@ static void gen_logic (CPUMIPSState *env, uint32_t opc, int rd, int rs, int rt)
>  }
>  
>  /* Set on lower than */
> -static void gen_slt (CPUMIPSState *env, uint32_t opc, int rd, int rs, int rt)
> +static void gen_slt(CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
> +                    int rd, int rs, int rt)
>  {
>      const char *opn = "slt";
>      TCGv t0, t1;
> @@ -8778,10 +8783,10 @@ static int decode_extended_mips16_opc (CPUMIPSState *env, DisasContext *ctx,
>          gen_arith_imm(env, ctx, OPC_ADDIU, rx, rx, imm);
>          break;
>      case M16_OPC_SLTI:
> -        gen_slt_imm(env, OPC_SLTI, 24, rx, imm);
> +        gen_slt_imm(env, ctx, OPC_SLTI, 24, rx, imm);
>          break;
>      case M16_OPC_SLTIU:
> -        gen_slt_imm(env, OPC_SLTIU, 24, rx, imm);
> +        gen_slt_imm(env, ctx, OPC_SLTIU, 24, rx, imm);
>          break;
>      case M16_OPC_I8:
>          switch (funct) {
> @@ -8992,15 +8997,13 @@ static int decode_mips16_opc (CPUMIPSState *env, DisasContext *ctx,
>      case M16_OPC_SLTI:
>          {
>              int16_t imm = (uint8_t) ctx->opcode;
> -
> -            gen_slt_imm(env, OPC_SLTI, 24, rx, imm);
> +            gen_slt_imm(env, ctx, OPC_SLTI, 24, rx, imm);
>          }
>          break;
>      case M16_OPC_SLTIU:
>          {
>              int16_t imm = (uint8_t) ctx->opcode;
> -
> -            gen_slt_imm(env, OPC_SLTIU, 24, rx, imm);
> +            gen_slt_imm(env, ctx, OPC_SLTIU, 24, rx, imm);
>          }
>          break;
>      case M16_OPC_I8:
> @@ -9075,8 +9078,7 @@ static int decode_mips16_opc (CPUMIPSState *env, DisasContext *ctx,
>      case M16_OPC_CMPI:
>          {
>              int16_t imm = (uint8_t) ctx->opcode;
> -
> -            gen_logic_imm(env, OPC_XORI, 24, rx, imm);
> +            gen_logic_imm(env, ctx, OPC_XORI, 24, rx, imm);
>          }
>          break;
>  #if defined(TARGET_MIPS64)
> @@ -9188,10 +9190,10 @@ static int decode_mips16_opc (CPUMIPSState *env, DisasContext *ctx,
>              }
>              break;
>          case RR_SLT:
> -            gen_slt(env, OPC_SLT, 24, rx, ry);
> +            gen_slt(env, ctx, OPC_SLT, 24, rx, ry);
>              break;
>          case RR_SLTU:
> -            gen_slt(env, OPC_SLTU, 24, rx, ry);
> +            gen_slt(env, ctx, OPC_SLTU, 24, rx, ry);
>              break;
>          case RR_BREAK:
>              generate_exception(ctx, EXCP_BREAK);
> @@ -9212,22 +9214,22 @@ static int decode_mips16_opc (CPUMIPSState *env, DisasContext *ctx,
>              break;
>  #endif
>          case RR_CMP:
> -            gen_logic(env, OPC_XOR, 24, rx, ry);
> +            gen_logic(env, ctx, OPC_XOR, 24, rx, ry);
>              break;
>          case RR_NEG:
>              gen_arith(env, ctx, OPC_SUBU, rx, 0, ry);
>              break;
>          case RR_AND:
> -            gen_logic(env, OPC_AND, rx, rx, ry);
> +            gen_logic(env, ctx, OPC_AND, rx, rx, ry);
>              break;
>          case RR_OR:
> -            gen_logic(env, OPC_OR, rx, rx, ry);
> +            gen_logic(env, ctx, OPC_OR, rx, rx, ry);
>              break;
>          case RR_XOR:
> -            gen_logic(env, OPC_XOR, rx, rx, ry);
> +            gen_logic(env, ctx, OPC_XOR, rx, rx, ry);
>              break;
>          case RR_NOT:
> -            gen_logic(env, OPC_NOR, rx, ry, 0);
> +            gen_logic(env, ctx, OPC_NOR, rx, ry, 0);
>              break;
>          case RR_MFHI:
>              gen_HILO(ctx, OPC_MFHI, rx);
> @@ -9849,7 +9851,7 @@ static void gen_andi16 (CPUMIPSState *env, DisasContext *ctx)
>      int rs = mmreg(uMIPS_RS(ctx->opcode));
>      int encoded = ZIMM(ctx->opcode, 0, 4);
>  
> -    gen_logic_imm(env, OPC_ANDI, rd, rs, decoded_imm[encoded]);
> +    gen_logic_imm(env, ctx, OPC_ANDI, rd, rs, decoded_imm[encoded]);
>  }
>  
>  static void gen_ldst_multiple (DisasContext *ctx, uint32_t opc, int reglist,
> @@ -9911,25 +9913,25 @@ static void gen_pool16c_insn (CPUMIPSState *env, DisasContext *ctx, int *is_bran
>      case NOT16 + 1:
>      case NOT16 + 2:
>      case NOT16 + 3:
> -        gen_logic(env, OPC_NOR, rd, rs, 0);
> +        gen_logic(env, ctx, OPC_NOR, rd, rs, 0);
>          break;
>      case XOR16 + 0:
>      case XOR16 + 1:
>      case XOR16 + 2:
>      case XOR16 + 3:
> -        gen_logic(env, OPC_XOR, rd, rd, rs);
> +        gen_logic(env, ctx, OPC_XOR, rd, rd, rs);
>          break;
>      case AND16 + 0:
>      case AND16 + 1:
>      case AND16 + 2:
>      case AND16 + 3:
> -        gen_logic(env, OPC_AND, rd, rd, rs);
> +        gen_logic(env, ctx, OPC_AND, rd, rd, rs);
>          break;
>      case OR16 + 0:
>      case OR16 + 1:
>      case OR16 + 2:
>      case OR16 + 3:
> -        gen_logic(env, OPC_OR, rd, rd, rs);
> +        gen_logic(env, ctx, OPC_OR, rd, rd, rs);
>          break;
>      case LWM16 + 0:
>      case LWM16 + 1:
> @@ -10743,7 +10745,7 @@ static void decode_micromips32_opc (CPUMIPSState *env, DisasContext *ctx,
>              case XOR32:
>                  mips32_op = OPC_XOR;
>              do_logic:
> -                gen_logic(env, mips32_op, rd, rs, rt);
> +                gen_logic(env, ctx, mips32_op, rd, rs, rt);
>                  break;
>                  /* Set less than */
>              case SLT:
> @@ -10752,7 +10754,7 @@ static void decode_micromips32_opc (CPUMIPSState *env, DisasContext *ctx,
>              case SLTU:
>                  mips32_op = OPC_SLTU;
>              do_slt:
> -                gen_slt(env, mips32_op, rd, rs, rt);
> +                gen_slt(env, ctx, mips32_op, rd, rs, rt);
>                  break;
>              default:
>                  goto pool32a_invalid;
> @@ -10768,7 +10770,7 @@ static void decode_micromips32_opc (CPUMIPSState *env, DisasContext *ctx,
>              case MOVZ:
>                  mips32_op = OPC_MOVZ;
>              do_cmov:
> -                gen_cond_move(env, mips32_op, rd, rs, rt);
> +                gen_cond_move(env, ctx, mips32_op, rd, rs, rt);
>                  break;
>              case LWXS:
>                  gen_ldxs(ctx, rs, rt, rd);
> @@ -11181,7 +11183,7 @@ static void decode_micromips32_opc (CPUMIPSState *env, DisasContext *ctx,
>                 target. */
>              break;
>          case LUI:
> -            gen_logic_imm(env, OPC_LUI, rs, -1, imm);
> +            gen_logic_imm(env, ctx, OPC_LUI, rs, -1, imm);
>              break;
>          case SYNCI:
>              break;
> @@ -11300,7 +11302,7 @@ static void decode_micromips32_opc (CPUMIPSState *env, DisasContext *ctx,
>      case ANDI32:
>          mips32_op = OPC_ANDI;
>      do_logici:
> -        gen_logic_imm(env, mips32_op, rt, rs, imm);
> +        gen_logic_imm(env, ctx, mips32_op, rt, rs, imm);
>          break;
>  
>          /* Set less than immediate */
> @@ -11310,7 +11312,7 @@ static void decode_micromips32_opc (CPUMIPSState *env, DisasContext *ctx,
>      case SLTIU32:
>          mips32_op = OPC_SLTIU;
>      do_slti:
> -        gen_slt_imm(env, mips32_op, rt, rs, imm);
> +        gen_slt_imm(env, ctx, mips32_op, rt, rs, imm);
>          break;
>      case JALX32:
>          offset = (int32_t)(ctx->opcode & 0x3FFFFFF) << 2;
> @@ -11787,7 +11789,7 @@ static void decode_opc (CPUMIPSState *env, DisasContext *ctx, int *is_branch)
>          case OPC_MOVZ:
>              check_insn(env, ctx, ISA_MIPS4 | ISA_MIPS32 |
>                                   INSN_LOONGSON2E | INSN_LOONGSON2F);
> -            gen_cond_move(env, op1, rd, rs, rt);
> +            gen_cond_move(env, ctx, op1, rd, rs, rt);
>              break;
>          case OPC_ADD ... OPC_SUBU:
>              gen_arith(env, ctx, op1, rd, rs, rt);
> @@ -11814,13 +11816,13 @@ static void decode_opc (CPUMIPSState *env, DisasContext *ctx, int *is_branch)
>              break;
>          case OPC_SLT:          /* Set on less than */
>          case OPC_SLTU:
> -            gen_slt(env, op1, rd, rs, rt);
> +            gen_slt(env, ctx, op1, rd, rs, rt);
>              break;
>          case OPC_AND:          /* Logic*/
>          case OPC_OR:
>          case OPC_NOR:
>          case OPC_XOR:
> -            gen_logic(env, op1, rd, rs, rt);
> +            gen_logic(env, ctx, op1, rd, rs, rt);
>              break;
>          case OPC_MULT ... OPC_DIVU:
>              if (sa) {
> @@ -12221,13 +12223,13 @@ static void decode_opc (CPUMIPSState *env, DisasContext *ctx, int *is_branch)
>           break;
>      case OPC_SLTI: /* Set on less than with immediate opcode */
>      case OPC_SLTIU:
> -         gen_slt_imm(env, op, rt, rs, imm);
> +         gen_slt_imm(env, ctx, op, rt, rs, imm);
>           break;
>      case OPC_ANDI: /* Arithmetic with immediate opcode */
>      case OPC_LUI:
>      case OPC_ORI:
>      case OPC_XORI:
> -         gen_logic_imm(env, op, rt, rs, imm);
> +         gen_logic_imm(env, ctx, op, rt, rs, imm);
>           break;
>      case OPC_J ... OPC_JAL: /* Jump */
>           offset = (int32_t)(ctx->opcode & 0x3FFFFFF) << 2;
> -- 
> 1.7.11.4
> 

Looks fine to me.

Acked-by: Aurelien Jarno <aurelien@aurel32.net>

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

* [Qemu-devel] [PATCH 3/7] target-mips: Always evaluate debugging macro arguments
  2012-09-17 21:35 [Qemu-devel] [PATCH v2 0/7] target-mips improvements Richard Henderson
  2012-09-17 21:35 ` [Qemu-devel] [PATCH 1/7] target-mips: Set opn in gen_ldst_multiple Richard Henderson
  2012-09-17 21:35 ` [Qemu-devel] [PATCH 2/7] target-mips: Fix MIPS_DEBUG Richard Henderson
@ 2012-09-17 21:35 ` Richard Henderson
  2012-09-18 16:38   ` Aurelien Jarno
  2012-09-17 21:35 ` [Qemu-devel] [PATCH 4/7] target-mips: Pass DisasContext to fpr32 load/store routines Richard Henderson
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2012-09-17 21:35 UTC
  To: qemu-devel; +Cc: aurelien

Done inside an if (0) so that it is easily eliminated as dead code.
But this will prevent some of the compilation errors with debugging
enabled from creeping back in.
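
The idea can be sketched outside of QEMU roughly as follows.  All
names here (DEMO_DEBUG, demo_log, demo_translate, DEMO_DEBUG_ENABLED)
are hypothetical and chosen for illustration; demo_log stands in for
qemu_log_mask.  In the disabled configuration the call sits inside
if (0), so the compiler still parses and type-checks the arguments but
the call never runs.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* 0 mimics a build without MIPS_DEBUG_DISAS; define as 1 to enable logging. */
#ifndef DEMO_DEBUG_ENABLED
#define DEMO_DEBUG_ENABLED 0
#endif

/* Counts how often the logger actually ran. */
static int demo_log_calls;

/* Hypothetical logger, a stand-in for qemu_log_mask(). */
static void demo_log(const char *fmt, ...)
{
    va_list ap;

    demo_log_calls++;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
}

#if DEMO_DEBUG_ENABLED
#define DEMO_DEBUG(...) demo_log(__VA_ARGS__)
#else
/* Arguments are still compiled, but the branch is never taken. */
#define DEMO_DEBUG(...) \
    do { if (0) { demo_log(__VA_ARGS__); } } while (0)
#endif

static int demo_translate(int opcode)
{
    const char *opn = "nop";

    if (opcode == 1) {
        opn = "add";
    }
    /* opn is referenced here in both configurations. */
    DEMO_DEBUG("%s %d\n", opn, opcode);
    return opcode;
}
```

Because opn is referenced inside the if (0) block even when debugging
is off, the compiler no longer sees it as set but unused, which is why
the "(void)opn" lines can be dropped in the hunks below.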

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-mips/translate.c | 48 +++++++++++-------------------------------------
 1 file changed, 11 insertions(+), 37 deletions(-)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index f93b444..775c3a1 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -567,14 +567,18 @@ static const char *fregnames[] =
       "f24", "f25", "f26", "f27", "f28", "f29", "f30", "f31", };
 
 #ifdef MIPS_DEBUG_DISAS
-#define MIPS_DEBUG(fmt, ...)                         \
-        qemu_log_mask(CPU_LOG_TB_IN_ASM,                \
-                       TARGET_FMT_lx ": %08x " fmt "\n", \
-                       ctx->pc, ctx->opcode , ## __VA_ARGS__)
+#define MIPS_DEBUG(fmt, ...)                                                  \
+    qemu_log_mask(CPU_LOG_TB_IN_ASM,                                          \
+                  TARGET_FMT_lx ": %08x " fmt "\n",                           \
+                  ctx->pc, ctx->opcode , ## __VA_ARGS__)
 #define LOG_DISAS(...) qemu_log_mask(CPU_LOG_TB_IN_ASM, ## __VA_ARGS__)
 #else
-#define MIPS_DEBUG(fmt, ...) do { } while(0)
-#define LOG_DISAS(...) do { } while (0)
+#define MIPS_DEBUG(fmt, ...)                                                  \
+    do { if (0) {                                                             \
+        qemu_log_mask(0, "%x" fmt, ctx->opcode, ## __VA_ARGS__);              \
+    } } while(0)
+#define LOG_DISAS(...) \
+    do { if (0) { qemu_log_mask(0, ## __VA_ARGS__); } } while (0)
 #endif
 
 #define MIPS_INVAL(op)                                                        \
@@ -1163,7 +1167,6 @@ static void gen_ld (CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
         opn = "ll";
         break;
     }
-    (void)opn; /* avoid a compiler warning */
     MIPS_DEBUG("%s %s, %d(%s)", opn, regnames[rt], offset, regnames[base]);
     tcg_temp_free(t0);
     tcg_temp_free(t1);
@@ -1223,7 +1226,6 @@ static void gen_st (DisasContext *ctx, uint32_t opc, int rt,
         opn = "swr";
         break;
     }
-    (void)opn; /* avoid a compiler warning */
     MIPS_DEBUG("%s %s, %d(%s)", opn, regnames[rt], offset, regnames[base]);
     tcg_temp_free(t0);
     tcg_temp_free(t1);
@@ -1259,7 +1261,6 @@ static void gen_st_cond (DisasContext *ctx, uint32_t opc, int rt,
         opn = "sc";
         break;
     }
-    (void)opn; /* avoid a compiler warning */
     MIPS_DEBUG("%s %s, %d(%s)", opn, regnames[rt], offset, regnames[base]);
     tcg_temp_free(t1);
     tcg_temp_free(t0);
@@ -1325,7 +1326,6 @@ static void gen_flt_ldst (DisasContext *ctx, uint32_t opc, int ft,
         generate_exception(ctx, EXCP_RI);
         goto out;
     }
-    (void)opn; /* avoid a compiler warning */
     MIPS_DEBUG("%s %s, %d(%s)", opn, fregnames[ft], offset, regnames[base]);
  out:
     tcg_temp_free(t0);
@@ -1426,7 +1426,6 @@ static void gen_arith_imm (CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
         break;
 #endif
     }
-    (void)opn; /* avoid a compiler warning */
     MIPS_DEBUG("%s %s, %s, " TARGET_FMT_lx, opn, regnames[rt], regnames[rs], uimm);
 }
 
@@ -1470,7 +1469,6 @@ static void gen_logic_imm(CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
         opn = "lui";
         break;
     }
-    (void)opn; /* avoid a compiler warning */
     MIPS_DEBUG("%s %s, %s, " TARGET_FMT_lx, opn, regnames[rt], regnames[rs], uimm);
 }
 
@@ -1499,7 +1497,6 @@ static void gen_slt_imm(CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
         opn = "sltiu";
         break;
     }
-    (void)opn; /* avoid a compiler warning */
     MIPS_DEBUG("%s %s, %s, " TARGET_FMT_lx, opn, regnames[rt], regnames[rs], uimm);
     tcg_temp_free(t0);
 }
@@ -1591,7 +1588,6 @@ static void gen_shift_imm(CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
         break;
 #endif
     }
-    (void)opn; /* avoid a compiler warning */
     MIPS_DEBUG("%s %s, %s, " TARGET_FMT_lx, opn, regnames[rt], regnames[rs], uimm);
     tcg_temp_free(t0);
 }
@@ -1772,7 +1768,6 @@ static void gen_arith (CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
         opn = "mul";
         break;
     }
-    (void)opn; /* avoid a compiler warning */
     MIPS_DEBUG("%s %s, %s, %s", opn, regnames[rd], regnames[rs], regnames[rt]);
 }
 
@@ -1811,7 +1806,6 @@ static void gen_cond_move(CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
         tcg_gen_movi_tl(cpu_gpr[rd], 0);
     gen_set_label(l1);
 
-    (void)opn; /* avoid a compiler warning */
     MIPS_DEBUG("%s %s, %s, %s", opn, regnames[rd], regnames[rs], regnames[rt]);
 }
 
@@ -1873,7 +1867,6 @@ static void gen_logic(CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
         opn = "xor";
         break;
     }
-    (void)opn; /* avoid a compiler warning */
     MIPS_DEBUG("%s %s, %s, %s", opn, regnames[rd], regnames[rs], regnames[rt]);
 }
 
@@ -1904,7 +1897,6 @@ static void gen_slt(CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
         opn = "sltu";
         break;
     }
-    (void)opn; /* avoid a compiler warning */
     MIPS_DEBUG("%s %s, %s, %s", opn, regnames[rd], regnames[rs], regnames[rt]);
     tcg_temp_free(t0);
     tcg_temp_free(t1);
@@ -1985,7 +1977,6 @@ static void gen_shift (CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
         break;
 #endif
     }
-    (void)opn; /* avoid a compiler warning */
     MIPS_DEBUG("%s %s, %s, %s", opn, regnames[rd], regnames[rs], regnames[rt]);
     tcg_temp_free(t0);
     tcg_temp_free(t1);
@@ -2025,7 +2016,6 @@ static void gen_HILO (DisasContext *ctx, uint32_t opc, int reg)
         opn = "mtlo";
         break;
     }
-    (void)opn; /* avoid a compiler warning */
     MIPS_DEBUG("%s %s", opn, regnames[reg]);
 }
 
@@ -2258,7 +2248,6 @@ static void gen_muldiv (DisasContext *ctx, uint32_t opc,
         generate_exception(ctx, EXCP_RI);
         goto out;
     }
-    (void)opn; /* avoid a compiler warning */
     MIPS_DEBUG("%s %s %s", opn, regnames[rs], regnames[rt]);
  out:
     tcg_temp_free(t0);
@@ -2338,7 +2327,6 @@ static void gen_mul_vr54xx (DisasContext *ctx, uint32_t opc,
         goto out;
     }
     gen_store_gpr(t0, rd);
-    (void)opn; /* avoid a compiler warning */
     MIPS_DEBUG("%s %s, %s, %s", opn, regnames[rd], regnames[rs], regnames[rt]);
 
  out:
@@ -2379,7 +2367,6 @@ static void gen_cl (DisasContext *ctx, uint32_t opc,
         break;
 #endif
     }
-    (void)opn; /* avoid a compiler warning */
     MIPS_DEBUG("%s %s, %s", opn, regnames[rd], regnames[rs]);
     tcg_temp_free(t0);
 }
@@ -2593,8 +2580,7 @@ static void gen_loongson_integer (DisasContext *ctx, uint32_t opc,
 #endif
     }
 
-    (void)opn; /* avoid a compiler warning */
-    MIPS_DEBUG("%s %s, %s", opn, regnames[rd], regnames[rs]);
+    MIPS_DEBUG("%s %s, %s, %s", opn, regnames[rd], regnames[rs], regnames[rt]);
     tcg_temp_free(t0);
     tcg_temp_free(t1);
 }
@@ -3765,7 +3751,6 @@ static void gen_mfc0 (CPUMIPSState *env, DisasContext *ctx, TCGv arg, int reg, i
     default:
        goto die;
     }
-    (void)rn; /* avoid a compiler warning */
     LOG_DISAS("mfc0 %s (reg %d sel %d)\n", rn, reg, sel);
     return;
 
@@ -4356,7 +4341,6 @@ static void gen_mtc0 (CPUMIPSState *env, DisasContext *ctx, TCGv arg, int reg, i
     default:
        goto die;
     }
-    (void)rn; /* avoid a compiler warning */
     LOG_DISAS("mtc0 %s (reg %d sel %d)\n", rn, reg, sel);
     /* For simplicity assume that all writes can cause interrupts.  */
     if (use_icount) {
@@ -4931,7 +4915,6 @@ static void gen_dmfc0 (CPUMIPSState *env, DisasContext *ctx, TCGv arg, int reg,
     default:
         goto die;
     }
-    (void)rn; /* avoid a compiler warning */
     LOG_DISAS("dmfc0 %s (reg %d sel %d)\n", rn, reg, sel);
     return;
 
@@ -5523,7 +5506,6 @@ static void gen_dmtc0 (CPUMIPSState *env, DisasContext *ctx, TCGv arg, int reg,
     default:
         goto die;
     }
-    (void)rn; /* avoid a compiler warning */
     LOG_DISAS("dmtc0 %s (reg %d sel %d)\n", rn, reg, sel);
     /* For simplicity assume that all writes can cause interrupts.  */
     if (use_icount) {
@@ -6071,7 +6053,6 @@ static void gen_cp0 (CPUMIPSState *env, DisasContext *ctx, uint32_t opc, int rt,
         generate_exception(ctx, EXCP_RI);
         return;
     }
-    (void)opn; /* avoid a compiler warning */
     MIPS_DEBUG("%s %s %d", opn, regnames[rt], rd);
 }
 #endif /* !CONFIG_USER_ONLY */
@@ -6181,7 +6162,6 @@ static void gen_compute_branch1 (CPUMIPSState *env, DisasContext *ctx, uint32_t
         generate_exception (ctx, EXCP_RI);
         goto out;
     }
-    (void)opn; /* avoid a compiler warning */
     MIPS_DEBUG("%s: cond %02x target " TARGET_FMT_lx, opn,
                ctx->hflags, btarget);
     ctx->btarget = btarget;
@@ -6411,7 +6391,6 @@ static void gen_cp1 (DisasContext *ctx, uint32_t opc, int rt, int fs)
         generate_exception (ctx, EXCP_RI);
         goto out;
     }
-    (void)opn; /* avoid a compiler warning */
     MIPS_DEBUG("%s %s %s", opn, regnames[rt], fregnames[fs]);
 
  out:
@@ -7739,7 +7718,6 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
         generate_exception (ctx, EXCP_RI);
         return;
     }
-    (void)opn; /* avoid a compiler warning */
     switch (optype) {
     case BINOP:
         MIPS_DEBUG("%s %s, %s, %s", opn, fregnames[fd], fregnames[fs], fregnames[ft]);
@@ -7851,7 +7829,6 @@ static void gen_flt3_ldst (DisasContext *ctx, uint32_t opc,
         break;
     }
     tcg_temp_free(t0);
-    (void)opn; (void)store; /* avoid compiler warnings */
     MIPS_DEBUG("%s %s, %s(%s)", opn, fregnames[store ? fs : fd],
                regnames[index], regnames[base]);
 }
@@ -8125,7 +8102,6 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
         generate_exception (ctx, EXCP_RI);
         return;
     }
-    (void)opn; /* avoid a compiler warning */
     MIPS_DEBUG("%s %s, %s, %s, %s", opn, fregnames[fd], fregnames[fr],
                fregnames[fs], fregnames[ft]);
 }
@@ -9894,7 +9870,6 @@ static void gen_ldst_multiple (DisasContext *ctx, uint32_t opc, int reglist,
         break;
 #endif
     }
-    (void)opn;
     MIPS_DEBUG("%s, %x, %d(%s)", opn, reglist, offset, regnames[base]);
     tcg_temp_free(t0);
     tcg_temp_free(t1);
@@ -10119,7 +10094,6 @@ static void gen_ldst_pair (DisasContext *ctx, uint32_t opc, int rd,
         break;
 #endif
     }
-    (void)opn; /* avoid a compiler warning */
     MIPS_DEBUG("%s, %s, %d(%s)", opn, regnames[rd], offset, regnames[base]);
     tcg_temp_free(t0);
     tcg_temp_free(t1);
-- 
1.7.11.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [PATCH 3/7] target-mips: Always evaluate debugging macro arguments
  2012-09-17 21:35 ` [Qemu-devel] [PATCH 3/7] target-mips: Always evaluate debugging macro arguments Richard Henderson
@ 2012-09-18 16:38   ` Aurelien Jarno
  0 siblings, 0 replies; 14+ messages in thread
From: Aurelien Jarno @ 2012-09-18 16:38 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Mon, Sep 17, 2012 at 02:35:09PM -0700, Richard Henderson wrote:
> Done inside an if (0) so that it is easily eliminated as dead code.
> But this will prevent some of the compilation errors with debugging
> enabled from creeping back in.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target-mips/translate.c | 48 +++++++++++-------------------------------------
>  1 file changed, 11 insertions(+), 37 deletions(-)
> 
> diff --git a/target-mips/translate.c b/target-mips/translate.c
> index f93b444..775c3a1 100644
> --- a/target-mips/translate.c
> +++ b/target-mips/translate.c
> @@ -567,14 +567,18 @@ static const char *fregnames[] =
>        "f24", "f25", "f26", "f27", "f28", "f29", "f30", "f31", };
>  
>  #ifdef MIPS_DEBUG_DISAS
> -#define MIPS_DEBUG(fmt, ...)                         \
> -        qemu_log_mask(CPU_LOG_TB_IN_ASM,                \
> -                       TARGET_FMT_lx ": %08x " fmt "\n", \
> -                       ctx->pc, ctx->opcode , ## __VA_ARGS__)
> +#define MIPS_DEBUG(fmt, ...)                                                  \
> +    qemu_log_mask(CPU_LOG_TB_IN_ASM,                                          \
> +                  TARGET_FMT_lx ": %08x " fmt "\n",                           \
> +                  ctx->pc, ctx->opcode , ## __VA_ARGS__)
>  #define LOG_DISAS(...) qemu_log_mask(CPU_LOG_TB_IN_ASM, ## __VA_ARGS__)
>  #else
> -#define MIPS_DEBUG(fmt, ...) do { } while(0)
> -#define LOG_DISAS(...) do { } while (0)
> +#define MIPS_DEBUG(fmt, ...)                                                  \
> +    do { if (0) {                                                             \
> +        qemu_log_mask(0, "%x" fmt, ctx->opcode, ## __VA_ARGS__);              \
> +    } } while(0)
> +#define LOG_DISAS(...) \
> +    do { if (0) { qemu_log_mask(0, ## __VA_ARGS__); } } while (0)
>  #endif

Instead of having almost the same code twice, couldn't we use something
like "if (MIPS_DEBUG_DISAS)" instead of "if (0)"? Of course it means
MIPS_DEBUG_DISAS has to be defined to 0/1 instead of defined/not
defined, but I don't think it's a problem.
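
A minimal sketch of that idea, under stated assumptions: the names
SKETCH_DEBUG_DISAS, SKETCH_DEBUG and sketch_log are made up for
illustration and stand in for MIPS_DEBUG_DISAS, MIPS_DEBUG and
qemu_log_mask. It is untested against the tree and only shows the
shape of a single macro definition driven by a 0/1 flag.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical flag: always defined, 0 = logging off, 1 = logging on. */
#ifndef SKETCH_DEBUG_DISAS
#define SKETCH_DEBUG_DISAS 0
#endif

static int sketch_log_calls;    /* counts calls that actually ran */

/* Hypothetical stand-in for qemu_log_mask(); not a QEMU function. */
static void sketch_log(const char *fmt, ...)
{
    va_list ap;

    sketch_log_calls++;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
}

/* One definition serves both builds; the condition is a constant. */
#define SKETCH_DEBUG(fmt, ...)                                      \
    do {                                                            \
        if (SKETCH_DEBUG_DISAS) {                                   \
            sketch_log(fmt "\n", ## __VA_ARGS__);                   \
        }                                                           \
    } while (0)

static int sketch_count_logged(int n)
{
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++) {
        SKETCH_DEBUG("insn %d", i);
    }
    return sketch_log_calls;
}
```

The trade-off is that the flag can no longer be tested with #ifdef;
any existing "#ifdef MIPS_DEBUG_DISAS" blocks would need to become
"#if MIPS_DEBUG_DISAS".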

>  #define MIPS_INVAL(op)                                                        \
> @@ -1163,7 +1167,6 @@ static void gen_ld (CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
>          opn = "ll";
>          break;
>      }
> -    (void)opn; /* avoid a compiler warning */
>      MIPS_DEBUG("%s %s, %d(%s)", opn, regnames[rt], offset, regnames[base]);
>      tcg_temp_free(t0);
>      tcg_temp_free(t1);
> @@ -1223,7 +1226,6 @@ static void gen_st (DisasContext *ctx, uint32_t opc, int rt,
>          opn = "swr";
>          break;
>      }
> -    (void)opn; /* avoid a compiler warning */
>      MIPS_DEBUG("%s %s, %d(%s)", opn, regnames[rt], offset, regnames[base]);
>      tcg_temp_free(t0);
>      tcg_temp_free(t1);
> @@ -1259,7 +1261,6 @@ static void gen_st_cond (DisasContext *ctx, uint32_t opc, int rt,
>          opn = "sc";
>          break;
>      }
> -    (void)opn; /* avoid a compiler warning */
>      MIPS_DEBUG("%s %s, %d(%s)", opn, regnames[rt], offset, regnames[base]);
>      tcg_temp_free(t1);
>      tcg_temp_free(t0);
> @@ -1325,7 +1326,6 @@ static void gen_flt_ldst (DisasContext *ctx, uint32_t opc, int ft,
>          generate_exception(ctx, EXCP_RI);
>          goto out;
>      }
> -    (void)opn; /* avoid a compiler warning */
>      MIPS_DEBUG("%s %s, %d(%s)", opn, fregnames[ft], offset, regnames[base]);
>   out:
>      tcg_temp_free(t0);
> @@ -1426,7 +1426,6 @@ static void gen_arith_imm (CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
>          break;
>  #endif
>      }
> -    (void)opn; /* avoid a compiler warning */
>      MIPS_DEBUG("%s %s, %s, " TARGET_FMT_lx, opn, regnames[rt], regnames[rs], uimm);
>  }
>  
> @@ -1470,7 +1469,6 @@ static void gen_logic_imm(CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
>          opn = "lui";
>          break;
>      }
> -    (void)opn; /* avoid a compiler warning */
>      MIPS_DEBUG("%s %s, %s, " TARGET_FMT_lx, opn, regnames[rt], regnames[rs], uimm);
>  }
>  
> @@ -1499,7 +1497,6 @@ static void gen_slt_imm(CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
>          opn = "sltiu";
>          break;
>      }
> -    (void)opn; /* avoid a compiler warning */
>      MIPS_DEBUG("%s %s, %s, " TARGET_FMT_lx, opn, regnames[rt], regnames[rs], uimm);
>      tcg_temp_free(t0);
>  }
> @@ -1591,7 +1588,6 @@ static void gen_shift_imm(CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
>          break;
>  #endif
>      }
> -    (void)opn; /* avoid a compiler warning */
>      MIPS_DEBUG("%s %s, %s, " TARGET_FMT_lx, opn, regnames[rt], regnames[rs], uimm);
>      tcg_temp_free(t0);
>  }
> @@ -1772,7 +1768,6 @@ static void gen_arith (CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
>          opn = "mul";
>          break;
>      }
> -    (void)opn; /* avoid a compiler warning */
>      MIPS_DEBUG("%s %s, %s, %s", opn, regnames[rd], regnames[rs], regnames[rt]);
>  }
>  
> @@ -1811,7 +1806,6 @@ static void gen_cond_move(CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
>          tcg_gen_movi_tl(cpu_gpr[rd], 0);
>      gen_set_label(l1);
>  
> -    (void)opn; /* avoid a compiler warning */
>      MIPS_DEBUG("%s %s, %s, %s", opn, regnames[rd], regnames[rs], regnames[rt]);
>  }
>  
> @@ -1873,7 +1867,6 @@ static void gen_logic(CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
>          opn = "xor";
>          break;
>      }
> -    (void)opn; /* avoid a compiler warning */
>      MIPS_DEBUG("%s %s, %s, %s", opn, regnames[rd], regnames[rs], regnames[rt]);
>  }
>  
> @@ -1904,7 +1897,6 @@ static void gen_slt(CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
>          opn = "sltu";
>          break;
>      }
> -    (void)opn; /* avoid a compiler warning */
>      MIPS_DEBUG("%s %s, %s, %s", opn, regnames[rd], regnames[rs], regnames[rt]);
>      tcg_temp_free(t0);
>      tcg_temp_free(t1);
> @@ -1985,7 +1977,6 @@ static void gen_shift (CPUMIPSState *env, DisasContext *ctx, uint32_t opc,
>          break;
>  #endif
>      }
> -    (void)opn; /* avoid a compiler warning */
>      MIPS_DEBUG("%s %s, %s, %s", opn, regnames[rd], regnames[rs], regnames[rt]);
>      tcg_temp_free(t0);
>      tcg_temp_free(t1);
> @@ -2025,7 +2016,6 @@ static void gen_HILO (DisasContext *ctx, uint32_t opc, int reg)
>          opn = "mtlo";
>          break;
>      }
> -    (void)opn; /* avoid a compiler warning */
>      MIPS_DEBUG("%s %s", opn, regnames[reg]);
>  }
>  
> @@ -2258,7 +2248,6 @@ static void gen_muldiv (DisasContext *ctx, uint32_t opc,
>          generate_exception(ctx, EXCP_RI);
>          goto out;
>      }
> -    (void)opn; /* avoid a compiler warning */
>      MIPS_DEBUG("%s %s %s", opn, regnames[rs], regnames[rt]);
>   out:
>      tcg_temp_free(t0);
> @@ -2338,7 +2327,6 @@ static void gen_mul_vr54xx (DisasContext *ctx, uint32_t opc,
>          goto out;
>      }
>      gen_store_gpr(t0, rd);
> -    (void)opn; /* avoid a compiler warning */
>      MIPS_DEBUG("%s %s, %s, %s", opn, regnames[rd], regnames[rs], regnames[rt]);
>  
>   out:
> @@ -2379,7 +2367,6 @@ static void gen_cl (DisasContext *ctx, uint32_t opc,
>          break;
>  #endif
>      }
> -    (void)opn; /* avoid a compiler warning */
>      MIPS_DEBUG("%s %s, %s", opn, regnames[rd], regnames[rs]);
>      tcg_temp_free(t0);
>  }
> @@ -2593,8 +2580,7 @@ static void gen_loongson_integer (DisasContext *ctx, uint32_t opc,
>  #endif
>      }
>  
> -    (void)opn; /* avoid a compiler warning */
> -    MIPS_DEBUG("%s %s, %s", opn, regnames[rd], regnames[rs]);
> +    MIPS_DEBUG("%s %s, %s, %s", opn, regnames[rd], regnames[rs], regnames[rt]);
>      tcg_temp_free(t0);
>      tcg_temp_free(t1);
>  }
> @@ -3765,7 +3751,6 @@ static void gen_mfc0 (CPUMIPSState *env, DisasContext *ctx, TCGv arg, int reg, i
>      default:
>         goto die;
>      }
> -    (void)rn; /* avoid a compiler warning */
>      LOG_DISAS("mfc0 %s (reg %d sel %d)\n", rn, reg, sel);
>      return;
>  
> @@ -4356,7 +4341,6 @@ static void gen_mtc0 (CPUMIPSState *env, DisasContext *ctx, TCGv arg, int reg, i
>      default:
>         goto die;
>      }
> -    (void)rn; /* avoid a compiler warning */
>      LOG_DISAS("mtc0 %s (reg %d sel %d)\n", rn, reg, sel);
>      /* For simplicity assume that all writes can cause interrupts.  */
>      if (use_icount) {
> @@ -4931,7 +4915,6 @@ static void gen_dmfc0 (CPUMIPSState *env, DisasContext *ctx, TCGv arg, int reg,
>      default:
>          goto die;
>      }
> -    (void)rn; /* avoid a compiler warning */
>      LOG_DISAS("dmfc0 %s (reg %d sel %d)\n", rn, reg, sel);
>      return;
>  
> @@ -5523,7 +5506,6 @@ static void gen_dmtc0 (CPUMIPSState *env, DisasContext *ctx, TCGv arg, int reg,
>      default:
>          goto die;
>      }
> -    (void)rn; /* avoid a compiler warning */
>      LOG_DISAS("dmtc0 %s (reg %d sel %d)\n", rn, reg, sel);
>      /* For simplicity assume that all writes can cause interrupts.  */
>      if (use_icount) {
> @@ -6071,7 +6053,6 @@ static void gen_cp0 (CPUMIPSState *env, DisasContext *ctx, uint32_t opc, int rt,
>          generate_exception(ctx, EXCP_RI);
>          return;
>      }
> -    (void)opn; /* avoid a compiler warning */
>      MIPS_DEBUG("%s %s %d", opn, regnames[rt], rd);
>  }
>  #endif /* !CONFIG_USER_ONLY */
> @@ -6181,7 +6162,6 @@ static void gen_compute_branch1 (CPUMIPSState *env, DisasContext *ctx, uint32_t
>          generate_exception (ctx, EXCP_RI);
>          goto out;
>      }
> -    (void)opn; /* avoid a compiler warning */
>      MIPS_DEBUG("%s: cond %02x target " TARGET_FMT_lx, opn,
>                 ctx->hflags, btarget);
>      ctx->btarget = btarget;
> @@ -6411,7 +6391,6 @@ static void gen_cp1 (DisasContext *ctx, uint32_t opc, int rt, int fs)
>          generate_exception (ctx, EXCP_RI);
>          goto out;
>      }
> -    (void)opn; /* avoid a compiler warning */
>      MIPS_DEBUG("%s %s %s", opn, regnames[rt], fregnames[fs]);
>  
>   out:
> @@ -7739,7 +7718,6 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>          generate_exception (ctx, EXCP_RI);
>          return;
>      }
> -    (void)opn; /* avoid a compiler warning */
>      switch (optype) {
>      case BINOP:
>          MIPS_DEBUG("%s %s, %s, %s", opn, fregnames[fd], fregnames[fs], fregnames[ft]);
> @@ -7851,7 +7829,6 @@ static void gen_flt3_ldst (DisasContext *ctx, uint32_t opc,
>          break;
>      }
>      tcg_temp_free(t0);
> -    (void)opn; (void)store; /* avoid compiler warnings */
>      MIPS_DEBUG("%s %s, %s(%s)", opn, fregnames[store ? fs : fd],
>                 regnames[index], regnames[base]);
>  }
> @@ -8125,7 +8102,6 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
>          generate_exception (ctx, EXCP_RI);
>          return;
>      }
> -    (void)opn; /* avoid a compiler warning */
>      MIPS_DEBUG("%s %s, %s, %s, %s", opn, fregnames[fd], fregnames[fr],
>                 fregnames[fs], fregnames[ft]);
>  }
> @@ -9894,7 +9870,6 @@ static void gen_ldst_multiple (DisasContext *ctx, uint32_t opc, int reglist,
>          break;
>  #endif
>      }
> -    (void)opn;
>      MIPS_DEBUG("%s, %x, %d(%s)", opn, reglist, offset, regnames[base]);
>      tcg_temp_free(t0);
>      tcg_temp_free(t1);
> @@ -10119,7 +10094,6 @@ static void gen_ldst_pair (DisasContext *ctx, uint32_t opc, int rd,
>          break;
>  #endif
>      }
> -    (void)opn; /* avoid a compiler warning */
>      MIPS_DEBUG("%s, %s, %d(%s)", opn, regnames[rd], offset, regnames[base]);
>      tcg_temp_free(t0);
>      tcg_temp_free(t1);
> -- 
> 1.7.11.4
> 

The rest looks fine and is a nice cleanup.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Qemu-devel] [PATCH 4/7] target-mips: Pass DisasContext to fpr32 load/store routines
  2012-09-17 21:35 [Qemu-devel] [PATCH v2 0/7] target-mips improvements Richard Henderson
                   ` (2 preceding siblings ...)
  2012-09-17 21:35 ` [Qemu-devel] [PATCH 3/7] target-mips: Always evaluate debugging macro arguments Richard Henderson
@ 2012-09-17 21:35 ` Richard Henderson
  2012-09-18 16:39   ` Aurelien Jarno
  2012-09-17 21:35 ` [Qemu-devel] [PATCH 5/7] target-mips: Use TCG registers for the FPU Richard Henderson
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2012-09-17 21:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

A large mechanical change in support of a follow-on patch
that changes the representation of the fp registers.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-mips/translate.c | 312 ++++++++++++++++++++++++------------------------
 1 file changed, 153 insertions(+), 159 deletions(-)
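
One visible effect of the uniform signature is in FOP_CONDS: once the
32-bit and 64-bit loaders take the same arguments, the macro can paste
the width straight onto the function name and the gen_ldcmp_fpr*
wrappers go away. A standalone sketch of that token-pasting pattern
follows. DemoCtx and the demo_* functions are hypothetical and only
mimic the shape of the real accessors.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical miniature of a translation context with an FPR file. */
typedef struct DemoCtx {
    uint32_t lo[32];    /* low 32 bits of each register */
    uint32_t hi[32];    /* high 32 bits of each register */
} DemoCtx;

static uint64_t demo_load_fpr32(DemoCtx *ctx, int reg)
{
    assert(reg >= 0 && reg < 32);
    return ctx->lo[reg];
}

static uint64_t demo_load_fpr64(DemoCtx *ctx, int reg)
{
    assert(reg >= 0 && reg < 32);
    return ((uint64_t)ctx->hi[reg] << 32) | ctx->lo[reg];
}

/* Same argument list for both widths, so ## can pick the accessor. */
#define DEMO_CMP(bits)                                              \
static int demo_cmp_eq_##bits(DemoCtx *ctx, int fs, int ft)         \
{                                                                   \
    return demo_load_fpr##bits(ctx, fs)                             \
        == demo_load_fpr##bits(ctx, ft);                            \
}

DEMO_CMP(32)
DEMO_CMP(64)
```

If demo_load_fpr32 lacked the ctx parameter, DEMO_CMP would need a
per-width wrapper to hide the difference, which is the role the removed
gen_ldcmp_fpr32/gen_ldcmp_fpr64 macros played.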

diff --git a/target-mips/translate.c b/target-mips/translate.c
index 775c3a1..b4301e9 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -662,42 +662,42 @@ static inline void gen_store_srsgpr (int from, int to)
 }
 
 /* Floating point register moves. */
-static inline void gen_load_fpr32 (TCGv_i32 t, int reg)
+static inline void gen_load_fpr32(DisasContext *ctx, TCGv_i32 t, int reg)
 {
     tcg_gen_ld_i32(t, cpu_env, offsetof(CPUMIPSState, active_fpu.fpr[reg].w[FP_ENDIAN_IDX]));
 }
 
-static inline void gen_store_fpr32 (TCGv_i32 t, int reg)
+static inline void gen_store_fpr32(DisasContext *ctx, TCGv_i32 t, int reg)
 {
     tcg_gen_st_i32(t, cpu_env, offsetof(CPUMIPSState, active_fpu.fpr[reg].w[FP_ENDIAN_IDX]));
 }
 
-static inline void gen_load_fpr32h (TCGv_i32 t, int reg)
+static inline void gen_load_fpr32h(DisasContext *ctx, TCGv_i32 t, int reg)
 {
     tcg_gen_ld_i32(t, cpu_env, offsetof(CPUMIPSState, active_fpu.fpr[reg].w[!FP_ENDIAN_IDX]));
 }
 
-static inline void gen_store_fpr32h (TCGv_i32 t, int reg)
+static inline void gen_store_fpr32h(DisasContext *ctx, TCGv_i32 t, int reg)
 {
     tcg_gen_st_i32(t, cpu_env, offsetof(CPUMIPSState, active_fpu.fpr[reg].w[!FP_ENDIAN_IDX]));
 }
 
-static inline void gen_load_fpr64 (DisasContext *ctx, TCGv_i64 t, int reg)
+static inline void gen_load_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
 {
     if (ctx->hflags & MIPS_HFLAG_F64) {
         tcg_gen_ld_i64(t, cpu_env, offsetof(CPUMIPSState, active_fpu.fpr[reg].d));
     } else {
         TCGv_i32 t0 = tcg_temp_new_i32();
         TCGv_i32 t1 = tcg_temp_new_i32();
-        gen_load_fpr32(t0, reg & ~1);
-        gen_load_fpr32(t1, reg | 1);
+        gen_load_fpr32(ctx, t0, reg & ~1);
+        gen_load_fpr32(ctx, t1, reg | 1);
         tcg_gen_concat_i32_i64(t, t0, t1);
         tcg_temp_free_i32(t0);
         tcg_temp_free_i32(t1);
     }
 }
 
-static inline void gen_store_fpr64 (DisasContext *ctx, TCGv_i64 t, int reg)
+static inline void gen_store_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
 {
     if (ctx->hflags & MIPS_HFLAG_F64) {
         tcg_gen_st_i64(t, cpu_env, offsetof(CPUMIPSState, active_fpu.fpr[reg].d));
@@ -705,10 +705,10 @@ static inline void gen_store_fpr64 (DisasContext *ctx, TCGv_i64 t, int reg)
         TCGv_i64 t0 = tcg_temp_new_i64();
         TCGv_i32 t1 = tcg_temp_new_i32();
         tcg_gen_trunc_i64_i32(t1, t);
-        gen_store_fpr32(t1, reg & ~1);
+        gen_store_fpr32(ctx, t1, reg & ~1);
         tcg_gen_shri_i64(t0, t, 32);
         tcg_gen_trunc_i64_i32(t1, t0);
-        gen_store_fpr32(t1, reg | 1);
+        gen_store_fpr32(ctx, t1, reg | 1);
         tcg_temp_free_i32(t1);
         tcg_temp_free_i64(t0);
     }
@@ -862,12 +862,6 @@ static inline void check_mips_64(DisasContext *ctx)
         generate_exception(ctx, EXCP_RI);
 }
 
-/* Define small wrappers for gen_load_fpr* so that we have a uniform
-   calling interface for 32 and 64-bit FPRs.  No sense in changing
-   all callers for gen_load_fpr32 when we need the CTX parameter for
-   this one use.  */
-#define gen_ldcmp_fpr32(ctx, x, y) gen_load_fpr32(x, y)
-#define gen_ldcmp_fpr64(ctx, x, y) gen_load_fpr64(ctx, x, y)
 #define FOP_CONDS(type, abs, fmt, ifmt, bits)                                 \
 static inline void gen_cmp ## type ## _ ## fmt(DisasContext *ctx, int n,      \
                                                int ft, int fs, int cc)        \
@@ -890,8 +884,8 @@ static inline void gen_cmp ## type ## _ ## fmt(DisasContext *ctx, int n,      \
         }                                                                     \
         break;                                                                \
     }                                                                         \
-    gen_ldcmp_fpr##bits (ctx, fp0, fs);                                       \
-    gen_ldcmp_fpr##bits (ctx, fp1, ft);                                       \
+    gen_load_fpr##bits(ctx, fp0, fs);                                         \
+    gen_load_fpr##bits(ctx, fp1, ft);                                         \
     switch (n) {                                                              \
     case  0: gen_helper_0e2i(cmp ## type ## _ ## fmt ## _f, fp0, fp1, cc);    break;\
     case  1: gen_helper_0e2i(cmp ## type ## _ ## fmt ## _un, fp0, fp1, cc);   break;\
@@ -1283,7 +1277,7 @@ static void gen_flt_ldst (DisasContext *ctx, uint32_t opc, int ft,
 
             tcg_gen_qemu_ld32s(t0, t0, ctx->mem_idx);
             tcg_gen_trunc_tl_i32(fp0, t0);
-            gen_store_fpr32(fp0, ft);
+            gen_store_fpr32(ctx, fp0, ft);
             tcg_temp_free_i32(fp0);
         }
         opn = "lwc1";
@@ -1293,7 +1287,7 @@ static void gen_flt_ldst (DisasContext *ctx, uint32_t opc, int ft,
             TCGv_i32 fp0 = tcg_temp_new_i32();
             TCGv t1 = tcg_temp_new();
 
-            gen_load_fpr32(fp0, ft);
+            gen_load_fpr32(ctx, fp0, ft);
             tcg_gen_extu_i32_tl(t1, fp0);
             tcg_gen_qemu_st32(t1, t0, ctx->mem_idx);
             tcg_temp_free(t1);
@@ -5704,13 +5698,13 @@ static void gen_mftr(CPUMIPSState *env, DisasContext *ctx, int rt, int rd,
         if (h == 0) {
             TCGv_i32 fp0 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, rt);
+            gen_load_fpr32(ctx, fp0, rt);
             tcg_gen_ext_i32_tl(t0, fp0);
             tcg_temp_free_i32(fp0);
         } else {
             TCGv_i32 fp0 = tcg_temp_new_i32();
 
-            gen_load_fpr32h(fp0, rt);
+            gen_load_fpr32h(ctx, fp0, rt);
             tcg_gen_ext_i32_tl(t0, fp0);
             tcg_temp_free_i32(fp0);
         }
@@ -5903,13 +5897,13 @@ static void gen_mttr(CPUMIPSState *env, DisasContext *ctx, int rd, int rt,
             TCGv_i32 fp0 = tcg_temp_new_i32();
 
             tcg_gen_trunc_tl_i32(fp0, t0);
-            gen_store_fpr32(fp0, rd);
+            gen_store_fpr32(ctx, fp0, rd);
             tcg_temp_free_i32(fp0);
         } else {
             TCGv_i32 fp0 = tcg_temp_new_i32();
 
             tcg_gen_trunc_tl_i32(fp0, t0);
-            gen_store_fpr32h(fp0, rd);
+            gen_store_fpr32h(ctx, fp0, rd);
             tcg_temp_free_i32(fp0);
         }
         break;
@@ -6324,7 +6318,7 @@ static void gen_cp1 (DisasContext *ctx, uint32_t opc, int rt, int fs)
         {
             TCGv_i32 fp0 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
+            gen_load_fpr32(ctx, fp0, fs);
             tcg_gen_ext_i32_tl(t0, fp0);
             tcg_temp_free_i32(fp0);
         }
@@ -6337,7 +6331,7 @@ static void gen_cp1 (DisasContext *ctx, uint32_t opc, int rt, int fs)
             TCGv_i32 fp0 = tcg_temp_new_i32();
 
             tcg_gen_trunc_tl_i32(fp0, t0);
-            gen_store_fpr32(fp0, fs);
+            gen_store_fpr32(ctx, fp0, fs);
             tcg_temp_free_i32(fp0);
         }
         opn = "mtc1";
@@ -6368,7 +6362,7 @@ static void gen_cp1 (DisasContext *ctx, uint32_t opc, int rt, int fs)
         {
             TCGv_i32 fp0 = tcg_temp_new_i32();
 
-            gen_load_fpr32h(fp0, fs);
+            gen_load_fpr32h(ctx, fp0, fs);
             tcg_gen_ext_i32_tl(t0, fp0);
             tcg_temp_free_i32(fp0);
         }
@@ -6381,7 +6375,7 @@ static void gen_cp1 (DisasContext *ctx, uint32_t opc, int rt, int fs)
             TCGv_i32 fp0 = tcg_temp_new_i32();
 
             tcg_gen_trunc_tl_i32(fp0, t0);
-            gen_store_fpr32h(fp0, fs);
+            gen_store_fpr32h(ctx, fp0, fs);
             tcg_temp_free_i32(fp0);
         }
         opn = "mthc1";
@@ -6426,7 +6420,7 @@ static void gen_movci (DisasContext *ctx, int rd, int rs, int cc, int tf)
     gen_set_label(l1);
 }
 
-static inline void gen_movcf_s (int fs, int fd, int cc, int tf)
+static void gen_movcf_s(DisasContext *ctx, int fs, int fd, int cc, int tf)
 {
     int cond;
     TCGv_i32 t0 = tcg_temp_new_i32();
@@ -6439,13 +6433,13 @@ static inline void gen_movcf_s (int fs, int fd, int cc, int tf)
 
     tcg_gen_andi_i32(t0, fpu_fcr31, 1 << get_fp_bit(cc));
     tcg_gen_brcondi_i32(cond, t0, 0, l1);
-    gen_load_fpr32(t0, fs);
-    gen_store_fpr32(t0, fd);
+    gen_load_fpr32(ctx, t0, fs);
+    gen_store_fpr32(ctx, t0, fd);
     gen_set_label(l1);
     tcg_temp_free_i32(t0);
 }
 
-static inline void gen_movcf_d (DisasContext *ctx, int fs, int fd, int cc, int tf)
+static void gen_movcf_d(DisasContext *ctx, int fs, int fd, int cc, int tf)
 {
     int cond;
     TCGv_i32 t0 = tcg_temp_new_i32();
@@ -6467,7 +6461,7 @@ static inline void gen_movcf_d (DisasContext *ctx, int fs, int fd, int cc, int t
     gen_set_label(l1);
 }
 
-static inline void gen_movcf_ps (int fs, int fd, int cc, int tf)
+static void gen_movcf_ps(DisasContext *ctx, int fs, int fd, int cc, int tf)
 {
     int cond;
     TCGv_i32 t0 = tcg_temp_new_i32();
@@ -6481,14 +6475,14 @@ static inline void gen_movcf_ps (int fs, int fd, int cc, int tf)
 
     tcg_gen_andi_i32(t0, fpu_fcr31, 1 << get_fp_bit(cc));
     tcg_gen_brcondi_i32(cond, t0, 0, l1);
-    gen_load_fpr32(t0, fs);
-    gen_store_fpr32(t0, fd);
+    gen_load_fpr32(ctx, t0, fs);
+    gen_store_fpr32(ctx, t0, fd);
     gen_set_label(l1);
 
     tcg_gen_andi_i32(t0, fpu_fcr31, 1 << get_fp_bit(cc+1));
     tcg_gen_brcondi_i32(cond, t0, 0, l2);
-    gen_load_fpr32h(t0, fs);
-    gen_store_fpr32h(t0, fd);
+    gen_load_fpr32h(ctx, t0, fs);
+    gen_store_fpr32h(ctx, t0, fd);
     tcg_temp_free_i32(t0);
     gen_set_label(l2);
 }
@@ -6543,11 +6537,11 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             TCGv_i32 fp0 = tcg_temp_new_i32();
             TCGv_i32 fp1 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
-            gen_load_fpr32(fp1, ft);
+            gen_load_fpr32(ctx, fp0, fs);
+            gen_load_fpr32(ctx, fp1, ft);
             gen_helper_float_add_s(fp0, cpu_env, fp0, fp1);
             tcg_temp_free_i32(fp1);
-            gen_store_fpr32(fp0, fd);
+            gen_store_fpr32(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
         }
         opn = "add.s";
@@ -6558,11 +6552,11 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             TCGv_i32 fp0 = tcg_temp_new_i32();
             TCGv_i32 fp1 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
-            gen_load_fpr32(fp1, ft);
+            gen_load_fpr32(ctx, fp0, fs);
+            gen_load_fpr32(ctx, fp1, ft);
             gen_helper_float_sub_s(fp0, cpu_env, fp0, fp1);
             tcg_temp_free_i32(fp1);
-            gen_store_fpr32(fp0, fd);
+            gen_store_fpr32(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
         }
         opn = "sub.s";
@@ -6573,11 +6567,11 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             TCGv_i32 fp0 = tcg_temp_new_i32();
             TCGv_i32 fp1 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
-            gen_load_fpr32(fp1, ft);
+            gen_load_fpr32(ctx, fp0, fs);
+            gen_load_fpr32(ctx, fp1, ft);
             gen_helper_float_mul_s(fp0, cpu_env, fp0, fp1);
             tcg_temp_free_i32(fp1);
-            gen_store_fpr32(fp0, fd);
+            gen_store_fpr32(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
         }
         opn = "mul.s";
@@ -6588,11 +6582,11 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             TCGv_i32 fp0 = tcg_temp_new_i32();
             TCGv_i32 fp1 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
-            gen_load_fpr32(fp1, ft);
+            gen_load_fpr32(ctx, fp0, fs);
+            gen_load_fpr32(ctx, fp1, ft);
             gen_helper_float_div_s(fp0, cpu_env, fp0, fp1);
             tcg_temp_free_i32(fp1);
-            gen_store_fpr32(fp0, fd);
+            gen_store_fpr32(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
         }
         opn = "div.s";
@@ -6602,9 +6596,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
         {
             TCGv_i32 fp0 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
+            gen_load_fpr32(ctx, fp0, fs);
             gen_helper_float_sqrt_s(fp0, cpu_env, fp0);
-            gen_store_fpr32(fp0, fd);
+            gen_store_fpr32(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
         }
         opn = "sqrt.s";
@@ -6613,9 +6607,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
         {
             TCGv_i32 fp0 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
+            gen_load_fpr32(ctx, fp0, fs);
             gen_helper_float_abs_s(fp0, fp0);
-            gen_store_fpr32(fp0, fd);
+            gen_store_fpr32(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
         }
         opn = "abs.s";
@@ -6624,8 +6618,8 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
         {
             TCGv_i32 fp0 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
-            gen_store_fpr32(fp0, fd);
+            gen_load_fpr32(ctx, fp0, fs);
+            gen_store_fpr32(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
         }
         opn = "mov.s";
@@ -6634,9 +6628,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
         {
             TCGv_i32 fp0 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
+            gen_load_fpr32(ctx, fp0, fs);
             gen_helper_float_chs_s(fp0, fp0);
-            gen_store_fpr32(fp0, fd);
+            gen_store_fpr32(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
         }
         opn = "neg.s";
@@ -6647,7 +6641,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             TCGv_i32 fp32 = tcg_temp_new_i32();
             TCGv_i64 fp64 = tcg_temp_new_i64();
 
-            gen_load_fpr32(fp32, fs);
+            gen_load_fpr32(ctx, fp32, fs);
             gen_helper_float_roundl_s(fp64, cpu_env, fp32);
             tcg_temp_free_i32(fp32);
             gen_store_fpr64(ctx, fp64, fd);
@@ -6661,7 +6655,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             TCGv_i32 fp32 = tcg_temp_new_i32();
             TCGv_i64 fp64 = tcg_temp_new_i64();
 
-            gen_load_fpr32(fp32, fs);
+            gen_load_fpr32(ctx, fp32, fs);
             gen_helper_float_truncl_s(fp64, cpu_env, fp32);
             tcg_temp_free_i32(fp32);
             gen_store_fpr64(ctx, fp64, fd);
@@ -6675,7 +6669,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             TCGv_i32 fp32 = tcg_temp_new_i32();
             TCGv_i64 fp64 = tcg_temp_new_i64();
 
-            gen_load_fpr32(fp32, fs);
+            gen_load_fpr32(ctx, fp32, fs);
             gen_helper_float_ceill_s(fp64, cpu_env, fp32);
             tcg_temp_free_i32(fp32);
             gen_store_fpr64(ctx, fp64, fd);
@@ -6689,7 +6683,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             TCGv_i32 fp32 = tcg_temp_new_i32();
             TCGv_i64 fp64 = tcg_temp_new_i64();
 
-            gen_load_fpr32(fp32, fs);
+            gen_load_fpr32(ctx, fp32, fs);
             gen_helper_float_floorl_s(fp64, cpu_env, fp32);
             tcg_temp_free_i32(fp32);
             gen_store_fpr64(ctx, fp64, fd);
@@ -6701,9 +6695,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
         {
             TCGv_i32 fp0 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
+            gen_load_fpr32(ctx, fp0, fs);
             gen_helper_float_roundw_s(fp0, cpu_env, fp0);
-            gen_store_fpr32(fp0, fd);
+            gen_store_fpr32(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
         }
         opn = "round.w.s";
@@ -6712,9 +6706,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
         {
             TCGv_i32 fp0 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
+            gen_load_fpr32(ctx, fp0, fs);
             gen_helper_float_truncw_s(fp0, cpu_env, fp0);
-            gen_store_fpr32(fp0, fd);
+            gen_store_fpr32(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
         }
         opn = "trunc.w.s";
@@ -6723,9 +6717,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
         {
             TCGv_i32 fp0 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
+            gen_load_fpr32(ctx, fp0, fs);
             gen_helper_float_ceilw_s(fp0, cpu_env, fp0);
-            gen_store_fpr32(fp0, fd);
+            gen_store_fpr32(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
         }
         opn = "ceil.w.s";
@@ -6734,15 +6728,15 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
         {
             TCGv_i32 fp0 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
+            gen_load_fpr32(ctx, fp0, fs);
             gen_helper_float_floorw_s(fp0, cpu_env, fp0);
-            gen_store_fpr32(fp0, fd);
+            gen_store_fpr32(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
         }
         opn = "floor.w.s";
         break;
     case OPC_MOVCF_S:
-        gen_movcf_s(fs, fd, (ft >> 2) & 0x7, ft & 0x1);
+        gen_movcf_s(ctx, fs, fd, (ft >> 2) & 0x7, ft & 0x1);
         opn = "movcf.s";
         break;
     case OPC_MOVZ_S:
@@ -6754,8 +6748,8 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
                 tcg_gen_brcondi_tl(TCG_COND_NE, cpu_gpr[ft], 0, l1);
             }
             fp0 = tcg_temp_new_i32();
-            gen_load_fpr32(fp0, fs);
-            gen_store_fpr32(fp0, fd);
+            gen_load_fpr32(ctx, fp0, fs);
+            gen_store_fpr32(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
             gen_set_label(l1);
         }
@@ -6769,8 +6763,8 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             if (ft != 0) {
                 tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_gpr[ft], 0, l1);
                 fp0 = tcg_temp_new_i32();
-                gen_load_fpr32(fp0, fs);
-                gen_store_fpr32(fp0, fd);
+                gen_load_fpr32(ctx, fp0, fs);
+                gen_store_fpr32(ctx, fp0, fd);
                 tcg_temp_free_i32(fp0);
                 gen_set_label(l1);
             }
@@ -6782,9 +6776,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
         {
             TCGv_i32 fp0 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
+            gen_load_fpr32(ctx, fp0, fs);
             gen_helper_float_recip_s(fp0, cpu_env, fp0);
-            gen_store_fpr32(fp0, fd);
+            gen_store_fpr32(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
         }
         opn = "recip.s";
@@ -6794,9 +6788,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
         {
             TCGv_i32 fp0 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
+            gen_load_fpr32(ctx, fp0, fs);
             gen_helper_float_rsqrt_s(fp0, cpu_env, fp0);
-            gen_store_fpr32(fp0, fd);
+            gen_store_fpr32(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
         }
         opn = "rsqrt.s";
@@ -6807,11 +6801,11 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             TCGv_i32 fp0 = tcg_temp_new_i32();
             TCGv_i32 fp1 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
-            gen_load_fpr32(fp1, ft);
+            gen_load_fpr32(ctx, fp0, fs);
+            gen_load_fpr32(ctx, fp1, ft);
             gen_helper_float_recip2_s(fp0, cpu_env, fp0, fp1);
             tcg_temp_free_i32(fp1);
-            gen_store_fpr32(fp0, fd);
+            gen_store_fpr32(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
         }
         opn = "recip2.s";
@@ -6821,9 +6815,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
         {
             TCGv_i32 fp0 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
+            gen_load_fpr32(ctx, fp0, fs);
             gen_helper_float_recip1_s(fp0, cpu_env, fp0);
-            gen_store_fpr32(fp0, fd);
+            gen_store_fpr32(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
         }
         opn = "recip1.s";
@@ -6833,9 +6827,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
         {
             TCGv_i32 fp0 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
+            gen_load_fpr32(ctx, fp0, fs);
             gen_helper_float_rsqrt1_s(fp0, cpu_env, fp0);
-            gen_store_fpr32(fp0, fd);
+            gen_store_fpr32(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
         }
         opn = "rsqrt1.s";
@@ -6846,11 +6840,11 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             TCGv_i32 fp0 = tcg_temp_new_i32();
             TCGv_i32 fp1 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
-            gen_load_fpr32(fp1, ft);
+            gen_load_fpr32(ctx, fp0, fs);
+            gen_load_fpr32(ctx, fp1, ft);
             gen_helper_float_rsqrt2_s(fp0, cpu_env, fp0, fp1);
             tcg_temp_free_i32(fp1);
-            gen_store_fpr32(fp0, fd);
+            gen_store_fpr32(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
         }
         opn = "rsqrt2.s";
@@ -6861,7 +6855,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             TCGv_i32 fp32 = tcg_temp_new_i32();
             TCGv_i64 fp64 = tcg_temp_new_i64();
 
-            gen_load_fpr32(fp32, fs);
+            gen_load_fpr32(ctx, fp32, fs);
             gen_helper_float_cvtd_s(fp64, cpu_env, fp32);
             tcg_temp_free_i32(fp32);
             gen_store_fpr64(ctx, fp64, fd);
@@ -6873,9 +6867,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
         {
             TCGv_i32 fp0 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
+            gen_load_fpr32(ctx, fp0, fs);
             gen_helper_float_cvtw_s(fp0, cpu_env, fp0);
-            gen_store_fpr32(fp0, fd);
+            gen_store_fpr32(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
         }
         opn = "cvt.w.s";
@@ -6886,7 +6880,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             TCGv_i32 fp32 = tcg_temp_new_i32();
             TCGv_i64 fp64 = tcg_temp_new_i64();
 
-            gen_load_fpr32(fp32, fs);
+            gen_load_fpr32(ctx, fp32, fs);
             gen_helper_float_cvtl_s(fp64, cpu_env, fp32);
             tcg_temp_free_i32(fp32);
             gen_store_fpr64(ctx, fp64, fd);
@@ -6901,8 +6895,8 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             TCGv_i32 fp32_0 = tcg_temp_new_i32();
             TCGv_i32 fp32_1 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp32_0, fs);
-            gen_load_fpr32(fp32_1, ft);
+            gen_load_fpr32(ctx, fp32_0, fs);
+            gen_load_fpr32(ctx, fp32_1, ft);
             tcg_gen_concat_i32_i64(fp64, fp32_1, fp32_0);
             tcg_temp_free_i32(fp32_1);
             tcg_temp_free_i32(fp32_0);
@@ -7103,7 +7097,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             gen_load_fpr64(ctx, fp64, fs);
             gen_helper_float_roundw_d(fp32, cpu_env, fp64);
             tcg_temp_free_i64(fp64);
-            gen_store_fpr32(fp32, fd);
+            gen_store_fpr32(ctx, fp32, fd);
             tcg_temp_free_i32(fp32);
         }
         opn = "round.w.d";
@@ -7117,7 +7111,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             gen_load_fpr64(ctx, fp64, fs);
             gen_helper_float_truncw_d(fp32, cpu_env, fp64);
             tcg_temp_free_i64(fp64);
-            gen_store_fpr32(fp32, fd);
+            gen_store_fpr32(ctx, fp32, fd);
             tcg_temp_free_i32(fp32);
         }
         opn = "trunc.w.d";
@@ -7131,7 +7125,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             gen_load_fpr64(ctx, fp64, fs);
             gen_helper_float_ceilw_d(fp32, cpu_env, fp64);
             tcg_temp_free_i64(fp64);
-            gen_store_fpr32(fp32, fd);
+            gen_store_fpr32(ctx, fp32, fd);
             tcg_temp_free_i32(fp32);
         }
         opn = "ceil.w.d";
@@ -7145,7 +7139,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             gen_load_fpr64(ctx, fp64, fs);
             gen_helper_float_floorw_d(fp32, cpu_env, fp64);
             tcg_temp_free_i64(fp64);
-            gen_store_fpr32(fp32, fd);
+            gen_store_fpr32(ctx, fp32, fd);
             tcg_temp_free_i32(fp32);
         }
         opn = "floor.w.d";
@@ -7297,7 +7291,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             gen_load_fpr64(ctx, fp64, fs);
             gen_helper_float_cvts_d(fp32, cpu_env, fp64);
             tcg_temp_free_i64(fp64);
-            gen_store_fpr32(fp32, fd);
+            gen_store_fpr32(ctx, fp32, fd);
             tcg_temp_free_i32(fp32);
         }
         opn = "cvt.s.d";
@@ -7311,7 +7305,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             gen_load_fpr64(ctx, fp64, fs);
             gen_helper_float_cvtw_d(fp32, cpu_env, fp64);
             tcg_temp_free_i64(fp64);
-            gen_store_fpr32(fp32, fd);
+            gen_store_fpr32(ctx, fp32, fd);
             tcg_temp_free_i32(fp32);
         }
         opn = "cvt.w.d";
@@ -7332,9 +7326,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
         {
             TCGv_i32 fp0 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
+            gen_load_fpr32(ctx, fp0, fs);
             gen_helper_float_cvts_w(fp0, cpu_env, fp0);
-            gen_store_fpr32(fp0, fd);
+            gen_store_fpr32(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
         }
         opn = "cvt.s.w";
@@ -7345,7 +7339,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             TCGv_i32 fp32 = tcg_temp_new_i32();
             TCGv_i64 fp64 = tcg_temp_new_i64();
 
-            gen_load_fpr32(fp32, fs);
+            gen_load_fpr32(ctx, fp32, fs);
             gen_helper_float_cvtd_w(fp64, cpu_env, fp32);
             tcg_temp_free_i32(fp32);
             gen_store_fpr64(ctx, fp64, fd);
@@ -7362,7 +7356,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             gen_load_fpr64(ctx, fp64, fs);
             gen_helper_float_cvts_l(fp32, cpu_env, fp64);
             tcg_temp_free_i64(fp64);
-            gen_store_fpr32(fp32, fd);
+            gen_store_fpr32(ctx, fp32, fd);
             tcg_temp_free_i32(fp32);
         }
         opn = "cvt.s.l";
@@ -7473,7 +7467,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
         break;
     case OPC_MOVCF_PS:
         check_cp1_64bitmode(ctx);
-        gen_movcf_ps(fs, fd, (ft >> 2) & 0x7, ft & 0x1);
+        gen_movcf_ps(ctx, fs, fd, (ft >> 2) & 0x7, ft & 0x1);
         opn = "movcf.ps";
         break;
     case OPC_MOVZ_PS:
@@ -7598,9 +7592,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
         {
             TCGv_i32 fp0 = tcg_temp_new_i32();
 
-            gen_load_fpr32h(fp0, fs);
+            gen_load_fpr32h(ctx, fp0, fs);
             gen_helper_float_cvts_pu(fp0, cpu_env, fp0);
-            gen_store_fpr32(fp0, fd);
+            gen_store_fpr32(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
         }
         opn = "cvt.s.pu";
@@ -7622,9 +7616,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
         {
             TCGv_i32 fp0 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
+            gen_load_fpr32(ctx, fp0, fs);
             gen_helper_float_cvts_pl(fp0, cpu_env, fp0);
-            gen_store_fpr32(fp0, fd);
+            gen_store_fpr32(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
         }
         opn = "cvt.s.pl";
@@ -7635,10 +7629,10 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             TCGv_i32 fp0 = tcg_temp_new_i32();
             TCGv_i32 fp1 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
-            gen_load_fpr32(fp1, ft);
-            gen_store_fpr32h(fp0, fd);
-            gen_store_fpr32(fp1, fd);
+            gen_load_fpr32(ctx, fp0, fs);
+            gen_load_fpr32(ctx, fp1, ft);
+            gen_store_fpr32h(ctx, fp0, fd);
+            gen_store_fpr32(ctx, fp1, fd);
             tcg_temp_free_i32(fp0);
             tcg_temp_free_i32(fp1);
         }
@@ -7650,10 +7644,10 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             TCGv_i32 fp0 = tcg_temp_new_i32();
             TCGv_i32 fp1 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
-            gen_load_fpr32h(fp1, ft);
-            gen_store_fpr32(fp1, fd);
-            gen_store_fpr32h(fp0, fd);
+            gen_load_fpr32(ctx, fp0, fs);
+            gen_load_fpr32h(ctx, fp1, ft);
+            gen_store_fpr32(ctx, fp1, fd);
+            gen_store_fpr32h(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
             tcg_temp_free_i32(fp1);
         }
@@ -7665,10 +7659,10 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             TCGv_i32 fp0 = tcg_temp_new_i32();
             TCGv_i32 fp1 = tcg_temp_new_i32();
 
-            gen_load_fpr32h(fp0, fs);
-            gen_load_fpr32(fp1, ft);
-            gen_store_fpr32(fp1, fd);
-            gen_store_fpr32h(fp0, fd);
+            gen_load_fpr32h(ctx, fp0, fs);
+            gen_load_fpr32(ctx, fp1, ft);
+            gen_store_fpr32(ctx, fp1, fd);
+            gen_store_fpr32h(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
             tcg_temp_free_i32(fp1);
         }
@@ -7680,10 +7674,10 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
             TCGv_i32 fp0 = tcg_temp_new_i32();
             TCGv_i32 fp1 = tcg_temp_new_i32();
 
-            gen_load_fpr32h(fp0, fs);
-            gen_load_fpr32h(fp1, ft);
-            gen_store_fpr32(fp1, fd);
-            gen_store_fpr32h(fp0, fd);
+            gen_load_fpr32h(ctx, fp0, fs);
+            gen_load_fpr32h(ctx, fp1, ft);
+            gen_store_fpr32(ctx, fp1, fd);
+            gen_store_fpr32h(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
             tcg_temp_free_i32(fp1);
         }
@@ -7757,7 +7751,7 @@ static void gen_flt3_ldst (DisasContext *ctx, uint32_t opc,
 
             tcg_gen_qemu_ld32s(t0, t0, ctx->mem_idx);
             tcg_gen_trunc_tl_i32(fp0, t0);
-            gen_store_fpr32(fp0, fd);
+            gen_store_fpr32(ctx, fp0, fd);
             tcg_temp_free_i32(fp0);
         }
         opn = "lwxc1";
@@ -7792,7 +7786,7 @@ static void gen_flt3_ldst (DisasContext *ctx, uint32_t opc,
             TCGv_i32 fp0 = tcg_temp_new_i32();
             TCGv t1 = tcg_temp_new();
 
-            gen_load_fpr32(fp0, fs);
+            gen_load_fpr32(ctx, fp0, fs);
             tcg_gen_extu_i32_tl(t1, fp0);
             tcg_gen_qemu_st32(t1, t0, ctx->mem_idx);
             tcg_temp_free_i32(fp0);
@@ -7852,24 +7846,24 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
             tcg_gen_andi_tl(t0, t0, 0x7);
 
             tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0, l1);
-            gen_load_fpr32(fp, fs);
-            gen_load_fpr32h(fph, fs);
-            gen_store_fpr32(fp, fd);
-            gen_store_fpr32h(fph, fd);
+            gen_load_fpr32(ctx, fp, fs);
+            gen_load_fpr32h(ctx, fph, fs);
+            gen_store_fpr32(ctx, fp, fd);
+            gen_store_fpr32h(ctx, fph, fd);
             tcg_gen_br(l2);
             gen_set_label(l1);
             tcg_gen_brcondi_tl(TCG_COND_NE, t0, 4, l2);
             tcg_temp_free(t0);
 #ifdef TARGET_WORDS_BIGENDIAN
-            gen_load_fpr32(fp, fs);
-            gen_load_fpr32h(fph, ft);
-            gen_store_fpr32h(fp, fd);
-            gen_store_fpr32(fph, fd);
+            gen_load_fpr32(ctx, fp, fs);
+            gen_load_fpr32h(ctx, fph, ft);
+            gen_store_fpr32h(ctx, fp, fd);
+            gen_store_fpr32(ctx, fph, fd);
 #else
-            gen_load_fpr32h(fph, fs);
-            gen_load_fpr32(fp, ft);
-            gen_store_fpr32(fph, fd);
-            gen_store_fpr32h(fp, fd);
+            gen_load_fpr32h(ctx, fph, fs);
+            gen_load_fpr32(ctx, fp, ft);
+            gen_store_fpr32(ctx, fph, fd);
+            gen_store_fpr32h(ctx, fp, fd);
 #endif
             gen_set_label(l2);
             tcg_temp_free_i32(fp);
@@ -7884,13 +7878,13 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
             TCGv_i32 fp1 = tcg_temp_new_i32();
             TCGv_i32 fp2 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
-            gen_load_fpr32(fp1, ft);
-            gen_load_fpr32(fp2, fr);
+            gen_load_fpr32(ctx, fp0, fs);
+            gen_load_fpr32(ctx, fp1, ft);
+            gen_load_fpr32(ctx, fp2, fr);
             gen_helper_float_muladd_s(fp2, cpu_env, fp0, fp1, fp2);
             tcg_temp_free_i32(fp0);
             tcg_temp_free_i32(fp1);
-            gen_store_fpr32(fp2, fd);
+            gen_store_fpr32(ctx, fp2, fd);
             tcg_temp_free_i32(fp2);
         }
         opn = "madd.s";
@@ -7939,13 +7933,13 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
             TCGv_i32 fp1 = tcg_temp_new_i32();
             TCGv_i32 fp2 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
-            gen_load_fpr32(fp1, ft);
-            gen_load_fpr32(fp2, fr);
+            gen_load_fpr32(ctx, fp0, fs);
+            gen_load_fpr32(ctx, fp1, ft);
+            gen_load_fpr32(ctx, fp2, fr);
             gen_helper_float_mulsub_s(fp2, cpu_env, fp0, fp1, fp2);
             tcg_temp_free_i32(fp0);
             tcg_temp_free_i32(fp1);
-            gen_store_fpr32(fp2, fd);
+            gen_store_fpr32(ctx, fp2, fd);
             tcg_temp_free_i32(fp2);
         }
         opn = "msub.s";
@@ -7994,13 +7988,13 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
             TCGv_i32 fp1 = tcg_temp_new_i32();
             TCGv_i32 fp2 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
-            gen_load_fpr32(fp1, ft);
-            gen_load_fpr32(fp2, fr);
+            gen_load_fpr32(ctx, fp0, fs);
+            gen_load_fpr32(ctx, fp1, ft);
+            gen_load_fpr32(ctx, fp2, fr);
             gen_helper_float_nmuladd_s(fp2, cpu_env, fp0, fp1, fp2);
             tcg_temp_free_i32(fp0);
             tcg_temp_free_i32(fp1);
-            gen_store_fpr32(fp2, fd);
+            gen_store_fpr32(ctx, fp2, fd);
             tcg_temp_free_i32(fp2);
         }
         opn = "nmadd.s";
@@ -8049,13 +8043,13 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
             TCGv_i32 fp1 = tcg_temp_new_i32();
             TCGv_i32 fp2 = tcg_temp_new_i32();
 
-            gen_load_fpr32(fp0, fs);
-            gen_load_fpr32(fp1, ft);
-            gen_load_fpr32(fp2, fr);
+            gen_load_fpr32(ctx, fp0, fs);
+            gen_load_fpr32(ctx, fp1, ft);
+            gen_load_fpr32(ctx, fp2, fr);
             gen_helper_float_nmulsub_s(fp2, cpu_env, fp0, fp1, fp2);
             tcg_temp_free_i32(fp0);
             tcg_temp_free_i32(fp1);
-            gen_store_fpr32(fp2, fd);
+            gen_store_fpr32(ctx, fp2, fd);
             tcg_temp_free_i32(fp2);
         }
         opn = "nmsub.s";
@@ -10996,13 +10990,13 @@ static void decode_micromips32_opc (CPUMIPSState *env, DisasContext *ctx,
                 case MOVF_FMT:
                     switch (fmt) {
                     case FMT_SDPS_S:
-                        gen_movcf_s(rs, rt, cc, 0);
+                        gen_movcf_s(ctx, rs, rt, cc, 0);
                         break;
                     case FMT_SDPS_D:
                         gen_movcf_d(ctx, rs, rt, cc, 0);
                         break;
                     case FMT_SDPS_PS:
-                        gen_movcf_ps(rs, rt, cc, 0);
+                        gen_movcf_ps(ctx, rs, rt, cc, 0);
                         break;
                     default:
                         goto pool32f_invalid;
@@ -11011,13 +11005,13 @@ static void decode_micromips32_opc (CPUMIPSState *env, DisasContext *ctx,
                 case MOVT_FMT:
                     switch (fmt) {
                     case FMT_SDPS_S:
-                        gen_movcf_s(rs, rt, cc, 1);
+                        gen_movcf_s(ctx, rs, rt, cc, 1);
                         break;
                     case FMT_SDPS_D:
                         gen_movcf_d(ctx, rs, rt, cc, 1);
                         break;
                     case FMT_SDPS_PS:
-                        gen_movcf_ps(rs, rt, cc, 1);
+                        gen_movcf_ps(ctx, rs, rt, cc, 1);
                         break;
                     default:
                         goto pool32f_invalid;
-- 
1.7.11.4

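The pattern this patch applies can be illustrated outside QEMU. The
following is a minimal sketch, not QEMU code: Ctx, Fpr, load_fpr32,
load_fpr64 and DEFINE_SUM are hypothetical stand-ins. It shows why giving
the 32-bit accessor the same (ctx, out, reg) shape as the 64-bit one lets a
token-pasting macro call load_fpr##bits directly, without wrapper macros
such as gen_ldcmp_fpr32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a translation context (not QEMU's DisasContext). */
typedef struct {
    int f64_mode;               /* nonzero: registers are full 64-bit */
} Ctx;

/* Toy register file: each register has a low and a high 32-bit half. */
typedef struct {
    uint32_t lo, hi;
} Fpr;

static Fpr fpr[32];

/* The 32-bit accessor takes ctx only so that its signature matches the
   64-bit one; it does not use it in this sketch. */
static void load_fpr32(Ctx *ctx, uint32_t *out, int reg)
{
    (void)ctx;
    assert(reg >= 0 && reg < 32);
    *out = fpr[reg].lo;
}

/* The 64-bit accessor needs ctx: outside 64-bit mode a value is built
   from an even/odd register pair. */
static void load_fpr64(Ctx *ctx, uint64_t *out, int reg)
{
    assert(reg >= 0 && reg < 32);
    if (ctx->f64_mode) {
        *out = ((uint64_t)fpr[reg].hi << 32) | fpr[reg].lo;
    } else {
        uint32_t lo, hi;
        load_fpr32(ctx, &lo, reg & ~1);
        load_fpr32(ctx, &hi, reg | 1);
        *out = ((uint64_t)hi << 32) | lo;
    }
}

/* With uniform signatures, one macro body serves both widths by pasting
   the width into the accessor name. */
#define DEFINE_SUM(bits)                                        \
static uint64_t sum_fpr##bits(Ctx *ctx, int fs, int ft)         \
{                                                               \
    uint##bits##_t a, b;                                        \
    load_fpr##bits(ctx, &a, fs);                                \
    load_fpr##bits(ctx, &b, ft);                                \
    return (uint64_t)a + b;                                     \
}

DEFINE_SUM(32)
DEFINE_SUM(64)
```

If load_fpr32 lacked the ctx parameter, the macro would need a per-width
wrapper to hide the difference, which is what the removed gen_ldcmp_fpr32
and gen_ldcmp_fpr64 defines did.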

* Re: [Qemu-devel] [PATCH 4/7] target-mips: Pass DisasContext to fpr32 load/store routines
  2012-09-17 21:35 ` [Qemu-devel] [PATCH 4/7] target-mips: Pass DisasContext to fpr32 load/store routines Richard Henderson
@ 2012-09-18 16:39   ` Aurelien Jarno
From: Aurelien Jarno @ 2012-09-18 16:39 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Mon, Sep 17, 2012 at 02:35:10PM -0700, Richard Henderson wrote:
> The large mechanical change in support of a follow-on patch
> that changes the representation of the fp registers.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target-mips/translate.c | 312 ++++++++++++++++++++++++------------------------
>  1 file changed, 153 insertions(+), 159 deletions(-)
> 
> diff --git a/target-mips/translate.c b/target-mips/translate.c
> index 775c3a1..b4301e9 100644
> --- a/target-mips/translate.c
> +++ b/target-mips/translate.c
> @@ -662,42 +662,42 @@ static inline void gen_store_srsgpr (int from, int to)
>  }
>  
>  /* Floating point register moves. */
> -static inline void gen_load_fpr32 (TCGv_i32 t, int reg)
> +static inline void gen_load_fpr32(DisasContext *ctx, TCGv_i32 t, int reg)
>  {
>      tcg_gen_ld_i32(t, cpu_env, offsetof(CPUMIPSState, active_fpu.fpr[reg].w[FP_ENDIAN_IDX]));
>  }
>  
> -static inline void gen_store_fpr32 (TCGv_i32 t, int reg)
> +static inline void gen_store_fpr32(DisasContext *ctx, TCGv_i32 t, int reg)
>  {
>      tcg_gen_st_i32(t, cpu_env, offsetof(CPUMIPSState, active_fpu.fpr[reg].w[FP_ENDIAN_IDX]));
>  }
>  
> -static inline void gen_load_fpr32h (TCGv_i32 t, int reg)
> +static inline void gen_load_fpr32h(DisasContext *ctx, TCGv_i32 t, int reg)
>  {
>      tcg_gen_ld_i32(t, cpu_env, offsetof(CPUMIPSState, active_fpu.fpr[reg].w[!FP_ENDIAN_IDX]));
>  }
>  
> -static inline void gen_store_fpr32h (TCGv_i32 t, int reg)
> +static inline void gen_store_fpr32h(DisasContext *ctx, TCGv_i32 t, int reg)
>  {
>      tcg_gen_st_i32(t, cpu_env, offsetof(CPUMIPSState, active_fpu.fpr[reg].w[!FP_ENDIAN_IDX]));
>  }
>  
> -static inline void gen_load_fpr64 (DisasContext *ctx, TCGv_i64 t, int reg)
> +static inline void gen_load_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
>  {
>      if (ctx->hflags & MIPS_HFLAG_F64) {
>          tcg_gen_ld_i64(t, cpu_env, offsetof(CPUMIPSState, active_fpu.fpr[reg].d));
>      } else {
>          TCGv_i32 t0 = tcg_temp_new_i32();
>          TCGv_i32 t1 = tcg_temp_new_i32();
> -        gen_load_fpr32(t0, reg & ~1);
> -        gen_load_fpr32(t1, reg | 1);
> +        gen_load_fpr32(ctx, t0, reg & ~1);
> +        gen_load_fpr32(ctx, t1, reg | 1);
>          tcg_gen_concat_i32_i64(t, t0, t1);
>          tcg_temp_free_i32(t0);
>          tcg_temp_free_i32(t1);
>      }
>  }
>  
> -static inline void gen_store_fpr64 (DisasContext *ctx, TCGv_i64 t, int reg)
> +static inline void gen_store_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
>  {
>      if (ctx->hflags & MIPS_HFLAG_F64) {
>          tcg_gen_st_i64(t, cpu_env, offsetof(CPUMIPSState, active_fpu.fpr[reg].d));
> @@ -705,10 +705,10 @@ static inline void gen_store_fpr64 (DisasContext *ctx, TCGv_i64 t, int reg)
>          TCGv_i64 t0 = tcg_temp_new_i64();
>          TCGv_i32 t1 = tcg_temp_new_i32();
>          tcg_gen_trunc_i64_i32(t1, t);
> -        gen_store_fpr32(t1, reg & ~1);
> +        gen_store_fpr32(ctx, t1, reg & ~1);
>          tcg_gen_shri_i64(t0, t, 32);
>          tcg_gen_trunc_i64_i32(t1, t0);
> -        gen_store_fpr32(t1, reg | 1);
> +        gen_store_fpr32(ctx, t1, reg | 1);
>          tcg_temp_free_i32(t1);
>          tcg_temp_free_i64(t0);
>      }
> @@ -862,12 +862,6 @@ static inline void check_mips_64(DisasContext *ctx)
>          generate_exception(ctx, EXCP_RI);
>  }
>  
> -/* Define small wrappers for gen_load_fpr* so that we have a uniform
> -   calling interface for 32 and 64-bit FPRs.  No sense in changing
> -   all callers for gen_load_fpr32 when we need the CTX parameter for
> -   this one use.  */
> -#define gen_ldcmp_fpr32(ctx, x, y) gen_load_fpr32(x, y)
> -#define gen_ldcmp_fpr64(ctx, x, y) gen_load_fpr64(ctx, x, y)
>  #define FOP_CONDS(type, abs, fmt, ifmt, bits)                                 \
>  static inline void gen_cmp ## type ## _ ## fmt(DisasContext *ctx, int n,      \
>                                                 int ft, int fs, int cc)        \
> @@ -890,8 +884,8 @@ static inline void gen_cmp ## type ## _ ## fmt(DisasContext *ctx, int n,      \
>          }                                                                     \
>          break;                                                                \
>      }                                                                         \
> -    gen_ldcmp_fpr##bits (ctx, fp0, fs);                                       \
> -    gen_ldcmp_fpr##bits (ctx, fp1, ft);                                       \
> +    gen_load_fpr##bits(ctx, fp0, fs);                                         \
> +    gen_load_fpr##bits(ctx, fp1, ft);                                         \
>      switch (n) {                                                              \
>      case  0: gen_helper_0e2i(cmp ## type ## _ ## fmt ## _f, fp0, fp1, cc);    break;\
>      case  1: gen_helper_0e2i(cmp ## type ## _ ## fmt ## _un, fp0, fp1, cc);   break;\
> @@ -1283,7 +1277,7 @@ static void gen_flt_ldst (DisasContext *ctx, uint32_t opc, int ft,
>  
>              tcg_gen_qemu_ld32s(t0, t0, ctx->mem_idx);
>              tcg_gen_trunc_tl_i32(fp0, t0);
> -            gen_store_fpr32(fp0, ft);
> +            gen_store_fpr32(ctx, fp0, ft);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "lwc1";
> @@ -1293,7 +1287,7 @@ static void gen_flt_ldst (DisasContext *ctx, uint32_t opc, int ft,
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>              TCGv t1 = tcg_temp_new();
>  
> -            gen_load_fpr32(fp0, ft);
> +            gen_load_fpr32(ctx, fp0, ft);
>              tcg_gen_extu_i32_tl(t1, fp0);
>              tcg_gen_qemu_st32(t1, t0, ctx->mem_idx);
>              tcg_temp_free(t1);
> @@ -5704,13 +5698,13 @@ static void gen_mftr(CPUMIPSState *env, DisasContext *ctx, int rt, int rd,
>          if (h == 0) {
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, rt);
> +            gen_load_fpr32(ctx, fp0, rt);
>              tcg_gen_ext_i32_tl(t0, fp0);
>              tcg_temp_free_i32(fp0);
>          } else {
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32h(fp0, rt);
> +            gen_load_fpr32h(ctx, fp0, rt);
>              tcg_gen_ext_i32_tl(t0, fp0);
>              tcg_temp_free_i32(fp0);
>          }
> @@ -5903,13 +5897,13 @@ static void gen_mttr(CPUMIPSState *env, DisasContext *ctx, int rd, int rt,
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>  
>              tcg_gen_trunc_tl_i32(fp0, t0);
> -            gen_store_fpr32(fp0, rd);
> +            gen_store_fpr32(ctx, fp0, rd);
>              tcg_temp_free_i32(fp0);
>          } else {
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>  
>              tcg_gen_trunc_tl_i32(fp0, t0);
> -            gen_store_fpr32h(fp0, rd);
> +            gen_store_fpr32h(ctx, fp0, rd);
>              tcg_temp_free_i32(fp0);
>          }
>          break;
> @@ -6324,7 +6318,7 @@ static void gen_cp1 (DisasContext *ctx, uint32_t opc, int rt, int fs)
>          {
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> +            gen_load_fpr32(ctx, fp0, fs);
>              tcg_gen_ext_i32_tl(t0, fp0);
>              tcg_temp_free_i32(fp0);
>          }
> @@ -6337,7 +6331,7 @@ static void gen_cp1 (DisasContext *ctx, uint32_t opc, int rt, int fs)
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>  
>              tcg_gen_trunc_tl_i32(fp0, t0);
> -            gen_store_fpr32(fp0, fs);
> +            gen_store_fpr32(ctx, fp0, fs);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "mtc1";
> @@ -6368,7 +6362,7 @@ static void gen_cp1 (DisasContext *ctx, uint32_t opc, int rt, int fs)
>          {
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32h(fp0, fs);
> +            gen_load_fpr32h(ctx, fp0, fs);
>              tcg_gen_ext_i32_tl(t0, fp0);
>              tcg_temp_free_i32(fp0);
>          }
> @@ -6381,7 +6375,7 @@ static void gen_cp1 (DisasContext *ctx, uint32_t opc, int rt, int fs)
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>  
>              tcg_gen_trunc_tl_i32(fp0, t0);
> -            gen_store_fpr32h(fp0, fs);
> +            gen_store_fpr32h(ctx, fp0, fs);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "mthc1";
> @@ -6426,7 +6420,7 @@ static void gen_movci (DisasContext *ctx, int rd, int rs, int cc, int tf)
>      gen_set_label(l1);
>  }
>  
> -static inline void gen_movcf_s (int fs, int fd, int cc, int tf)
> +static void gen_movcf_s(DisasContext *ctx, int fs, int fd, int cc, int tf)
>  {
>      int cond;
>      TCGv_i32 t0 = tcg_temp_new_i32();
> @@ -6439,13 +6433,13 @@ static inline void gen_movcf_s (int fs, int fd, int cc, int tf)
>  
>      tcg_gen_andi_i32(t0, fpu_fcr31, 1 << get_fp_bit(cc));
>      tcg_gen_brcondi_i32(cond, t0, 0, l1);
> -    gen_load_fpr32(t0, fs);
> -    gen_store_fpr32(t0, fd);
> +    gen_load_fpr32(ctx, t0, fs);
> +    gen_store_fpr32(ctx, t0, fd);
>      gen_set_label(l1);
>      tcg_temp_free_i32(t0);
>  }
>  
> -static inline void gen_movcf_d (DisasContext *ctx, int fs, int fd, int cc, int tf)
> +static void gen_movcf_d(DisasContext *ctx, int fs, int fd, int cc, int tf)
>  {
>      int cond;
>      TCGv_i32 t0 = tcg_temp_new_i32();
> @@ -6467,7 +6461,7 @@ static inline void gen_movcf_d (DisasContext *ctx, int fs, int fd, int cc, int t
>      gen_set_label(l1);
>  }
>  
> -static inline void gen_movcf_ps (int fs, int fd, int cc, int tf)
> +static void gen_movcf_ps(DisasContext *ctx, int fs, int fd, int cc, int tf)
>  {
>      int cond;
>      TCGv_i32 t0 = tcg_temp_new_i32();
> @@ -6481,14 +6475,14 @@ static inline void gen_movcf_ps (int fs, int fd, int cc, int tf)
>  
>      tcg_gen_andi_i32(t0, fpu_fcr31, 1 << get_fp_bit(cc));
>      tcg_gen_brcondi_i32(cond, t0, 0, l1);
> -    gen_load_fpr32(t0, fs);
> -    gen_store_fpr32(t0, fd);
> +    gen_load_fpr32(ctx, t0, fs);
> +    gen_store_fpr32(ctx, t0, fd);
>      gen_set_label(l1);
>  
>      tcg_gen_andi_i32(t0, fpu_fcr31, 1 << get_fp_bit(cc+1));
>      tcg_gen_brcondi_i32(cond, t0, 0, l2);
> -    gen_load_fpr32h(t0, fs);
> -    gen_store_fpr32h(t0, fd);
> +    gen_load_fpr32h(ctx, t0, fs);
> +    gen_store_fpr32h(ctx, t0, fd);
>      tcg_temp_free_i32(t0);
>      gen_set_label(l2);
>  }
> @@ -6543,11 +6537,11 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>              TCGv_i32 fp1 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> -            gen_load_fpr32(fp1, ft);
> +            gen_load_fpr32(ctx, fp0, fs);
> +            gen_load_fpr32(ctx, fp1, ft);
>              gen_helper_float_add_s(fp0, cpu_env, fp0, fp1);
>              tcg_temp_free_i32(fp1);
> -            gen_store_fpr32(fp0, fd);
> +            gen_store_fpr32(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "add.s";
> @@ -6558,11 +6552,11 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>              TCGv_i32 fp1 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> -            gen_load_fpr32(fp1, ft);
> +            gen_load_fpr32(ctx, fp0, fs);
> +            gen_load_fpr32(ctx, fp1, ft);
>              gen_helper_float_sub_s(fp0, cpu_env, fp0, fp1);
>              tcg_temp_free_i32(fp1);
> -            gen_store_fpr32(fp0, fd);
> +            gen_store_fpr32(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "sub.s";
> @@ -6573,11 +6567,11 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>              TCGv_i32 fp1 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> -            gen_load_fpr32(fp1, ft);
> +            gen_load_fpr32(ctx, fp0, fs);
> +            gen_load_fpr32(ctx, fp1, ft);
>              gen_helper_float_mul_s(fp0, cpu_env, fp0, fp1);
>              tcg_temp_free_i32(fp1);
> -            gen_store_fpr32(fp0, fd);
> +            gen_store_fpr32(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "mul.s";
> @@ -6588,11 +6582,11 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>              TCGv_i32 fp1 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> -            gen_load_fpr32(fp1, ft);
> +            gen_load_fpr32(ctx, fp0, fs);
> +            gen_load_fpr32(ctx, fp1, ft);
>              gen_helper_float_div_s(fp0, cpu_env, fp0, fp1);
>              tcg_temp_free_i32(fp1);
> -            gen_store_fpr32(fp0, fd);
> +            gen_store_fpr32(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "div.s";
> @@ -6602,9 +6596,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>          {
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> +            gen_load_fpr32(ctx, fp0, fs);
>              gen_helper_float_sqrt_s(fp0, cpu_env, fp0);
> -            gen_store_fpr32(fp0, fd);
> +            gen_store_fpr32(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "sqrt.s";
> @@ -6613,9 +6607,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>          {
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> +            gen_load_fpr32(ctx, fp0, fs);
>              gen_helper_float_abs_s(fp0, fp0);
> -            gen_store_fpr32(fp0, fd);
> +            gen_store_fpr32(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "abs.s";
> @@ -6624,8 +6618,8 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>          {
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> -            gen_store_fpr32(fp0, fd);
> +            gen_load_fpr32(ctx, fp0, fs);
> +            gen_store_fpr32(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "mov.s";
> @@ -6634,9 +6628,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>          {
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> +            gen_load_fpr32(ctx, fp0, fs);
>              gen_helper_float_chs_s(fp0, fp0);
> -            gen_store_fpr32(fp0, fd);
> +            gen_store_fpr32(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "neg.s";
> @@ -6647,7 +6641,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              TCGv_i32 fp32 = tcg_temp_new_i32();
>              TCGv_i64 fp64 = tcg_temp_new_i64();
>  
> -            gen_load_fpr32(fp32, fs);
> +            gen_load_fpr32(ctx, fp32, fs);
>              gen_helper_float_roundl_s(fp64, cpu_env, fp32);
>              tcg_temp_free_i32(fp32);
>              gen_store_fpr64(ctx, fp64, fd);
> @@ -6661,7 +6655,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              TCGv_i32 fp32 = tcg_temp_new_i32();
>              TCGv_i64 fp64 = tcg_temp_new_i64();
>  
> -            gen_load_fpr32(fp32, fs);
> +            gen_load_fpr32(ctx, fp32, fs);
>              gen_helper_float_truncl_s(fp64, cpu_env, fp32);
>              tcg_temp_free_i32(fp32);
>              gen_store_fpr64(ctx, fp64, fd);
> @@ -6675,7 +6669,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              TCGv_i32 fp32 = tcg_temp_new_i32();
>              TCGv_i64 fp64 = tcg_temp_new_i64();
>  
> -            gen_load_fpr32(fp32, fs);
> +            gen_load_fpr32(ctx, fp32, fs);
>              gen_helper_float_ceill_s(fp64, cpu_env, fp32);
>              tcg_temp_free_i32(fp32);
>              gen_store_fpr64(ctx, fp64, fd);
> @@ -6689,7 +6683,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              TCGv_i32 fp32 = tcg_temp_new_i32();
>              TCGv_i64 fp64 = tcg_temp_new_i64();
>  
> -            gen_load_fpr32(fp32, fs);
> +            gen_load_fpr32(ctx, fp32, fs);
>              gen_helper_float_floorl_s(fp64, cpu_env, fp32);
>              tcg_temp_free_i32(fp32);
>              gen_store_fpr64(ctx, fp64, fd);
> @@ -6701,9 +6695,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>          {
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> +            gen_load_fpr32(ctx, fp0, fs);
>              gen_helper_float_roundw_s(fp0, cpu_env, fp0);
> -            gen_store_fpr32(fp0, fd);
> +            gen_store_fpr32(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "round.w.s";
> @@ -6712,9 +6706,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>          {
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> +            gen_load_fpr32(ctx, fp0, fs);
>              gen_helper_float_truncw_s(fp0, cpu_env, fp0);
> -            gen_store_fpr32(fp0, fd);
> +            gen_store_fpr32(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "trunc.w.s";
> @@ -6723,9 +6717,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>          {
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> +            gen_load_fpr32(ctx, fp0, fs);
>              gen_helper_float_ceilw_s(fp0, cpu_env, fp0);
> -            gen_store_fpr32(fp0, fd);
> +            gen_store_fpr32(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "ceil.w.s";
> @@ -6734,15 +6728,15 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>          {
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> +            gen_load_fpr32(ctx, fp0, fs);
>              gen_helper_float_floorw_s(fp0, cpu_env, fp0);
> -            gen_store_fpr32(fp0, fd);
> +            gen_store_fpr32(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "floor.w.s";
>          break;
>      case OPC_MOVCF_S:
> -        gen_movcf_s(fs, fd, (ft >> 2) & 0x7, ft & 0x1);
> +        gen_movcf_s(ctx, fs, fd, (ft >> 2) & 0x7, ft & 0x1);
>          opn = "movcf.s";
>          break;
>      case OPC_MOVZ_S:
> @@ -6754,8 +6748,8 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>                  tcg_gen_brcondi_tl(TCG_COND_NE, cpu_gpr[ft], 0, l1);
>              }
>              fp0 = tcg_temp_new_i32();
> -            gen_load_fpr32(fp0, fs);
> -            gen_store_fpr32(fp0, fd);
> +            gen_load_fpr32(ctx, fp0, fs);
> +            gen_store_fpr32(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>              gen_set_label(l1);
>          }
> @@ -6769,8 +6763,8 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              if (ft != 0) {
>                  tcg_gen_brcondi_tl(TCG_COND_EQ, cpu_gpr[ft], 0, l1);
>                  fp0 = tcg_temp_new_i32();
> -                gen_load_fpr32(fp0, fs);
> -                gen_store_fpr32(fp0, fd);
> +                gen_load_fpr32(ctx, fp0, fs);
> +                gen_store_fpr32(ctx, fp0, fd);
>                  tcg_temp_free_i32(fp0);
>                  gen_set_label(l1);
>              }
> @@ -6782,9 +6776,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>          {
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> +            gen_load_fpr32(ctx, fp0, fs);
>              gen_helper_float_recip_s(fp0, cpu_env, fp0);
> -            gen_store_fpr32(fp0, fd);
> +            gen_store_fpr32(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "recip.s";
> @@ -6794,9 +6788,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>          {
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> +            gen_load_fpr32(ctx, fp0, fs);
>              gen_helper_float_rsqrt_s(fp0, cpu_env, fp0);
> -            gen_store_fpr32(fp0, fd);
> +            gen_store_fpr32(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "rsqrt.s";
> @@ -6807,11 +6801,11 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>              TCGv_i32 fp1 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> -            gen_load_fpr32(fp1, ft);
> +            gen_load_fpr32(ctx, fp0, fs);
> +            gen_load_fpr32(ctx, fp1, ft);
>              gen_helper_float_recip2_s(fp0, cpu_env, fp0, fp1);
>              tcg_temp_free_i32(fp1);
> -            gen_store_fpr32(fp0, fd);
> +            gen_store_fpr32(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "recip2.s";
> @@ -6821,9 +6815,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>          {
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> +            gen_load_fpr32(ctx, fp0, fs);
>              gen_helper_float_recip1_s(fp0, cpu_env, fp0);
> -            gen_store_fpr32(fp0, fd);
> +            gen_store_fpr32(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "recip1.s";
> @@ -6833,9 +6827,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>          {
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> +            gen_load_fpr32(ctx, fp0, fs);
>              gen_helper_float_rsqrt1_s(fp0, cpu_env, fp0);
> -            gen_store_fpr32(fp0, fd);
> +            gen_store_fpr32(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "rsqrt1.s";
> @@ -6846,11 +6840,11 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>              TCGv_i32 fp1 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> -            gen_load_fpr32(fp1, ft);
> +            gen_load_fpr32(ctx, fp0, fs);
> +            gen_load_fpr32(ctx, fp1, ft);
>              gen_helper_float_rsqrt2_s(fp0, cpu_env, fp0, fp1);
>              tcg_temp_free_i32(fp1);
> -            gen_store_fpr32(fp0, fd);
> +            gen_store_fpr32(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "rsqrt2.s";
> @@ -6861,7 +6855,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              TCGv_i32 fp32 = tcg_temp_new_i32();
>              TCGv_i64 fp64 = tcg_temp_new_i64();
>  
> -            gen_load_fpr32(fp32, fs);
> +            gen_load_fpr32(ctx, fp32, fs);
>              gen_helper_float_cvtd_s(fp64, cpu_env, fp32);
>              tcg_temp_free_i32(fp32);
>              gen_store_fpr64(ctx, fp64, fd);
> @@ -6873,9 +6867,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>          {
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> +            gen_load_fpr32(ctx, fp0, fs);
>              gen_helper_float_cvtw_s(fp0, cpu_env, fp0);
> -            gen_store_fpr32(fp0, fd);
> +            gen_store_fpr32(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "cvt.w.s";
> @@ -6886,7 +6880,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              TCGv_i32 fp32 = tcg_temp_new_i32();
>              TCGv_i64 fp64 = tcg_temp_new_i64();
>  
> -            gen_load_fpr32(fp32, fs);
> +            gen_load_fpr32(ctx, fp32, fs);
>              gen_helper_float_cvtl_s(fp64, cpu_env, fp32);
>              tcg_temp_free_i32(fp32);
>              gen_store_fpr64(ctx, fp64, fd);
> @@ -6901,8 +6895,8 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              TCGv_i32 fp32_0 = tcg_temp_new_i32();
>              TCGv_i32 fp32_1 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp32_0, fs);
> -            gen_load_fpr32(fp32_1, ft);
> +            gen_load_fpr32(ctx, fp32_0, fs);
> +            gen_load_fpr32(ctx, fp32_1, ft);
>              tcg_gen_concat_i32_i64(fp64, fp32_1, fp32_0);
>              tcg_temp_free_i32(fp32_1);
>              tcg_temp_free_i32(fp32_0);
> @@ -7103,7 +7097,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              gen_load_fpr64(ctx, fp64, fs);
>              gen_helper_float_roundw_d(fp32, cpu_env, fp64);
>              tcg_temp_free_i64(fp64);
> -            gen_store_fpr32(fp32, fd);
> +            gen_store_fpr32(ctx, fp32, fd);
>              tcg_temp_free_i32(fp32);
>          }
>          opn = "round.w.d";
> @@ -7117,7 +7111,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              gen_load_fpr64(ctx, fp64, fs);
>              gen_helper_float_truncw_d(fp32, cpu_env, fp64);
>              tcg_temp_free_i64(fp64);
> -            gen_store_fpr32(fp32, fd);
> +            gen_store_fpr32(ctx, fp32, fd);
>              tcg_temp_free_i32(fp32);
>          }
>          opn = "trunc.w.d";
> @@ -7131,7 +7125,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              gen_load_fpr64(ctx, fp64, fs);
>              gen_helper_float_ceilw_d(fp32, cpu_env, fp64);
>              tcg_temp_free_i64(fp64);
> -            gen_store_fpr32(fp32, fd);
> +            gen_store_fpr32(ctx, fp32, fd);
>              tcg_temp_free_i32(fp32);
>          }
>          opn = "ceil.w.d";
> @@ -7145,7 +7139,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              gen_load_fpr64(ctx, fp64, fs);
>              gen_helper_float_floorw_d(fp32, cpu_env, fp64);
>              tcg_temp_free_i64(fp64);
> -            gen_store_fpr32(fp32, fd);
> +            gen_store_fpr32(ctx, fp32, fd);
>              tcg_temp_free_i32(fp32);
>          }
>          opn = "floor.w.d";
> @@ -7297,7 +7291,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              gen_load_fpr64(ctx, fp64, fs);
>              gen_helper_float_cvts_d(fp32, cpu_env, fp64);
>              tcg_temp_free_i64(fp64);
> -            gen_store_fpr32(fp32, fd);
> +            gen_store_fpr32(ctx, fp32, fd);
>              tcg_temp_free_i32(fp32);
>          }
>          opn = "cvt.s.d";
> @@ -7311,7 +7305,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              gen_load_fpr64(ctx, fp64, fs);
>              gen_helper_float_cvtw_d(fp32, cpu_env, fp64);
>              tcg_temp_free_i64(fp64);
> -            gen_store_fpr32(fp32, fd);
> +            gen_store_fpr32(ctx, fp32, fd);
>              tcg_temp_free_i32(fp32);
>          }
>          opn = "cvt.w.d";
> @@ -7332,9 +7326,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>          {
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> +            gen_load_fpr32(ctx, fp0, fs);
>              gen_helper_float_cvts_w(fp0, cpu_env, fp0);
> -            gen_store_fpr32(fp0, fd);
> +            gen_store_fpr32(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "cvt.s.w";
> @@ -7345,7 +7339,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              TCGv_i32 fp32 = tcg_temp_new_i32();
>              TCGv_i64 fp64 = tcg_temp_new_i64();
>  
> -            gen_load_fpr32(fp32, fs);
> +            gen_load_fpr32(ctx, fp32, fs);
>              gen_helper_float_cvtd_w(fp64, cpu_env, fp32);
>              tcg_temp_free_i32(fp32);
>              gen_store_fpr64(ctx, fp64, fd);
> @@ -7362,7 +7356,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              gen_load_fpr64(ctx, fp64, fs);
>              gen_helper_float_cvts_l(fp32, cpu_env, fp64);
>              tcg_temp_free_i64(fp64);
> -            gen_store_fpr32(fp32, fd);
> +            gen_store_fpr32(ctx, fp32, fd);
>              tcg_temp_free_i32(fp32);
>          }
>          opn = "cvt.s.l";
> @@ -7473,7 +7467,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>          break;
>      case OPC_MOVCF_PS:
>          check_cp1_64bitmode(ctx);
> -        gen_movcf_ps(fs, fd, (ft >> 2) & 0x7, ft & 0x1);
> +        gen_movcf_ps(ctx, fs, fd, (ft >> 2) & 0x7, ft & 0x1);
>          opn = "movcf.ps";
>          break;
>      case OPC_MOVZ_PS:
> @@ -7598,9 +7592,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>          {
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32h(fp0, fs);
> +            gen_load_fpr32h(ctx, fp0, fs);
>              gen_helper_float_cvts_pu(fp0, cpu_env, fp0);
> -            gen_store_fpr32(fp0, fd);
> +            gen_store_fpr32(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "cvt.s.pu";
> @@ -7622,9 +7616,9 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>          {
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> +            gen_load_fpr32(ctx, fp0, fs);
>              gen_helper_float_cvts_pl(fp0, cpu_env, fp0);
> -            gen_store_fpr32(fp0, fd);
> +            gen_store_fpr32(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "cvt.s.pl";
> @@ -7635,10 +7629,10 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>              TCGv_i32 fp1 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> -            gen_load_fpr32(fp1, ft);
> -            gen_store_fpr32h(fp0, fd);
> -            gen_store_fpr32(fp1, fd);
> +            gen_load_fpr32(ctx, fp0, fs);
> +            gen_load_fpr32(ctx, fp1, ft);
> +            gen_store_fpr32h(ctx, fp0, fd);
> +            gen_store_fpr32(ctx, fp1, fd);
>              tcg_temp_free_i32(fp0);
>              tcg_temp_free_i32(fp1);
>          }
> @@ -7650,10 +7644,10 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>              TCGv_i32 fp1 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> -            gen_load_fpr32h(fp1, ft);
> -            gen_store_fpr32(fp1, fd);
> -            gen_store_fpr32h(fp0, fd);
> +            gen_load_fpr32(ctx, fp0, fs);
> +            gen_load_fpr32h(ctx, fp1, ft);
> +            gen_store_fpr32(ctx, fp1, fd);
> +            gen_store_fpr32h(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>              tcg_temp_free_i32(fp1);
>          }
> @@ -7665,10 +7659,10 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>              TCGv_i32 fp1 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32h(fp0, fs);
> -            gen_load_fpr32(fp1, ft);
> -            gen_store_fpr32(fp1, fd);
> -            gen_store_fpr32h(fp0, fd);
> +            gen_load_fpr32h(ctx, fp0, fs);
> +            gen_load_fpr32(ctx, fp1, ft);
> +            gen_store_fpr32(ctx, fp1, fd);
> +            gen_store_fpr32h(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>              tcg_temp_free_i32(fp1);
>          }
> @@ -7680,10 +7674,10 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>              TCGv_i32 fp1 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32h(fp0, fs);
> -            gen_load_fpr32h(fp1, ft);
> -            gen_store_fpr32(fp1, fd);
> -            gen_store_fpr32h(fp0, fd);
> +            gen_load_fpr32h(ctx, fp0, fs);
> +            gen_load_fpr32h(ctx, fp1, ft);
> +            gen_store_fpr32(ctx, fp1, fd);
> +            gen_store_fpr32h(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>              tcg_temp_free_i32(fp1);
>          }
> @@ -7757,7 +7751,7 @@ static void gen_flt3_ldst (DisasContext *ctx, uint32_t opc,
>  
>              tcg_gen_qemu_ld32s(t0, t0, ctx->mem_idx);
>              tcg_gen_trunc_tl_i32(fp0, t0);
> -            gen_store_fpr32(fp0, fd);
> +            gen_store_fpr32(ctx, fp0, fd);
>              tcg_temp_free_i32(fp0);
>          }
>          opn = "lwxc1";
> @@ -7792,7 +7786,7 @@ static void gen_flt3_ldst (DisasContext *ctx, uint32_t opc,
>              TCGv_i32 fp0 = tcg_temp_new_i32();
>              TCGv t1 = tcg_temp_new();
>  
> -            gen_load_fpr32(fp0, fs);
> +            gen_load_fpr32(ctx, fp0, fs);
>              tcg_gen_extu_i32_tl(t1, fp0);
>              tcg_gen_qemu_st32(t1, t0, ctx->mem_idx);
>              tcg_temp_free_i32(fp0);
> @@ -7852,24 +7846,24 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
>              tcg_gen_andi_tl(t0, t0, 0x7);
>  
>              tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0, l1);
> -            gen_load_fpr32(fp, fs);
> -            gen_load_fpr32h(fph, fs);
> -            gen_store_fpr32(fp, fd);
> -            gen_store_fpr32h(fph, fd);
> +            gen_load_fpr32(ctx, fp, fs);
> +            gen_load_fpr32h(ctx, fph, fs);
> +            gen_store_fpr32(ctx, fp, fd);
> +            gen_store_fpr32h(ctx, fph, fd);
>              tcg_gen_br(l2);
>              gen_set_label(l1);
>              tcg_gen_brcondi_tl(TCG_COND_NE, t0, 4, l2);
>              tcg_temp_free(t0);
>  #ifdef TARGET_WORDS_BIGENDIAN
> -            gen_load_fpr32(fp, fs);
> -            gen_load_fpr32h(fph, ft);
> -            gen_store_fpr32h(fp, fd);
> -            gen_store_fpr32(fph, fd);
> +            gen_load_fpr32(ctx, fp, fs);
> +            gen_load_fpr32h(ctx, fph, ft);
> +            gen_store_fpr32h(ctx, fp, fd);
> +            gen_store_fpr32(ctx, fph, fd);
>  #else
> -            gen_load_fpr32h(fph, fs);
> -            gen_load_fpr32(fp, ft);
> -            gen_store_fpr32(fph, fd);
> -            gen_store_fpr32h(fp, fd);
> +            gen_load_fpr32h(ctx, fph, fs);
> +            gen_load_fpr32(ctx, fp, ft);
> +            gen_store_fpr32(ctx, fph, fd);
> +            gen_store_fpr32h(ctx, fp, fd);
>  #endif
>              gen_set_label(l2);
>              tcg_temp_free_i32(fp);
> @@ -7884,13 +7878,13 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
>              TCGv_i32 fp1 = tcg_temp_new_i32();
>              TCGv_i32 fp2 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> -            gen_load_fpr32(fp1, ft);
> -            gen_load_fpr32(fp2, fr);
> +            gen_load_fpr32(ctx, fp0, fs);
> +            gen_load_fpr32(ctx, fp1, ft);
> +            gen_load_fpr32(ctx, fp2, fr);
>              gen_helper_float_muladd_s(fp2, cpu_env, fp0, fp1, fp2);
>              tcg_temp_free_i32(fp0);
>              tcg_temp_free_i32(fp1);
> -            gen_store_fpr32(fp2, fd);
> +            gen_store_fpr32(ctx, fp2, fd);
>              tcg_temp_free_i32(fp2);
>          }
>          opn = "madd.s";
> @@ -7939,13 +7933,13 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
>              TCGv_i32 fp1 = tcg_temp_new_i32();
>              TCGv_i32 fp2 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> -            gen_load_fpr32(fp1, ft);
> -            gen_load_fpr32(fp2, fr);
> +            gen_load_fpr32(ctx, fp0, fs);
> +            gen_load_fpr32(ctx, fp1, ft);
> +            gen_load_fpr32(ctx, fp2, fr);
>              gen_helper_float_mulsub_s(fp2, cpu_env, fp0, fp1, fp2);
>              tcg_temp_free_i32(fp0);
>              tcg_temp_free_i32(fp1);
> -            gen_store_fpr32(fp2, fd);
> +            gen_store_fpr32(ctx, fp2, fd);
>              tcg_temp_free_i32(fp2);
>          }
>          opn = "msub.s";
> @@ -7994,13 +7988,13 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
>              TCGv_i32 fp1 = tcg_temp_new_i32();
>              TCGv_i32 fp2 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> -            gen_load_fpr32(fp1, ft);
> -            gen_load_fpr32(fp2, fr);
> +            gen_load_fpr32(ctx, fp0, fs);
> +            gen_load_fpr32(ctx, fp1, ft);
> +            gen_load_fpr32(ctx, fp2, fr);
>              gen_helper_float_nmuladd_s(fp2, cpu_env, fp0, fp1, fp2);
>              tcg_temp_free_i32(fp0);
>              tcg_temp_free_i32(fp1);
> -            gen_store_fpr32(fp2, fd);
> +            gen_store_fpr32(ctx, fp2, fd);
>              tcg_temp_free_i32(fp2);
>          }
>          opn = "nmadd.s";
> @@ -8049,13 +8043,13 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
>              TCGv_i32 fp1 = tcg_temp_new_i32();
>              TCGv_i32 fp2 = tcg_temp_new_i32();
>  
> -            gen_load_fpr32(fp0, fs);
> -            gen_load_fpr32(fp1, ft);
> -            gen_load_fpr32(fp2, fr);
> +            gen_load_fpr32(ctx, fp0, fs);
> +            gen_load_fpr32(ctx, fp1, ft);
> +            gen_load_fpr32(ctx, fp2, fr);
>              gen_helper_float_nmulsub_s(fp2, cpu_env, fp0, fp1, fp2);
>              tcg_temp_free_i32(fp0);
>              tcg_temp_free_i32(fp1);
> -            gen_store_fpr32(fp2, fd);
> +            gen_store_fpr32(ctx, fp2, fd);
>              tcg_temp_free_i32(fp2);
>          }
>          opn = "nmsub.s";
> @@ -10996,13 +10990,13 @@ static void decode_micromips32_opc (CPUMIPSState *env, DisasContext *ctx,
>                  case MOVF_FMT:
>                      switch (fmt) {
>                      case FMT_SDPS_S:
> -                        gen_movcf_s(rs, rt, cc, 0);
> +                        gen_movcf_s(ctx, rs, rt, cc, 0);
>                          break;
>                      case FMT_SDPS_D:
>                          gen_movcf_d(ctx, rs, rt, cc, 0);
>                          break;
>                      case FMT_SDPS_PS:
> -                        gen_movcf_ps(rs, rt, cc, 0);
> +                        gen_movcf_ps(ctx, rs, rt, cc, 0);
>                          break;
>                      default:
>                          goto pool32f_invalid;
> @@ -11011,13 +11005,13 @@ static void decode_micromips32_opc (CPUMIPSState *env, DisasContext *ctx,
>                  case MOVT_FMT:
>                      switch (fmt) {
>                      case FMT_SDPS_S:
> -                        gen_movcf_s(rs, rt, cc, 1);
> +                        gen_movcf_s(ctx, rs, rt, cc, 1);
>                          break;
>                      case FMT_SDPS_D:
>                          gen_movcf_d(ctx, rs, rt, cc, 1);
>                          break;
>                      case FMT_SDPS_PS:
> -                        gen_movcf_ps(rs, rt, cc, 1);
> +                        gen_movcf_ps(ctx, rs, rt, cc, 1);
>                          break;
>                      default:
>                          goto pool32f_invalid;

This patch looks fine. That said, it is pointless without the next one.


-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

* [Qemu-devel] [PATCH 5/7] target-mips: Use TCG registers for the FPU.
  2012-09-17 21:35 [Qemu-devel] [PATCH v2 0/7] target-mips improvements Richard Henderson
                   ` (3 preceding siblings ...)
  2012-09-17 21:35 ` [Qemu-devel] [PATCH 4/7] target-mips: Pass DisasContext to fpr32 load/store routines Richard Henderson
@ 2012-09-17 21:35 ` Richard Henderson
  2012-09-18 16:39   ` Aurelien Jarno
  2012-09-17 21:35 ` [Qemu-devel] [PATCH 6/7] target-mips: Add accessors for the two 32-bit halves of a 64-bit FPR Richard Henderson
  2012-09-17 21:35 ` [Qemu-devel] [PATCH 7/7] target-mips: Implement Loongson Multimedia Instructions Richard Henderson
  6 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2012-09-17 21:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

With normal FP, this doesn't have much effect on the generated code,
because most of the FP operations are not CONST/PURE, and so we spill
registers at about the same frequency as the explicit load/stores.

But with Loongson multimedia instructions, which are all integral and
whose helpers are in fact CONST+PURE, this greatly improves the code.

Rather than over-use the deposit operation, we create TCG registers for
both the 64-bit FPU register as a whole and the two 32-bit halves.  We
only ever reference the whole register or the two half registers in any
one TB, so there's no problem with aliasing.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-mips/translate.c | 141 +++++++++++++++++++++++++++++++++---------------
 1 file changed, 97 insertions(+), 44 deletions(-)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index b4301e9..df92cec 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -479,6 +479,12 @@ static TCGv cpu_dspctrl, btarget, bcond;
 static TCGv_i32 hflags;
 static TCGv_i32 fpu_fcr0, fpu_fcr31;
 
+/* FPU registers.  These alias, but we'll only use one or the other
+   in any one TB based on MIPS_HFLAG_F64.  */
+static TCGv_i32 fpu_f32[32];
+static TCGv_i32 fpu_fh32[32];
+static TCGv_i64 fpu_f64[32];
+
 static uint32_t gen_opc_hflags[OPC_BUF_SIZE];
 
 #include "gen-icount.h"
@@ -545,26 +551,45 @@ enum {
     BS_EXCP     = 3, /* We reached an exception condition */
 };
 
-static const char *regnames[] =
-    { "r0", "at", "v0", "v1", "a0", "a1", "a2", "a3",
-      "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7",
-      "s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
-      "t8", "t9", "k0", "k1", "gp", "sp", "s8", "ra", };
+static const char * const regnames[] = {
+    "r0", "at", "v0", "v1", "a0", "a1", "a2", "a3",
+    "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7",
+    "s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
+    "t8", "t9", "k0", "k1", "gp", "sp", "s8", "ra",
+};
+
+static const char * const regnames_HI[] = {
+    "HI0", "HI1", "HI2", "HI3",
+};
 
-static const char *regnames_HI[] =
-    { "HI0", "HI1", "HI2", "HI3", };
+static const char * const regnames_LO[] = {
+    "LO0", "LO1", "LO2", "LO3",
+};
 
-static const char *regnames_LO[] =
-    { "LO0", "LO1", "LO2", "LO3", };
+static const char * const regnames_ACX[] = {
+    "ACX0", "ACX1", "ACX2", "ACX3",
+};
 
-static const char *regnames_ACX[] =
-    { "ACX0", "ACX1", "ACX2", "ACX3", };
+static const char * const fregnames[] = {
+    "f0",  "f1",  "f2",  "f3",  "f4",  "f5",  "f6",  "f7",
+    "f8",  "f9",  "f10", "f11", "f12", "f13", "f14", "f15",
+    "f16", "f17", "f18", "f19", "f20", "f21", "f22", "f23",
+    "f24", "f25", "f26", "f27", "f28", "f29", "f30", "f31",
+};
 
-static const char *fregnames[] =
-    { "f0",  "f1",  "f2",  "f3",  "f4",  "f5",  "f6",  "f7",
-      "f8",  "f9",  "f10", "f11", "f12", "f13", "f14", "f15",
-      "f16", "f17", "f18", "f19", "f20", "f21", "f22", "f23",
-      "f24", "f25", "f26", "f27", "f28", "f29", "f30", "f31", };
+static const char * const flregnames[] = {
+    "fl0",  "fl1",  "fl2",  "fl3",  "fl4",  "fl5",  "fl6",  "fl7",
+    "fl8",  "fl9",  "fl10", "fl11", "fl12", "fl13", "fl14", "fl15",
+    "fl16", "fl17", "fl18", "fl19", "fl20", "fl21", "fl22", "fl23",
+    "fl24", "fl25", "fl26", "fl27", "fl28", "fl29", "fl30", "fl31",
+};
+
+static const char * const fhregnames[] = {
+    "fh0",  "fh1",  "fh2",  "fh3",  "fh4",  "fh5",  "fh6",  "fh7",
+    "fh8",  "fh9",  "fh10", "fh11", "fh12", "fh13", "fh14", "fh15",
+    "fh16", "fh17", "fh18", "fh19", "fh20", "fh21", "fh22", "fh23",
+    "fh24", "fh25", "fh26", "fh27", "fh28", "fh29", "fh30", "fh31",
+};
 
 #ifdef MIPS_DEBUG_DISAS
 #define MIPS_DEBUG(fmt, ...)                                                  \
@@ -662,55 +687,70 @@ static inline void gen_store_srsgpr (int from, int to)
 }
 
 /* Floating point register moves. */
-static inline void gen_load_fpr32(DisasContext *ctx, TCGv_i32 t, int reg)
+static void gen_load_fpr32(DisasContext *ctx, TCGv_i32 t, int reg)
 {
-    tcg_gen_ld_i32(t, cpu_env, offsetof(CPUMIPSState, active_fpu.fpr[reg].w[FP_ENDIAN_IDX]));
+    if (ctx->hflags & MIPS_HFLAG_F64) {
+        tcg_gen_trunc_i64_i32(t, fpu_f64[reg]);
+    } else {
+        tcg_gen_mov_i32(t, fpu_f32[reg]);
+    }
 }
 
-static inline void gen_store_fpr32(DisasContext *ctx, TCGv_i32 t, int reg)
+static void gen_store_fpr32(DisasContext *ctx, TCGv_i32 t, int reg)
 {
-    tcg_gen_st_i32(t, cpu_env, offsetof(CPUMIPSState, active_fpu.fpr[reg].w[FP_ENDIAN_IDX]));
+    if (ctx->hflags & MIPS_HFLAG_F64) {
+        TCGv_i64 t64 = tcg_temp_new_i64();
+        tcg_gen_ext_i32_i64(t64, t);
+        tcg_gen_deposit_i64(fpu_f64[reg], fpu_f64[reg], t64, 0, 32);
+        tcg_temp_free_i64(t64);
+    } else {
+        tcg_gen_mov_i32(fpu_f32[reg], t);
+    }
 }
 
-static inline void gen_load_fpr32h(DisasContext *ctx, TCGv_i32 t, int reg)
+static void gen_load_fpr32h(DisasContext *ctx, TCGv_i32 t, int reg)
 {
-    tcg_gen_ld_i32(t, cpu_env, offsetof(CPUMIPSState, active_fpu.fpr[reg].w[!FP_ENDIAN_IDX]));
+    if (ctx->hflags & MIPS_HFLAG_F64) {
+        TCGv_i64 t64 = tcg_temp_new_i64();
+        tcg_gen_shri_i64(t64, fpu_f64[reg], 32);
+        tcg_gen_trunc_i64_i32(t, t64);
+        tcg_temp_free_i64(t64);
+    } else {
+        tcg_gen_mov_i32(t, fpu_fh32[reg]);
+    }
 }
 
-static inline void gen_store_fpr32h(DisasContext *ctx, TCGv_i32 t, int reg)
+static void gen_store_fpr32h(DisasContext *ctx, TCGv_i32 t, int reg)
 {
-    tcg_gen_st_i32(t, cpu_env, offsetof(CPUMIPSState, active_fpu.fpr[reg].w[!FP_ENDIAN_IDX]));
+    if (ctx->hflags & MIPS_HFLAG_F64) {
+        TCGv_i64 t64 = tcg_temp_new_i64();
+        tcg_gen_ext_i32_i64(t64, t);
+        tcg_gen_deposit_i64(fpu_f64[reg], fpu_f64[reg], t64, 32, 32);
+        tcg_temp_free_i64(t64);
+    } else {
+        tcg_gen_mov_i32(fpu_fh32[reg], t);
+    }
 }
 
-static inline void gen_load_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
+static void gen_load_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
 {
     if (ctx->hflags & MIPS_HFLAG_F64) {
-        tcg_gen_ld_i64(t, cpu_env, offsetof(CPUMIPSState, active_fpu.fpr[reg].d));
+        tcg_gen_mov_i64(t, fpu_f64[reg]);
     } else {
-        TCGv_i32 t0 = tcg_temp_new_i32();
-        TCGv_i32 t1 = tcg_temp_new_i32();
-        gen_load_fpr32(ctx, t0, reg & ~1);
-        gen_load_fpr32(ctx, t1, reg | 1);
-        tcg_gen_concat_i32_i64(t, t0, t1);
-        tcg_temp_free_i32(t0);
-        tcg_temp_free_i32(t1);
+        tcg_gen_concat_i32_i64(t, fpu_f32[reg & ~1], fpu_f32[reg | 1]);
     }
 }
 
-static inline void gen_store_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
+static void gen_store_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
 {
     if (ctx->hflags & MIPS_HFLAG_F64) {
-        tcg_gen_st_i64(t, cpu_env, offsetof(CPUMIPSState, active_fpu.fpr[reg].d));
+        tcg_gen_mov_i64(fpu_f64[reg], t);
     } else {
-        TCGv_i64 t0 = tcg_temp_new_i64();
-        TCGv_i32 t1 = tcg_temp_new_i32();
-        tcg_gen_trunc_i64_i32(t1, t);
-        gen_store_fpr32(ctx, t1, reg & ~1);
-        tcg_gen_shri_i64(t0, t, 32);
-        tcg_gen_trunc_i64_i32(t1, t0);
-        gen_store_fpr32(ctx, t1, reg | 1);
-        tcg_temp_free_i32(t1);
-        tcg_temp_free_i64(t0);
+        TCGv_i64 t64 = tcg_temp_new_i64();
+        tcg_gen_shri_i64(t64, t, 32);
+        tcg_gen_trunc_i64_i32(fpu_f32[reg | 1], t64);
+        tcg_temp_free_i64(t64);
+        tcg_gen_trunc_i64_i32(fpu_f32[reg & ~1], t);
     }
 }
 
@@ -12694,6 +12734,19 @@ static void mips_tcg_init(void)
                                        offsetof(CPUMIPSState, active_fpu.fcr31),
                                        "fcr31");
 
+    for (i = 0; i < 32; i++) {
+        int off = offsetof(CPUMIPSState, active_fpu.fpr[i].w[FP_ENDIAN_IDX]);
+        fpu_f32[i] = tcg_global_mem_new_i32(TCG_AREG0, off, flregnames[i]);
+    }
+    for (i = 0; i < 32; i++) {
+        int off = offsetof(CPUMIPSState, active_fpu.fpr[i].w[!FP_ENDIAN_IDX]);
+        fpu_fh32[i] = tcg_global_mem_new_i32(TCG_AREG0, off, fhregnames[i]);
+    }
+    for (i = 0; i < 32; i++) {
+        int off = offsetof(CPUMIPSState, active_fpu.fpr[i].w[FP_ENDIAN_IDX]);
+        fpu_f64[i] = tcg_global_mem_new_i64(TCG_AREG0, off, fregnames[i]);
+    }
+
     /* register helpers */
 #define GEN_HELPER 2
 #include "helper.h"
-- 
1.7.11.4

* Re: [Qemu-devel] [PATCH 5/7] target-mips: Use TCG registers for the FPU.
  2012-09-17 21:35 ` [Qemu-devel] [PATCH 5/7] target-mips: Use TCG registers for the FPU Richard Henderson
@ 2012-09-18 16:39   ` Aurelien Jarno
  0 siblings, 0 replies; 14+ messages in thread
From: Aurelien Jarno @ 2012-09-18 16:39 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Mon, Sep 17, 2012 at 02:35:11PM -0700, Richard Henderson wrote:
> With normal FP, this doesn't have much effect on the generated code,
> because most of the FP operations are not CONST/PURE, and so we spill
> registers at about the same frequency as the explicit load/stores.
> 
> But with Loongson multimedia instructions, which are all integral and
> whose helpers are in fact CONST+PURE, this greatly improves the code.
> 
> Rather than over-use the deposit operation, we create TCG registers for
> both the 64-bit FPU register as a whole and the two 32-bit halves.  We
> only ever reference the whole register or the two half registers in any
> one TB, so there's no problem with aliasing.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target-mips/translate.c | 141 +++++++++++++++++++++++++++++++++---------------
>  1 file changed, 97 insertions(+), 44 deletions(-)
> 
> diff --git a/target-mips/translate.c b/target-mips/translate.c
> index b4301e9..df92cec 100644
> --- a/target-mips/translate.c
> +++ b/target-mips/translate.c
> @@ -479,6 +479,12 @@ static TCGv cpu_dspctrl, btarget, bcond;
>  static TCGv_i32 hflags;
>  static TCGv_i32 fpu_fcr0, fpu_fcr31;
>  
> +/* FPU registers.  These alias, but we'll only use one or the other
> +   in any one TB based on MIPS_HFLAG_F64.  */
> +static TCGv_i32 fpu_f32[32];
> +static TCGv_i32 fpu_fh32[32];
> +static TCGv_i64 fpu_f64[32];
> +
>  static uint32_t gen_opc_hflags[OPC_BUF_SIZE];
>  
>  #include "gen-icount.h"
> @@ -545,26 +551,45 @@ enum {
>      BS_EXCP     = 3, /* We reached an exception condition */
>  };
>  
> -static const char *regnames[] =
> -    { "r0", "at", "v0", "v1", "a0", "a1", "a2", "a3",
> -      "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7",
> -      "s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
> -      "t8", "t9", "k0", "k1", "gp", "sp", "s8", "ra", };
> +static const char * const regnames[] = {
> +    "r0", "at", "v0", "v1", "a0", "a1", "a2", "a3",
> +    "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7",
> +    "s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
> +    "t8", "t9", "k0", "k1", "gp", "sp", "s8", "ra",
> +};
> +
> +static const char * const regnames_HI[] = {
> +    "HI0", "HI1", "HI2", "HI3",
> +};
>  
> -static const char *regnames_HI[] =
> -    { "HI0", "HI1", "HI2", "HI3", };
> +static const char * const regnames_LO[] = {
> +    "LO0", "LO1", "LO2", "LO3",
> +};
>  
> -static const char *regnames_LO[] =
> -    { "LO0", "LO1", "LO2", "LO3", };
> +static const char * const regnames_ACX[] = {
> +    "ACX0", "ACX1", "ACX2", "ACX3",
> +};
>  
> -static const char *regnames_ACX[] =
> -    { "ACX0", "ACX1", "ACX2", "ACX3", };
> +static const char * const fregnames[] = {
> +    "f0",  "f1",  "f2",  "f3",  "f4",  "f5",  "f6",  "f7",
> +    "f8",  "f9",  "f10", "f11", "f12", "f13", "f14", "f15",
> +    "f16", "f17", "f18", "f19", "f20", "f21", "f22", "f23",
> +    "f24", "f25", "f26", "f27", "f28", "f29", "f30", "f31",
> +};
>  
> -static const char *fregnames[] =
> -    { "f0",  "f1",  "f2",  "f3",  "f4",  "f5",  "f6",  "f7",
> -      "f8",  "f9",  "f10", "f11", "f12", "f13", "f14", "f15",
> -      "f16", "f17", "f18", "f19", "f20", "f21", "f22", "f23",
> -      "f24", "f25", "f26", "f27", "f28", "f29", "f30", "f31", };
> +static const char * const flregnames[] = {
> +    "fl0",  "fl1",  "fl2",  "fl3",  "fl4",  "fl5",  "fl6",  "fl7",
> +    "fl8",  "fl9",  "fl10", "fl11", "fl12", "fl13", "fl14", "fl15",
> +    "fl16", "fl17", "fl18", "fl19", "fl20", "fl21", "fl22", "fl23",
> +    "fl24", "fl25", "fl26", "fl27", "fl28", "fl29", "fl30", "fl31",
> +};
> +
> +static const char * const fhregnames[] = {
> +    "fh0",  "fh1",  "fh2",  "fh3",  "fh4",  "fh5",  "fh6",  "fh7",
> +    "fh8",  "fh9",  "fh10", "fh11", "fh12", "fh13", "fh14", "fh15",
> +    "fh16", "fh17", "fh18", "fh19", "fh20", "fh21", "fh22", "fh23",
> +    "fh24", "fh25", "fh26", "fh27", "fh28", "fh29", "fh30", "fh31",
> +};
>
>  #ifdef MIPS_DEBUG_DISAS
>  #define MIPS_DEBUG(fmt, ...)                                                  \
> @@ -662,55 +687,70 @@ static inline void gen_store_srsgpr (int from, int to)
>  }
>  
>  /* Floating point register moves. */
> -static inline void gen_load_fpr32(DisasContext *ctx, TCGv_i32 t, int reg)
> +static void gen_load_fpr32(DisasContext *ctx, TCGv_i32 t, int reg)
>  {
> -    tcg_gen_ld_i32(t, cpu_env, offsetof(CPUMIPSState, active_fpu.fpr[reg].w[FP_ENDIAN_IDX]));
> +    if (ctx->hflags & MIPS_HFLAG_F64) {
> +        tcg_gen_trunc_i64_i32(t, fpu_f64[reg]);
> +    } else {
> +        tcg_gen_mov_i32(t, fpu_f32[reg]);
> +    }
>  }
>  
> -static inline void gen_store_fpr32(DisasContext *ctx, TCGv_i32 t, int reg)
> +static void gen_store_fpr32(DisasContext *ctx, TCGv_i32 t, int reg)
>  {
> -    tcg_gen_st_i32(t, cpu_env, offsetof(CPUMIPSState, active_fpu.fpr[reg].w[FP_ENDIAN_IDX]));
> +    if (ctx->hflags & MIPS_HFLAG_F64) {
> +        TCGv_i64 t64 = tcg_temp_new_i64();
> +        tcg_gen_ext_i32_i64(t64, t);
> +        tcg_gen_deposit_i64(fpu_f64[reg], fpu_f64[reg], t64, 0, 32);
> +        tcg_temp_free_i64(t64);
> +    } else {
> +        tcg_gen_mov_i32(fpu_f32[reg], t);
> +    }
>  }
>  
> -static inline void gen_load_fpr32h(DisasContext *ctx, TCGv_i32 t, int reg)
> +static void gen_load_fpr32h(DisasContext *ctx, TCGv_i32 t, int reg)
>  {
> -    tcg_gen_ld_i32(t, cpu_env, offsetof(CPUMIPSState, active_fpu.fpr[reg].w[!FP_ENDIAN_IDX]));
> +    if (ctx->hflags & MIPS_HFLAG_F64) {
> +        TCGv_i64 t64 = tcg_temp_new_i64();
> +        tcg_gen_shri_i64(t64, fpu_f64[reg], 32);
> +        tcg_gen_trunc_i64_i32(t, t64);
> +        tcg_temp_free_i64(t64);
> +    } else {
> +        tcg_gen_mov_i32(t, fpu_fh32[reg]);
> +    }
>  }
>  
> -static inline void gen_store_fpr32h(DisasContext *ctx, TCGv_i32 t, int reg)
> +static void gen_store_fpr32h(DisasContext *ctx, TCGv_i32 t, int reg)
>  {
> -    tcg_gen_st_i32(t, cpu_env, offsetof(CPUMIPSState, active_fpu.fpr[reg].w[!FP_ENDIAN_IDX]));
> +    if (ctx->hflags & MIPS_HFLAG_F64) {
> +        TCGv_i64 t64 = tcg_temp_new_i64();
> +        tcg_gen_ext_i32_i64(t64, t);
> +        tcg_gen_deposit_i64(fpu_f64[reg], fpu_f64[reg], t64, 32, 32);
> +        tcg_temp_free_i64(t64);
> +    } else {
> +        tcg_gen_mov_i32(fpu_fh32[reg], t);
> +    }
>  }
>  
> -static inline void gen_load_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
> +static void gen_load_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
>  {
>      if (ctx->hflags & MIPS_HFLAG_F64) {
> -        tcg_gen_ld_i64(t, cpu_env, offsetof(CPUMIPSState, active_fpu.fpr[reg].d));
> +        tcg_gen_mov_i64(t, fpu_f64[reg]);
>      } else {
> -        TCGv_i32 t0 = tcg_temp_new_i32();
> -        TCGv_i32 t1 = tcg_temp_new_i32();
> -        gen_load_fpr32(ctx, t0, reg & ~1);
> -        gen_load_fpr32(ctx, t1, reg | 1);
> -        tcg_gen_concat_i32_i64(t, t0, t1);
> -        tcg_temp_free_i32(t0);
> -        tcg_temp_free_i32(t1);
> +        tcg_gen_concat_i32_i64(t, fpu_f32[reg & ~1], fpu_f32[reg | 1]);
>      }
>  }
>  
> -static inline void gen_store_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
> +static void gen_store_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
>  {
>      if (ctx->hflags & MIPS_HFLAG_F64) {
> -        tcg_gen_st_i64(t, cpu_env, offsetof(CPUMIPSState, active_fpu.fpr[reg].d));
> +        tcg_gen_mov_i64(fpu_f64[reg], t);
>      } else {
> -        TCGv_i64 t0 = tcg_temp_new_i64();
> -        TCGv_i32 t1 = tcg_temp_new_i32();
> -        tcg_gen_trunc_i64_i32(t1, t);
> -        gen_store_fpr32(ctx, t1, reg & ~1);
> -        tcg_gen_shri_i64(t0, t, 32);
> -        tcg_gen_trunc_i64_i32(t1, t0);
> -        gen_store_fpr32(ctx, t1, reg | 1);
> -        tcg_temp_free_i32(t1);
> -        tcg_temp_free_i64(t0);
> +        TCGv_i64 t64 = tcg_temp_new_i64();
> +        tcg_gen_shri_i64(t64, t, 32);
> +        tcg_gen_trunc_i64_i32(fpu_f32[reg | 1], t64);
> +        tcg_temp_free_i64(t64);
> +        tcg_gen_trunc_i64_i32(fpu_f32[reg & ~1], t);
>      }
>  }
>  
> @@ -12694,6 +12734,19 @@ static void mips_tcg_init(void)
>                                         offsetof(CPUMIPSState, active_fpu.fcr31),
>                                         "fcr31");
>  
> +    for (i = 0; i < 32; i++) {
> +        int off = offsetof(CPUMIPSState, active_fpu.fpr[i].w[FP_ENDIAN_IDX]);
> +        fpu_f32[i] = tcg_global_mem_new_i32(TCG_AREG0, off, flregnames[i]);
> +    }
> +    for (i = 0; i < 32; i++) {
> +        int off = offsetof(CPUMIPSState, active_fpu.fpr[i].w[!FP_ENDIAN_IDX]);
> +        fpu_fh32[i] = tcg_global_mem_new_i32(TCG_AREG0, off, fhregnames[i]);
> +    }
> +    for (i = 0; i < 32; i++) {
> +        int off = offsetof(CPUMIPSState, active_fpu.fpr[i].w[FP_ENDIAN_IDX]);

This should be fpr[i].d.
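
One likely reason, sketched below in plain C: the word index used for
the low half depends on host byte order, so the offset of
w[FP_ENDIAN_IDX] is not always the start of the 64-bit register, while
the offset of .d is. The union here is a simplified, hypothetical
model (fpr_sketch_t is not the real fpr_t), and it assumes that
FP_ENDIAN_IDX is 1 on big-endian hosts and 0 otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of one FPR: a 64-bit value that can
   also be viewed as two 32-bit words.  Not the real fpr_t.  */
typedef union {
    uint64_t d;
    uint32_t w[2];
} fpr_sketch_t;

/* Byte offset of the word that holds the low 32 bits, for a given host
   byte order.  Assumption: this mirrors FP_ENDIAN_IDX, i.e. index 1 on
   big-endian hosts and index 0 on little-endian hosts.  */
static inline size_t low_word_offset(int host_big_endian)
{
    return host_big_endian ? offsetof(fpr_sketch_t, w[1])
                           : offsetof(fpr_sketch_t, w[0]);
}

/* Byte offset of the whole 64-bit register: independent of byte order.  */
static inline size_t whole_reg_offset(void)
{
    return offsetof(fpr_sketch_t, d);
}
```

Under that assumption, a 64-bit global registered at the low-word
offset would start 4 bytes into the register on a big-endian host.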

> +        fpu_f64[i] = tcg_global_mem_new_i64(TCG_AREG0, off, fregnames[i]);
> +    }
> +

Adding so many globals (i.e. multiplying by 4) has a cost that is greater
than the gains. Remember that the register allocator loops over all
globals at the end of a basic block or when calling a non-CONST
helper/op. While the generated code looks nicer, this slows down the
guest by roughly 12% (measured on boot time).

I am currently working on an optimization of the liveness/register
allocator which, among other things, partly mitigates that (I hope to get
the patches ready for posting in a week or so). That said, the slowdown
is still around 3%. I think we should go for only mapping the FP
registers as 64-bit registers, and use trunc/shift/deposit to read/write
them. Of course the generated code doesn't look as nice, but what is
important is that the overall execution is faster, not slower.
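
For reference, the trunc/shift/deposit arithmetic on a single 64-bit
register can be sketched in plain C as follows. This is not TCG code:
a uint64_t stands in for one 64-bit FPR global, and the names
(fpr_read_lo, deposit64_sketch, and so on) are made up for
illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Low half: truncate the 64-bit value to 32 bits.  */
static inline uint32_t fpr_read_lo(uint64_t fpr)
{
    return (uint32_t)fpr;
}

/* High half: shift right by 32, then truncate.  */
static inline uint32_t fpr_read_hi(uint64_t fpr)
{
    return (uint32_t)(fpr >> 32);
}

/* Replace LEN bits of FPR starting at bit POS with the low LEN bits of
   VAL, leaving all other bits unchanged.
   Requires pos < 64, len >= 1 and pos + len <= 64.  */
static inline uint64_t deposit64_sketch(uint64_t fpr, unsigned pos,
                                        unsigned len, uint64_t val)
{
    uint64_t field = (len == 64) ? ~0ull : ((1ull << len) - 1);
    uint64_t mask = field << pos;
    return (fpr & ~mask) | ((val << pos) & mask);
}

/* Write the low half: deposit at bit 0, width 32.  */
static inline uint64_t fpr_write_lo(uint64_t fpr, uint32_t val)
{
    return deposit64_sketch(fpr, 0, 32, val);
}

/* Write the high half: deposit at bit 32, width 32.  */
static inline uint64_t fpr_write_hi(uint64_t fpr, uint32_t val)
{
    return deposit64_sketch(fpr, 32, 32, val);
}
```

Each half-register write becomes a read-modify-write of the full 64-bit
value, which is the extra work in the generated code that is traded for
having fewer globals to track.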

>      /* register helpers */
>  #define GEN_HELPER 2
>  #include "helper.h"
> -- 
> 1.7.11.4
> 

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

* [Qemu-devel] [PATCH 6/7] target-mips: Add accessors for the two 32-bit halves of a 64-bit FPR
  2012-09-17 21:35 [Qemu-devel] [PATCH v2 0/7] target-mips improvements Richard Henderson
                   ` (4 preceding siblings ...)
  2012-09-17 21:35 ` [Qemu-devel] [PATCH 5/7] target-mips: Use TCG registers for the FPU Richard Henderson
@ 2012-09-17 21:35 ` Richard Henderson
  2012-09-17 21:35 ` [Qemu-devel] [PATCH 7/7] target-mips: Implement Loongson Multimedia Instructions Richard Henderson
  6 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2012-09-17 21:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Not much used yet, but more users to come.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-mips/translate.c | 64 +++++++++++++++++++++++++++++--------------------
 1 file changed, 38 insertions(+), 26 deletions(-)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index df92cec..57454f0 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -754,6 +754,24 @@ static void gen_store_fpr64(DisasContext *ctx, TCGv_i64 t, int reg)
     }
 }
 
+static void gen_load_fpr_pair(DisasContext *ctx, TCGv_i32 tl,
+                              TCGv_i32 th, int reg)
+{
+    gen_load_fpr32(ctx, tl, reg);
+    gen_load_fpr32h(ctx, th, reg);
+}
+
+static void gen_store_fpr_pair(DisasContext *ctx, TCGv_i32 tl,
+                               TCGv_i32 th, int reg)
+{
+    if (ctx->hflags & MIPS_HFLAG_F64) {
+        tcg_gen_concat_i32_i64(fpu_f64[reg], tl, th);
+    } else {
+        tcg_gen_mov_i32(fpu_f32[reg], tl);
+        tcg_gen_mov_i32(fpu_fh32[reg], th);
+    }
+}
+
 static inline int get_fp_bit (int cc)
 {
     if (cc)
@@ -7671,8 +7689,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
 
             gen_load_fpr32(ctx, fp0, fs);
             gen_load_fpr32(ctx, fp1, ft);
-            gen_store_fpr32h(ctx, fp0, fd);
-            gen_store_fpr32(ctx, fp1, fd);
+            gen_store_fpr_pair(ctx, fp1, fp0, fd);
             tcg_temp_free_i32(fp0);
             tcg_temp_free_i32(fp1);
         }
@@ -7686,8 +7703,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
 
             gen_load_fpr32(ctx, fp0, fs);
             gen_load_fpr32h(ctx, fp1, ft);
-            gen_store_fpr32(ctx, fp1, fd);
-            gen_store_fpr32h(ctx, fp0, fd);
+            gen_store_fpr_pair(ctx, fp1, fp0, fd);
             tcg_temp_free_i32(fp0);
             tcg_temp_free_i32(fp1);
         }
@@ -7701,8 +7717,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
 
             gen_load_fpr32h(ctx, fp0, fs);
             gen_load_fpr32(ctx, fp1, ft);
-            gen_store_fpr32(ctx, fp1, fd);
-            gen_store_fpr32h(ctx, fp0, fd);
+            gen_store_fpr_pair(ctx, fp1, fp0, fd);
             tcg_temp_free_i32(fp0);
             tcg_temp_free_i32(fp1);
         }
@@ -7716,8 +7731,7 @@ static void gen_farith (DisasContext *ctx, enum fopcode op1,
 
             gen_load_fpr32h(ctx, fp0, fs);
             gen_load_fpr32h(ctx, fp1, ft);
-            gen_store_fpr32(ctx, fp1, fd);
-            gen_store_fpr32h(ctx, fp0, fd);
+            gen_store_fpr_pair(ctx, fp1, fp0, fd);
             tcg_temp_free_i32(fp0);
             tcg_temp_free_i32(fp1);
         }
@@ -7877,8 +7891,7 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
         check_cp1_64bitmode(ctx);
         {
             TCGv t0 = tcg_temp_local_new();
-            TCGv_i32 fp = tcg_temp_new_i32();
-            TCGv_i32 fph = tcg_temp_new_i32();
+            TCGv_i32 fp, fph;
             int l1 = gen_new_label();
             int l2 = gen_new_label();
 
@@ -7886,28 +7899,27 @@ static void gen_flt3_arith (DisasContext *ctx, uint32_t opc,
             tcg_gen_andi_tl(t0, t0, 0x7);
 
             tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0, l1);
-            gen_load_fpr32(ctx, fp, fs);
-            gen_load_fpr32h(ctx, fph, fs);
-            gen_store_fpr32(ctx, fp, fd);
-            gen_store_fpr32h(ctx, fph, fd);
+
+            fp = tcg_temp_new_i32();
+            fph = tcg_temp_new_i32();
+            gen_load_fpr_pair(ctx, fp, fph, fs);
+            gen_store_fpr_pair(ctx, fp, fph, fd);
+            tcg_temp_free_i32(fp);
+            tcg_temp_free_i32(fph);
             tcg_gen_br(l2);
+
             gen_set_label(l1);
             tcg_gen_brcondi_tl(TCG_COND_NE, t0, 4, l2);
             tcg_temp_free(t0);
-#ifdef TARGET_WORDS_BIGENDIAN
-            gen_load_fpr32(ctx, fp, fs);
-            gen_load_fpr32h(ctx, fph, ft);
-            gen_store_fpr32h(ctx, fp, fd);
-            gen_store_fpr32(ctx, fph, fd);
-#else
-            gen_load_fpr32h(ctx, fph, fs);
-            gen_load_fpr32(ctx, fp, ft);
-            gen_store_fpr32(ctx, fph, fd);
-            gen_store_fpr32h(ctx, fp, fd);
-#endif
-            gen_set_label(l2);
+
+            fp = tcg_temp_new_i32();
+            fph = tcg_temp_new_i32();
+            gen_load_fpr_pair(ctx, fp, fph, fs);
+            gen_store_fpr_pair(ctx, fph, fp, fd);
             tcg_temp_free_i32(fp);
             tcg_temp_free_i32(fph);
+
+            gen_set_label(l2);
         }
         opn = "alnv.ps";
         break;
-- 
1.7.11.4

* [Qemu-devel] [PATCH 7/7] target-mips: Implement Loongson Multimedia Instructions
  2012-09-17 21:35 [Qemu-devel] [PATCH v2 0/7] target-mips improvements Richard Henderson
                   ` (5 preceding siblings ...)
  2012-09-17 21:35 ` [Qemu-devel] [PATCH 6/7] target-mips: Add accessors for the two 32-bit halves of a 64-bit FPR Richard Henderson
@ 2012-09-17 21:35 ` Richard Henderson
  2012-09-18 16:39   ` Aurelien Jarno
  6 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2012-09-17 21:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: aurelien

Implements all of the COP2 instructions except for the S<cond>
family of comparisons.  The documentation is unclear for those.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-mips/Makefile.objs |   2 +-
 target-mips/helper.h      |  59 ++++
 target-mips/lmi_helper.c  | 744 ++++++++++++++++++++++++++++++++++++++++++++++
 target-mips/translate.c   | 379 ++++++++++++++++++++++-
 4 files changed, 1180 insertions(+), 4 deletions(-)
 create mode 100644 target-mips/lmi_helper.c

diff --git a/target-mips/Makefile.objs b/target-mips/Makefile.objs
index ca20f21..3eeeeac 100644
--- a/target-mips/Makefile.objs
+++ b/target-mips/Makefile.objs
@@ -1,2 +1,2 @@
-obj-y += translate.o op_helper.o helper.o cpu.o
+obj-y += translate.o op_helper.o lmi_helper.o helper.o cpu.o
 obj-$(CONFIG_SOFTMMU) += machine.o
diff --git a/target-mips/helper.h b/target-mips/helper.h
index 109ac37..f35ed78 100644
--- a/target-mips/helper.h
+++ b/target-mips/helper.h
@@ -303,4 +303,63 @@ DEF_HELPER_1(rdhwr_ccres, tl, env)
 DEF_HELPER_2(pmon, void, env, int)
 DEF_HELPER_1(wait, void, env)
 
+/* Loongson multimedia functions.  */
+DEF_HELPER_FLAGS_2(paddsh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(paddush, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(paddh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(paddw, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(paddsb, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(paddusb, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(paddb, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+
+DEF_HELPER_FLAGS_2(psubsh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(psubush, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(psubh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(psubw, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(psubsb, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(psubusb, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(psubb, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+
+DEF_HELPER_FLAGS_2(pshufh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(packsswh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(packsshb, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(packushb, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+
+DEF_HELPER_FLAGS_2(punpcklhw, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(punpckhhw, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(punpcklbh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(punpckhbh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(punpcklwd, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(punpckhwd, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+
+DEF_HELPER_FLAGS_2(pavgh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(pavgb, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(pmaxsh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(pminsh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(pmaxub, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(pminub, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+
+DEF_HELPER_FLAGS_2(pcmpeqw, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(pcmpgtw, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(pcmpeqh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(pcmpgth, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(pcmpeqb, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(pcmpgtb, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+
+DEF_HELPER_FLAGS_2(psllw, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(psllh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(psrlw, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(psrlh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(psraw, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(psrah, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+
+DEF_HELPER_FLAGS_2(pmullh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(pmulhh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(pmulhuh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(pmaddhw, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+
+DEF_HELPER_FLAGS_2(pasubub, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
+DEF_HELPER_FLAGS_1(biadd, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64)
+DEF_HELPER_FLAGS_1(pmovmskb, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64)
+
 #include "def-helper.h"
diff --git a/target-mips/lmi_helper.c b/target-mips/lmi_helper.c
new file mode 100644
index 0000000..1b24353
--- /dev/null
+++ b/target-mips/lmi_helper.c
@@ -0,0 +1,744 @@
+/*
+ *  Loongson Multimedia Instruction emulation helpers for QEMU.
+ *
+ *  Copyright (c) 2011  Richard Henderson <rth@twiddle.net>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "cpu.h"
+#include "helper.h"
+
+/* If the byte ordering doesn't matter, i.e. all columns are treated
+   identically, then this union can be used directly.  If byte ordering
+   does matter, we generally avoid going through memory at all.  */
+typedef union {
+    uint8_t  ub[8];
+    int8_t   sb[8];
+    uint16_t uh[4];
+    int16_t  sh[4];
+    uint32_t uw[2];
+    int32_t  sw[2];
+    uint64_t d;
+} LMIValue;
+
+/* Some byte ordering issues can be mitigated by XORing in the following.  */
+#ifdef HOST_WORDS_BIGENDIAN
+# define BYTE_ORDER_XOR(N) N
+#else
+# define BYTE_ORDER_XOR(N) 0
+#endif
+
+#define SATSB(x)  (x < -0x80 ? -0x80 : x > 0x7f ? 0x7f : x)
+#define SATUB(x)  (x > 0xff ? 0xff : x)
+
+#define SATSH(x)  (x < -0x8000 ? -0x8000 : x > 0x7fff ? 0x7fff : x)
+#define SATUH(x)  (x > 0xffff ? 0xffff : x)
+
+#define SATSW(x) \
+    (x < -0x80000000ll ? -0x80000000ll : x > 0x7fffffff ? 0x7fffffff : x)
+#define SATUW(x)  (x > 0xffffffffull ? 0xffffffffull : x)
+
+uint64_t helper_paddsb(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned int i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 8; ++i) {
+        int r = vs.sb[i] + vt.sb[i];
+        vs.sb[i] = SATSB(r);
+    }
+    return vs.d;
+}
+
+uint64_t helper_paddusb(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned int i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 8; ++i) {
+        int r = vs.ub[i] + vt.ub[i];
+        vs.ub[i] = SATUB(r);
+    }
+    return vs.d;
+}
+
+uint64_t helper_paddsh(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned int i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 4; ++i) {
+        int r = vs.sh[i] + vt.sh[i];
+        vs.sh[i] = SATSH(r);
+    }
+    return vs.d;
+}
+
+uint64_t helper_paddush(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned int i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 4; ++i) {
+        int r = vs.uh[i] + vt.uh[i];
+        vs.uh[i] = SATUH(r);
+    }
+    return vs.d;
+}
+
+uint64_t helper_paddb(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned int i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 8; ++i) {
+        vs.ub[i] += vt.ub[i];
+    }
+    return vs.d;
+}
+
+uint64_t helper_paddh(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned int i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 4; ++i) {
+        vs.uh[i] += vt.uh[i];
+    }
+    return vs.d;
+}
+
+uint64_t helper_paddw(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned int i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 2; ++i) {
+        vs.uw[i] += vt.uw[i];
+    }
+    return vs.d;
+}
+
+uint64_t helper_psubsb(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned int i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 8; ++i) {
+        int r = vs.sb[i] - vt.sb[i];
+        vs.sb[i] = SATSB(r);
+    }
+    return vs.d;
+}
+
+uint64_t helper_psubusb(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned int i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 8; ++i) {
+        int r = vs.ub[i] - vt.ub[i];
+        vs.ub[i] = (r < 0 ? 0 : r);
+    }
+    return vs.d;
+}
+
+uint64_t helper_psubsh(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned int i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 4; ++i) {
+        int r = vs.sh[i] - vt.sh[i];
+        vs.sh[i] = SATSH(r);
+    }
+    return vs.d;
+}
+
+uint64_t helper_psubush(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned int i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 4; ++i) {
+        int r = vs.uh[i] - vt.uh[i];
+        vs.uh[i] = (r < 0 ? 0 : r);
+    }
+    return vs.d;
+}
+
+uint64_t helper_psubb(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned int i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 8; ++i) {
+        vs.ub[i] -= vt.ub[i];
+    }
+    return vs.d;
+}
+
+uint64_t helper_psubh(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned int i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 4; ++i) {
+        vs.uh[i] -= vt.uh[i];
+    }
+    return vs.d;
+}
+
+uint64_t helper_psubw(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned int i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 2; ++i) {
+        vs.uw[i] -= vt.uw[i];
+    }
+    return vs.d;
+}
+
+uint64_t helper_pshufh(uint64_t fs, uint64_t ft)
+{
+    unsigned host = BYTE_ORDER_XOR(3);
+    LMIValue vd, vs;
+    unsigned i;
+
+    vs.d = fs;
+    vd.d = 0;
+    for (i = 0; i < 4; i++, ft >>= 2) {
+        vd.uh[i ^ host] = vs.uh[(ft & 3) ^ host];
+    }
+    return vd.d;
+}
+
+uint64_t helper_packsswh(uint64_t fs, uint64_t ft)
+{
+    uint64_t fd = 0;
+    int64_t tmp;
+
+    tmp = (int32_t)(fs >> 0);
+    tmp = SATSH(tmp);
+    fd |= (tmp & 0xffff) << 0;
+
+    tmp = (int32_t)(fs >> 32);
+    tmp = SATSH(tmp);
+    fd |= (tmp & 0xffff) << 16;
+
+    tmp = (int32_t)(ft >> 0);
+    tmp = SATSH(tmp);
+    fd |= (tmp & 0xffff) << 32;
+
+    tmp = (int32_t)(ft >> 32);
+    tmp = SATSH(tmp);
+    fd |= (tmp & 0xffff) << 48;
+
+    return fd;
+}
+
+uint64_t helper_packsshb(uint64_t fs, uint64_t ft)
+{
+    uint64_t fd = 0;
+    unsigned int i;
+
+    for (i = 0; i < 4; ++i) {
+        int16_t tmp = fs >> (i * 16);
+        tmp = SATSB(tmp);
+        fd |= (uint64_t)(tmp & 0xff) << (i * 8);
+    }
+    for (i = 0; i < 4; ++i) {
+        int16_t tmp = ft >> (i * 16);
+        tmp = SATSB(tmp);
+        fd |= (uint64_t)(tmp & 0xff) << (i * 8 + 32);
+    }
+
+    return fd;
+}
+
+uint64_t helper_packushb(uint64_t fs, uint64_t ft)
+{
+    uint64_t fd = 0;
+    unsigned int i;
+
+    for (i = 0; i < 4; ++i) {
+        int16_t tmp = fs >> (i * 16);
+        tmp = (tmp < 0 ? 0 : SATUB(tmp));
+        fd |= (uint64_t)(tmp & 0xff) << (i * 8);
+    }
+    for (i = 0; i < 4; ++i) {
+        int16_t tmp = ft >> (i * 16);
+        tmp = (tmp < 0 ? 0 : SATUB(tmp));
+        fd |= (uint64_t)(tmp & 0xff) << (i * 8 + 32);
+    }
+
+    return fd;
+}
+
+uint64_t helper_punpcklwd(uint64_t fs, uint64_t ft)
+{
+    return (fs & 0xffffffff) | (ft << 32);
+}
+
+uint64_t helper_punpckhwd(uint64_t fs, uint64_t ft)
+{
+    return (fs >> 32) | (ft & ~0xffffffffull);
+}
+
+uint64_t helper_punpcklhw(uint64_t fs, uint64_t ft)
+{
+    unsigned host = BYTE_ORDER_XOR(3);
+    LMIValue vd, vs, vt;
+
+    vs.d = fs;
+    vt.d = ft;
+    vd.uh[0 ^ host] = vs.uh[0 ^ host];
+    vd.uh[1 ^ host] = vt.uh[0 ^ host];
+    vd.uh[2 ^ host] = vs.uh[1 ^ host];
+    vd.uh[3 ^ host] = vt.uh[1 ^ host];
+
+    return vd.d;
+}
+
+uint64_t helper_punpckhhw(uint64_t fs, uint64_t ft)
+{
+    unsigned host = BYTE_ORDER_XOR(3);
+    LMIValue vd, vs, vt;
+
+    vs.d = fs;
+    vt.d = ft;
+    vd.uh[0 ^ host] = vs.uh[2 ^ host];
+    vd.uh[1 ^ host] = vt.uh[2 ^ host];
+    vd.uh[2 ^ host] = vs.uh[3 ^ host];
+    vd.uh[3 ^ host] = vt.uh[3 ^ host];
+
+    return vd.d;
+}
+
+uint64_t helper_punpcklbh(uint64_t fs, uint64_t ft)
+{
+    unsigned host = BYTE_ORDER_XOR(7);
+    LMIValue vd, vs, vt;
+
+    vs.d = fs;
+    vt.d = ft;
+    vd.ub[0 ^ host] = vs.ub[0 ^ host];
+    vd.ub[1 ^ host] = vt.ub[0 ^ host];
+    vd.ub[2 ^ host] = vs.ub[1 ^ host];
+    vd.ub[3 ^ host] = vt.ub[1 ^ host];
+    vd.ub[4 ^ host] = vs.ub[2 ^ host];
+    vd.ub[5 ^ host] = vt.ub[2 ^ host];
+    vd.ub[6 ^ host] = vs.ub[3 ^ host];
+    vd.ub[7 ^ host] = vt.ub[3 ^ host];
+
+    return vd.d;
+}
+
+uint64_t helper_punpckhbh(uint64_t fs, uint64_t ft)
+{
+    unsigned host = BYTE_ORDER_XOR(7);
+    LMIValue vd, vs, vt;
+
+    vs.d = fs;
+    vt.d = ft;
+    vd.ub[0 ^ host] = vs.ub[4 ^ host];
+    vd.ub[1 ^ host] = vt.ub[4 ^ host];
+    vd.ub[2 ^ host] = vs.ub[5 ^ host];
+    vd.ub[3 ^ host] = vt.ub[5 ^ host];
+    vd.ub[4 ^ host] = vs.ub[6 ^ host];
+    vd.ub[5 ^ host] = vt.ub[6 ^ host];
+    vd.ub[6 ^ host] = vs.ub[7 ^ host];
+    vd.ub[7 ^ host] = vt.ub[7 ^ host];
+
+    return vd.d;
+}
+
+uint64_t helper_pavgh(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 4; i++) {
+        vs.uh[i] = (vs.uh[i] + vt.uh[i] + 1) >> 1;
+    }
+    return vs.d;
+}
+
+uint64_t helper_pavgb(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 8; i++) {
+        vs.ub[i] = (vs.ub[i] + vt.ub[i] + 1) >> 1;
+    }
+    return vs.d;
+}
+
+uint64_t helper_pmaxsh(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 4; i++) {
+        vs.sh[i] = (vs.sh[i] >= vt.sh[i] ? vs.sh[i] : vt.sh[i]);
+    }
+    return vs.d;
+}
+
+uint64_t helper_pminsh(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 4; i++) {
+        vs.sh[i] = (vs.sh[i] <= vt.sh[i] ? vs.sh[i] : vt.sh[i]);
+    }
+    return vs.d;
+}
+
+uint64_t helper_pmaxub(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 8; i++) {
+        vs.ub[i] = (vs.ub[i] >= vt.ub[i] ? vs.ub[i] : vt.ub[i]);
+    }
+    return vs.d;
+}
+
+uint64_t helper_pminub(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 8; i++) {
+        vs.ub[i] = (vs.ub[i] <= vt.ub[i] ? vs.ub[i] : vt.ub[i]);
+    }
+    return vs.d;
+}
+
+uint64_t helper_pcmpeqw(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 2; i++) {
+        vs.uw[i] = -(vs.uw[i] == vt.uw[i]);
+    }
+    return vs.d;
+}
+
+uint64_t helper_pcmpgtw(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 2; i++) {
+        vs.uw[i] = -(vs.uw[i] > vt.uw[i]);
+    }
+    return vs.d;
+}
+
+uint64_t helper_pcmpeqh(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 4; i++) {
+        vs.uh[i] = -(vs.uh[i] == vt.uh[i]);
+    }
+    return vs.d;
+}
+
+uint64_t helper_pcmpgth(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 4; i++) {
+        vs.uh[i] = -(vs.uh[i] > vt.uh[i]);
+    }
+    return vs.d;
+}
+
+uint64_t helper_pcmpeqb(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 8; i++) {
+        vs.ub[i] = -(vs.ub[i] == vt.ub[i]);
+    }
+    return vs.d;
+}
+
+uint64_t helper_pcmpgtb(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 8; i++) {
+        vs.ub[i] = -(vs.ub[i] > vt.ub[i]);
+    }
+    return vs.d;
+}
+
+uint64_t helper_psllw(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs;
+    unsigned i;
+
+    ft &= 0x7f;
+    if (ft > 31) {
+        return 0;
+    }
+    vs.d = fs;
+    for (i = 0; i < 2; ++i) {
+        vs.uw[i] <<= ft;
+    }
+    return vs.d;
+}
+
+uint64_t helper_psrlw(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs;
+    unsigned i;
+
+    ft &= 0x7f;
+    if (ft > 31) {
+        return 0;
+    }
+    vs.d = fs;
+    for (i = 0; i < 2; ++i) {
+        vs.uw[i] >>= ft;
+    }
+    return vs.d;
+}
+
+uint64_t helper_psraw(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs;
+    unsigned i;
+
+    ft &= 0x7f;
+    if (ft > 31) {
+        ft = 31;
+    }
+    vs.d = fs;
+    for (i = 0; i < 2; ++i) {
+        vs.sw[i] >>= ft;
+    }
+    return vs.d;
+}
+
+uint64_t helper_psllh(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs;
+    unsigned i;
+
+    ft &= 0x7f;
+    if (ft > 15) {
+        return 0;
+    }
+    vs.d = fs;
+    for (i = 0; i < 4; ++i) {
+        vs.uh[i] <<= ft;
+    }
+    return vs.d;
+}
+
+uint64_t helper_psrlh(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs;
+    unsigned i;
+
+    ft &= 0x7f;
+    if (ft > 15) {
+        return 0;
+    }
+    vs.d = fs;
+    for (i = 0; i < 4; ++i) {
+        vs.uh[i] >>= ft;
+    }
+    return vs.d;
+}
+
+uint64_t helper_psrah(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs;
+    unsigned i;
+
+    ft &= 0x7f;
+    if (ft > 15) {
+        ft = 15;
+    }
+    vs.d = fs;
+    for (i = 0; i < 4; ++i) {
+        vs.sh[i] >>= ft;
+    }
+    return vs.d;
+}
+
+uint64_t helper_pmullh(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 4; ++i) {
+        vs.sh[i] *= vt.sh[i];
+    }
+    return vs.d;
+}
+
+uint64_t helper_pmulhh(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 4; ++i) {
+        int32_t r = vs.sh[i] * vt.sh[i];
+        vs.sh[i] = r >> 16;
+    }
+    return vs.d;
+}
+
+uint64_t helper_pmulhuh(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 4; ++i) {
+        uint32_t r = (uint32_t)vs.uh[i] * vt.uh[i];
+        vs.uh[i] = r >> 16;
+    }
+    return vs.d;
+}
+
+uint64_t helper_pmaddhw(uint64_t fs, uint64_t ft)
+{
+    unsigned host = BYTE_ORDER_XOR(3);
+    LMIValue vs, vt;
+    uint32_t p0, p1;
+
+    vs.d = fs;
+    vt.d = ft;
+    p0  = vs.sh[0 ^ host] * vt.sh[0 ^ host];
+    p0 += vs.sh[1 ^ host] * vt.sh[1 ^ host];
+    p1  = vs.sh[2 ^ host] * vt.sh[2 ^ host];
+    p1 += vs.sh[3 ^ host] * vt.sh[3 ^ host];
+
+    return ((uint64_t)p1 << 32) | p0;
+}
+
+uint64_t helper_pasubub(uint64_t fs, uint64_t ft)
+{
+    LMIValue vs, vt;
+    unsigned i;
+
+    vs.d = fs;
+    vt.d = ft;
+    for (i = 0; i < 8; ++i) {
+        int r = vs.ub[i] - vt.ub[i];
+        vs.ub[i] = (r < 0 ? -r : r);
+    }
+    return vs.d;
+}
+
+uint64_t helper_biadd(uint64_t fs)
+{
+    unsigned i, fd;
+
+    for (i = fd = 0; i < 8; ++i) {
+        fd += (fs >> (i * 8)) & 0xff;
+    }
+    return fd & 0xffff;
+}
+
+uint64_t helper_pmovmskb(uint64_t fs)
+{
+    unsigned fd = 0;
+
+    fd |= ((fs >>  7) & 1) << 0;
+    fd |= ((fs >> 15) & 1) << 1;
+    fd |= ((fs >> 23) & 1) << 2;
+    fd |= ((fs >> 31) & 1) << 3;
+    fd |= ((fs >> 39) & 1) << 4;
+    fd |= ((fs >> 47) & 1) << 5;
+    fd |= ((fs >> 55) & 1) << 6;
+    fd |= ((fs >> 63) & 1) << 7;
+
+    return fd & 0xff;
+}
diff --git a/target-mips/translate.c b/target-mips/translate.c
index 57454f0..ac941e6 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -446,6 +446,103 @@ enum {
     OPC_BC2     = (0x08 << 21) | OPC_CP2,
 };
 
+#define MASK_LMI(op)  (MASK_OP_MAJOR(op) | (op & (0x1F << 21)) | (op & 0x1F))
+
+enum {
+    OPC_PADDSH  = (24 << 21) | (0x00) | OPC_CP2,
+    OPC_PADDUSH = (25 << 21) | (0x00) | OPC_CP2,
+    OPC_PADDH   = (26 << 21) | (0x00) | OPC_CP2,
+    OPC_PADDW   = (27 << 21) | (0x00) | OPC_CP2,
+    OPC_PADDSB  = (28 << 21) | (0x00) | OPC_CP2,
+    OPC_PADDUSB = (29 << 21) | (0x00) | OPC_CP2,
+    OPC_PADDB   = (30 << 21) | (0x00) | OPC_CP2,
+    OPC_PADDD   = (31 << 21) | (0x00) | OPC_CP2,
+
+    OPC_PSUBSH  = (24 << 21) | (0x01) | OPC_CP2,
+    OPC_PSUBUSH = (25 << 21) | (0x01) | OPC_CP2,
+    OPC_PSUBH   = (26 << 21) | (0x01) | OPC_CP2,
+    OPC_PSUBW   = (27 << 21) | (0x01) | OPC_CP2,
+    OPC_PSUBSB  = (28 << 21) | (0x01) | OPC_CP2,
+    OPC_PSUBUSB = (29 << 21) | (0x01) | OPC_CP2,
+    OPC_PSUBB   = (30 << 21) | (0x01) | OPC_CP2,
+    OPC_PSUBD   = (31 << 21) | (0x01) | OPC_CP2,
+
+    OPC_PSHUFH   = (24 << 21) | (0x02) | OPC_CP2,
+    OPC_PACKSSWH = (25 << 21) | (0x02) | OPC_CP2,
+    OPC_PACKSSHB = (26 << 21) | (0x02) | OPC_CP2,
+    OPC_PACKUSHB = (27 << 21) | (0x02) | OPC_CP2,
+    OPC_XOR_CP2  = (28 << 21) | (0x02) | OPC_CP2,
+    OPC_NOR_CP2  = (29 << 21) | (0x02) | OPC_CP2,
+    OPC_AND_CP2  = (30 << 21) | (0x02) | OPC_CP2,
+    OPC_PANDN    = (31 << 21) | (0x02) | OPC_CP2,
+
+    OPC_PUNPCKLHW = (24 << 21) | (0x03) | OPC_CP2,
+    OPC_PUNPCKHHW = (25 << 21) | (0x03) | OPC_CP2,
+    OPC_PUNPCKLBH = (26 << 21) | (0x03) | OPC_CP2,
+    OPC_PUNPCKHBH = (27 << 21) | (0x03) | OPC_CP2,
+    OPC_PINSRH_0  = (28 << 21) | (0x03) | OPC_CP2,
+    OPC_PINSRH_1  = (29 << 21) | (0x03) | OPC_CP2,
+    OPC_PINSRH_2  = (30 << 21) | (0x03) | OPC_CP2,
+    OPC_PINSRH_3  = (31 << 21) | (0x03) | OPC_CP2,
+
+    OPC_PAVGH   = (24 << 21) | (0x08) | OPC_CP2,
+    OPC_PAVGB   = (25 << 21) | (0x08) | OPC_CP2,
+    OPC_PMAXSH  = (26 << 21) | (0x08) | OPC_CP2,
+    OPC_PMINSH  = (27 << 21) | (0x08) | OPC_CP2,
+    OPC_PMAXUB  = (28 << 21) | (0x08) | OPC_CP2,
+    OPC_PMINUB  = (29 << 21) | (0x08) | OPC_CP2,
+
+    OPC_PCMPEQW = (24 << 21) | (0x09) | OPC_CP2,
+    OPC_PCMPGTW = (25 << 21) | (0x09) | OPC_CP2,
+    OPC_PCMPEQH = (26 << 21) | (0x09) | OPC_CP2,
+    OPC_PCMPGTH = (27 << 21) | (0x09) | OPC_CP2,
+    OPC_PCMPEQB = (28 << 21) | (0x09) | OPC_CP2,
+    OPC_PCMPGTB = (29 << 21) | (0x09) | OPC_CP2,
+
+    OPC_PSLLW   = (24 << 21) | (0x0A) | OPC_CP2,
+    OPC_PSLLH   = (25 << 21) | (0x0A) | OPC_CP2,
+    OPC_PMULLH  = (26 << 21) | (0x0A) | OPC_CP2,
+    OPC_PMULHH  = (27 << 21) | (0x0A) | OPC_CP2,
+    OPC_PMULUW  = (28 << 21) | (0x0A) | OPC_CP2,
+    OPC_PMULHUH = (29 << 21) | (0x0A) | OPC_CP2,
+
+    OPC_PSRLW     = (24 << 21) | (0x0B) | OPC_CP2,
+    OPC_PSRLH     = (25 << 21) | (0x0B) | OPC_CP2,
+    OPC_PSRAW     = (26 << 21) | (0x0B) | OPC_CP2,
+    OPC_PSRAH     = (27 << 21) | (0x0B) | OPC_CP2,
+    OPC_PUNPCKLWD = (28 << 21) | (0x0B) | OPC_CP2,
+    OPC_PUNPCKHWD = (29 << 21) | (0x0B) | OPC_CP2,
+
+    OPC_ADDU_CP2 = (24 << 21) | (0x0C) | OPC_CP2,
+    OPC_OR_CP2   = (25 << 21) | (0x0C) | OPC_CP2,
+    OPC_ADD_CP2  = (26 << 21) | (0x0C) | OPC_CP2,
+    OPC_DADD_CP2 = (27 << 21) | (0x0C) | OPC_CP2,
+    OPC_SEQU_CP2 = (28 << 21) | (0x0C) | OPC_CP2,
+    OPC_SEQ_CP2  = (29 << 21) | (0x0C) | OPC_CP2,
+
+    OPC_SUBU_CP2 = (24 << 21) | (0x0D) | OPC_CP2,
+    OPC_PASUBUB  = (25 << 21) | (0x0D) | OPC_CP2,
+    OPC_SUB_CP2  = (26 << 21) | (0x0D) | OPC_CP2,
+    OPC_DSUB_CP2 = (27 << 21) | (0x0D) | OPC_CP2,
+    OPC_SLTU_CP2 = (28 << 21) | (0x0D) | OPC_CP2,
+    OPC_SLT_CP2  = (29 << 21) | (0x0D) | OPC_CP2,
+
+    OPC_SLL_CP2  = (24 << 21) | (0x0E) | OPC_CP2,
+    OPC_DSLL_CP2 = (25 << 21) | (0x0E) | OPC_CP2,
+    OPC_PEXTRH   = (26 << 21) | (0x0E) | OPC_CP2,
+    OPC_PMADDHW  = (27 << 21) | (0x0E) | OPC_CP2,
+    OPC_SLEU_CP2 = (28 << 21) | (0x0E) | OPC_CP2,
+    OPC_SLE_CP2  = (29 << 21) | (0x0E) | OPC_CP2,
+
+    OPC_SRL_CP2  = (24 << 21) | (0x0F) | OPC_CP2,
+    OPC_DSRL_CP2 = (25 << 21) | (0x0F) | OPC_CP2,
+    OPC_SRA_CP2  = (26 << 21) | (0x0F) | OPC_CP2,
+    OPC_DSRA_CP2 = (27 << 21) | (0x0F) | OPC_CP2,
+    OPC_BIADD    = (28 << 21) | (0x0F) | OPC_CP2,
+    OPC_PMOVMSKB = (29 << 21) | (0x0F) | OPC_CP2,
+};
+
+
 #define MASK_CP3(op)       MASK_OP_MAJOR(op) | (op & 0x3F)
 
 enum {
@@ -2424,8 +2521,8 @@ static void gen_cl (DisasContext *ctx, uint32_t opc,
 }
 
 /* Godson integer instructions */
-static void gen_loongson_integer (DisasContext *ctx, uint32_t opc,
-                                int rd, int rs, int rt)
+static void gen_loongson_integer(DisasContext *ctx, uint32_t opc,
+                                 int rd, int rs, int rt)
 {
     const char *opn = "loongson";
     TCGv t0, t1;
@@ -2637,6 +2734,278 @@ static void gen_loongson_integer (DisasContext *ctx, uint32_t opc,
     tcg_temp_free(t1);
 }
 
+/* Loongson multimedia instructions */
+static void gen_loongson_multimedia(DisasContext *ctx, int rd, int rs, int rt)
+{
+    const char *opn = "loongson_cp2";
+    uint32_t opc, shift_max;
+    TCGv_i64 t0, t1;
+
+    opc = MASK_LMI(ctx->opcode);
+    switch (opc) {
+    case OPC_ADD_CP2:
+    case OPC_SUB_CP2:
+    case OPC_DADD_CP2:
+    case OPC_DSUB_CP2:
+        t0 = tcg_temp_local_new_i64();
+        t1 = tcg_temp_local_new_i64();
+        break;
+    default:
+        t0 = tcg_temp_new_i64();
+        t1 = tcg_temp_new_i64();
+        break;
+    }
+
+    gen_load_fpr64(ctx, t0, rs);
+    gen_load_fpr64(ctx, t1, rt);
+
+#define LMI_HELPER(UP, LO) \
+    case OPC_##UP: gen_helper_##LO(t0, t0, t1); opn = #LO; break
+#define LMI_HELPER_1(UP, LO) \
+    case OPC_##UP: gen_helper_##LO(t0, t0); opn = #LO; break
+#define LMI_DIRECT(UP, LO, OP) \
+    case OPC_##UP: tcg_gen_##OP##_i64(t0, t0, t1); opn = #LO; break
+
+    switch (opc) {
+    LMI_HELPER(PADDSH, paddsh);
+    LMI_HELPER(PADDUSH, paddush);
+    LMI_HELPER(PADDH, paddh);
+    LMI_HELPER(PADDW, paddw);
+    LMI_HELPER(PADDSB, paddsb);
+    LMI_HELPER(PADDUSB, paddusb);
+    LMI_HELPER(PADDB, paddb);
+
+    LMI_HELPER(PSUBSH, psubsh);
+    LMI_HELPER(PSUBUSH, psubush);
+    LMI_HELPER(PSUBH, psubh);
+    LMI_HELPER(PSUBW, psubw);
+    LMI_HELPER(PSUBSB, psubsb);
+    LMI_HELPER(PSUBUSB, psubusb);
+    LMI_HELPER(PSUBB, psubb);
+
+    LMI_HELPER(PSHUFH, pshufh);
+    LMI_HELPER(PACKSSWH, packsswh);
+    LMI_HELPER(PACKSSHB, packsshb);
+    LMI_HELPER(PACKUSHB, packushb);
+
+    LMI_HELPER(PUNPCKLHW, punpcklhw);
+    LMI_HELPER(PUNPCKHHW, punpckhhw);
+    LMI_HELPER(PUNPCKLBH, punpcklbh);
+    LMI_HELPER(PUNPCKHBH, punpckhbh);
+    LMI_HELPER(PUNPCKLWD, punpcklwd);
+    LMI_HELPER(PUNPCKHWD, punpckhwd);
+
+    LMI_HELPER(PAVGH, pavgh);
+    LMI_HELPER(PAVGB, pavgb);
+    LMI_HELPER(PMAXSH, pmaxsh);
+    LMI_HELPER(PMINSH, pminsh);
+    LMI_HELPER(PMAXUB, pmaxub);
+    LMI_HELPER(PMINUB, pminub);
+
+    LMI_HELPER(PCMPEQW, pcmpeqw);
+    LMI_HELPER(PCMPGTW, pcmpgtw);
+    LMI_HELPER(PCMPEQH, pcmpeqh);
+    LMI_HELPER(PCMPGTH, pcmpgth);
+    LMI_HELPER(PCMPEQB, pcmpeqb);
+    LMI_HELPER(PCMPGTB, pcmpgtb);
+
+    LMI_HELPER(PSLLW, psllw);
+    LMI_HELPER(PSLLH, psllh);
+    LMI_HELPER(PSRLW, psrlw);
+    LMI_HELPER(PSRLH, psrlh);
+    LMI_HELPER(PSRAW, psraw);
+    LMI_HELPER(PSRAH, psrah);
+
+    LMI_HELPER(PMULLH, pmullh);
+    LMI_HELPER(PMULHH, pmulhh);
+    LMI_HELPER(PMULHUH, pmulhuh);
+    LMI_HELPER(PMADDHW, pmaddhw);
+
+    LMI_HELPER(PASUBUB, pasubub);
+    LMI_HELPER_1(BIADD, biadd);
+    LMI_HELPER_1(PMOVMSKB, pmovmskb);
+
+    LMI_DIRECT(PADDD, paddd, add);
+    LMI_DIRECT(PSUBD, psubd, sub);
+    LMI_DIRECT(XOR_CP2, xor, xor);
+    LMI_DIRECT(NOR_CP2, nor, nor);
+    LMI_DIRECT(AND_CP2, and, and);
+    LMI_DIRECT(PANDN, pandn, andc);
+    LMI_DIRECT(OR_CP2, or, or);
+
+    case OPC_PINSRH_0:
+        tcg_gen_deposit_i64(t0, t0, t1, 0, 16);
+        opn = "pinsrh_0";
+        break;
+    case OPC_PINSRH_1:
+        tcg_gen_deposit_i64(t0, t0, t1, 16, 16);
+        opn = "pinsrh_1";
+        break;
+    case OPC_PINSRH_2:
+        tcg_gen_deposit_i64(t0, t0, t1, 32, 16);
+        opn = "pinsrh_2";
+        break;
+    case OPC_PINSRH_3:
+        tcg_gen_deposit_i64(t0, t0, t1, 48, 16);
+        opn = "pinsrh_3";
+        break;
+
+    case OPC_PEXTRH:
+        tcg_gen_andi_i64(t1, t1, 3);
+        tcg_gen_shli_i64(t1, t1, 4);
+        tcg_gen_shr_i64(t0, t0, t1);
+        tcg_gen_ext16u_i64(t0, t0);
+        opn = "pextrh";
+        break;
+
+    case OPC_ADDU_CP2:
+        tcg_gen_add_i64(t0, t0, t1);
+        tcg_gen_ext32s_i64(t0, t0);
+        opn = "addu";
+        break;
+    case OPC_SUBU_CP2:
+        tcg_gen_sub_i64(t0, t0, t1);
+        tcg_gen_ext32s_i64(t0, t0);
+        opn = "subu";
+        break;
+
+    case OPC_SLL_CP2:
+        opn = "sll";
+        shift_max = 32;
+        goto do_shift;
+    case OPC_SRL_CP2:
+        opn = "srl";
+        shift_max = 32;
+        goto do_shift;
+    case OPC_SRA_CP2:
+        opn = "sra";
+        shift_max = 32;
+        goto do_shift;
+    case OPC_DSLL_CP2:
+        opn = "dsll";
+        shift_max = 64;
+        goto do_shift;
+    case OPC_DSRL_CP2:
+        opn = "dsrl";
+        shift_max = 64;
+        goto do_shift;
+    case OPC_DSRA_CP2:
+        opn = "dsra";
+        shift_max = 64;
+        goto do_shift;
+    do_shift:
+        /* Make sure shift count isn't TCG undefined behaviour.  */
+        tcg_gen_andi_i64(t1, t1, shift_max - 1);
+
+        switch (opc) {
+        case OPC_SLL_CP2:
+        case OPC_DSLL_CP2:
+            tcg_gen_shl_i64(t0, t0, t1);
+            break;
+        case OPC_SRA_CP2:
+        case OPC_DSRA_CP2:
+            /* Since SRA is UndefinedResult without sign-extended inputs,
+               we can treat SRA and DSRA the same.  */
+            tcg_gen_sar_i64(t0, t0, t1);
+            break;
+        case OPC_SRL_CP2:
+            /* We want to shift in zeros for SRL; zero-extend first.  */
+            tcg_gen_ext32u_i64(t0, t0);
+            /* FALLTHRU */
+        case OPC_DSRL_CP2:
+            tcg_gen_shr_i64(t0, t0, t1);
+            break;
+        }
+
+        if (shift_max == 32) {
+            tcg_gen_ext32s_i64(t0, t0);
+        }
+
+        /* Shifts larger than MAX produce zero.  */
+        tcg_gen_setcondi_i64(TCG_COND_LTU, t1, t1, shift_max);
+        tcg_gen_neg_i64(t1, t1);
+        tcg_gen_and_i64(t0, t0, t1);
+        break;
+
+    case OPC_ADD_CP2:
+    case OPC_DADD_CP2:
+        {
+            TCGv_i64 t2 = tcg_temp_new_i64();
+            int lab = gen_new_label();
+
+            tcg_gen_mov_i64(t2, t0);
+            tcg_gen_add_i64(t0, t1, t2);
+            if (opc == OPC_ADD_CP2) {
+                tcg_gen_ext32s_i64(t0, t0);
+            }
+            tcg_gen_xor_i64(t1, t1, t2);
+            tcg_gen_xor_i64(t2, t2, t0);
+            tcg_gen_andc_i64(t1, t2, t1);
+            tcg_temp_free_i64(t2);
+            tcg_gen_brcondi_i64(TCG_COND_GE, t1, 0, lab);
+            generate_exception(ctx, EXCP_OVERFLOW);
+            gen_set_label(lab);
+
+            opn = (opc == OPC_ADD_CP2 ? "add" : "dadd");
+            break;
+        }
+
+    case OPC_SUB_CP2:
+    case OPC_DSUB_CP2:
+        {
+            TCGv_i64 t2 = tcg_temp_new_i64();
+            int lab = gen_new_label();
+
+            tcg_gen_mov_i64(t2, t0);
+            tcg_gen_sub_i64(t0, t2, t1);
+            if (opc == OPC_SUB_CP2) {
+                tcg_gen_ext32s_i64(t0, t0);
+            }
+            tcg_gen_xor_i64(t1, t1, t2);
+            tcg_gen_xor_i64(t2, t2, t0);
+            tcg_gen_and_i64(t1, t1, t2);
+            tcg_temp_free_i64(t2);
+            tcg_gen_brcondi_i64(TCG_COND_GE, t1, 0, lab);
+            generate_exception(ctx, EXCP_OVERFLOW);
+            gen_set_label(lab);
+
+            opn = (opc == OPC_SUB_CP2 ? "sub" : "dsub");
+            break;
+        }
+
+    case OPC_PMULUW:
+        tcg_gen_ext32u_i64(t0, t0);
+        tcg_gen_ext32u_i64(t1, t1);
+        tcg_gen_mul_i64(t0, t0, t1);
+        opn = "pmuluw";
+        break;
+
+    case OPC_SEQU_CP2:
+    case OPC_SEQ_CP2:
+    case OPC_SLTU_CP2:
+    case OPC_SLT_CP2:
+    case OPC_SLEU_CP2:
+    case OPC_SLE_CP2:
+        /* ??? Document is unclear: Set FCC[CC].  Does that mean the
+           FD field is the CC field?  */
+    default:
+        MIPS_INVAL(opn);
+        generate_exception(ctx, EXCP_RI);
+        return;
+    }
+
+#undef LMI_HELPER
+#undef LMI_DIRECT
+
+    gen_store_fpr64(ctx, t0, rd);
+
+    (void)opn; /* avoid a compiler warning */
+    MIPS_DEBUG("%s %s, %s, %s", opn,
+               fregnames[rd], fregnames[rs], fregnames[rt]);
+    tcg_temp_free_i64(t0);
+    tcg_temp_free_i64(t1);
+}
+
 /* Traps */
 static void gen_trap (DisasContext *ctx, uint32_t opc,
                       int rs, int rt, int16_t imm)
@@ -12344,10 +12713,14 @@ static void decode_opc (CPUMIPSState *env, DisasContext *ctx, int *is_branch)
     case OPC_LDC2:
     case OPC_SWC2:
     case OPC_SDC2:
-    case OPC_CP2:
         /* COP2: Not implemented. */
         generate_exception_err(ctx, EXCP_CpU, 2);
         break;
+    case OPC_CP2:
+        check_insn(env, ctx, INSN_LOONGSON2F);
+        /* Note that these instructions use different fields.  */
+        gen_loongson_multimedia(ctx, sa, rd, rt);
+        break;
 
     case OPC_CP3:
         if (env->CP0_Config1 & (1 << CP0C1_FP)) {
-- 
1.7.11.4

^ permalink raw reply related	[flat|nested] 14+ messages in thread
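
The lane-wise saturating helpers in the patch above (helper_paddsb and friends) can be illustrated outside QEMU. What follows is only a minimal sketch, not part of the patch: sat_add_sb and sat_add_sb_selfcheck are hypothetical names. It assumes the same lane layout as the patch (lane i in bits 8*i+7..8*i of the 64-bit value) and uses shifts instead of the LMIValue union so that host byte order does not matter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical standalone helper, not from the patch: signed-saturating
   add of eight packed bytes.  Lane i is bits [8*i+7 : 8*i] of the value,
   so the result does not depend on host byte order.  */
static inline uint64_t sat_add_sb(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    unsigned i;

    for (i = 0; i < 8; ++i) {
        /* Extract each lane and sign-extend it by hand.  */
        int x = (int)((a >> (i * 8)) & 0xff);
        int y = (int)((b >> (i * 8)) & 0xff);
        int s;

        if (x >= 0x80) {
            x -= 0x100;
        }
        if (y >= 0x80) {
            y -= 0x100;
        }

        /* Add in a wider type, then clamp to the int8_t range.  */
        s = x + y;
        if (s < -0x80) {
            s = -0x80;
        } else if (s > 0x7f) {
            s = 0x7f;
        }

        r |= (uint64_t)(s & 0xff) << (i * 8);
    }
    return r;
}

/* Hypothetical self-check covering both clamps and a plain add.  */
static inline void sat_add_sb_selfcheck(void)
{
    assert(sat_add_sb(0x7f, 0x01) == 0x7f);   /* 127 + 1 clamps high */
    assert(sat_add_sb(0x80, 0xff) == 0x80);   /* -128 + -1 clamps low */
    assert(sat_add_sb(0x0102, 0x0304) == 0x0406);
}
```

The point of the wider intermediate is the same as in the patch: the sum is formed in an int, where it cannot wrap, and only then narrowed back to the lane width.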

* Re: [Qemu-devel] [PATCH 7/7] target-mips: Implement Loongson Multimedia Instructions
  2012-09-17 21:35 ` [Qemu-devel] [PATCH 7/7] target-mips: Implement Loongson Multimedia Instructions Richard Henderson
@ 2012-09-18 16:39   ` Aurelien Jarno
  0 siblings, 0 replies; 14+ messages in thread
From: Aurelien Jarno @ 2012-09-18 16:39 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Mon, Sep 17, 2012 at 02:35:13PM -0700, Richard Henderson wrote:
> Implements all of the COP2 instructions except for the S<cond>
> family of comparisons.  The documentation is unclear for those.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  target-mips/Makefile.objs |   2 +-
>  target-mips/helper.h      |  59 ++++
>  target-mips/lmi_helper.c  | 744 ++++++++++++++++++++++++++++++++++++++++++++++
>  target-mips/translate.c   | 379 ++++++++++++++++++++++-
>  4 files changed, 1180 insertions(+), 4 deletions(-)
>  create mode 100644 target-mips/lmi_helper.c
> 
> diff --git a/target-mips/Makefile.objs b/target-mips/Makefile.objs
> index ca20f21..3eeeeac 100644
> --- a/target-mips/Makefile.objs
> +++ b/target-mips/Makefile.objs
> @@ -1,2 +1,2 @@
> -obj-y += translate.o op_helper.o helper.o cpu.o
> +obj-y += translate.o op_helper.o lmi_helper.o helper.o cpu.o
>  obj-$(CONFIG_SOFTMMU) += machine.o
> diff --git a/target-mips/helper.h b/target-mips/helper.h
> index 109ac37..f35ed78 100644
> --- a/target-mips/helper.h
> +++ b/target-mips/helper.h
> @@ -303,4 +303,63 @@ DEF_HELPER_1(rdhwr_ccres, tl, env)
>  DEF_HELPER_2(pmon, void, env, int)
>  DEF_HELPER_1(wait, void, env)
>  
> +/* Loongson multimedia functions.  */
> +DEF_HELPER_FLAGS_2(paddsh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(paddush, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(paddh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(paddw, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(paddsb, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(paddusb, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(paddb, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +
> +DEF_HELPER_FLAGS_2(psubsh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(psubush, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(psubh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(psubw, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(psubsb, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(psubusb, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(psubb, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +
> +DEF_HELPER_FLAGS_2(pshufh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(packsswh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(packsshb, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(packushb, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +
> +DEF_HELPER_FLAGS_2(punpcklhw, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(punpckhhw, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(punpcklbh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(punpckhbh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(punpcklwd, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(punpckhwd, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +
> +DEF_HELPER_FLAGS_2(pavgh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(pavgb, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(pmaxsh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(pminsh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(pmaxub, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(pminub, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +
> +DEF_HELPER_FLAGS_2(pcmpeqw, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(pcmpgtw, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(pcmpeqh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(pcmpgth, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(pcmpeqb, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(pcmpgtb, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +
> +DEF_HELPER_FLAGS_2(psllw, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(psllh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(psrlw, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(psrlh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(psraw, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(psrah, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +
> +DEF_HELPER_FLAGS_2(pmullh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(pmulhh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(pmulhuh, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_2(pmaddhw, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +
> +DEF_HELPER_FLAGS_2(pasubub, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64, i64)
> +DEF_HELPER_FLAGS_1(biadd, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64)
> +DEF_HELPER_FLAGS_1(pmovmskb, TCG_CALL_CONST | TCG_CALL_PURE, i64, i64)
> +
>  #include "def-helper.h"
> diff --git a/target-mips/lmi_helper.c b/target-mips/lmi_helper.c
> new file mode 100644
> index 0000000..1b24353
> --- /dev/null
> +++ b/target-mips/lmi_helper.c
> @@ -0,0 +1,744 @@
> +/*
> + *  Loongson Multimedia Instruction emulation helpers for QEMU.
> + *
> + *  Copyright (c) 2011  Richard Henderson <rth@twiddle.net>
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "cpu.h"
> +#include "helper.h"
> +
> +/* If the byte ordering doesn't matter, i.e. all columns are treated
> +   identically, then this union can be used directly.  If byte ordering
> +   does matter, we generally ignore dumping to memory.  */
> +typedef union {
> +    uint8_t  ub[8];
> +    int8_t   sb[8];
> +    uint16_t uh[4];
> +    int16_t  sh[4];
> +    uint32_t uw[2];
> +    int32_t  sw[2];
> +    uint64_t d;
> +} LMIValue;
> +
> +/* Some byte ordering issues can be mitigated by XORing in the following.  */
> +#ifdef HOST_WORDS_BIGENDIAN
> +# define BYTE_ORDER_XOR(N) N
> +#else
> +# define BYTE_ORDER_XOR(N) 0
> +#endif
> +
> +#define SATSB(x)  (x < -0x80 ? -0x80 : x > 0x7f ? 0x7f : x)
> +#define SATUB(x)  (x < 0 ? 0 : x > 0xff ? 0xff : x)
> +
> +#define SATSH(x)  (x < -0x8000 ? -0x8000 : x > 0x7fff ? 0x7fff : x)
> +#define SATUH(x)  (x < 0 ? 0 : x > 0xffff ? 0xffff : x)
> +
> +#define SATSW(x) \
> +    (x < -0x80000000ll ? -0x80000000ll : x > 0x7fffffff ? 0x7fffffff : x)
> +#define SATUW(x)  (x > 0xffffffffull ? 0xffffffffull : x)
> +
> +uint64_t helper_paddsb(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned int i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 8; ++i) {
> +        int r = vs.sb[i] + vt.sb[i];
> +        vs.sb[i] = SATSB(r);
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_paddusb(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned int i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 8; ++i) {
> +        int r = vs.ub[i] + vt.ub[i];
> +        vs.ub[i] = SATUB(r);
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_paddsh(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned int i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 4; ++i) {
> +        int r = vs.sh[i] + vt.sh[i];
> +        vs.sh[i] = SATSH(r);
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_paddush(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned int i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 4; ++i) {
> +        int r = vs.uh[i] + vt.uh[i];
> +        vs.uh[i] = SATUH(r);
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_paddb(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned int i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 8; ++i) {
> +        vs.ub[i] += vt.ub[i];
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_paddh(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned int i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 4; ++i) {
> +        vs.uh[i] += vt.uh[i];
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_paddw(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned int i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 2; ++i) {
> +        vs.uw[i] += vt.uw[i];
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_psubsb(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned int i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 8; ++i) {
> +        int r = vs.sb[i] - vt.sb[i];
> +        vs.sb[i] = SATSB(r);
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_psubusb(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned int i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 8; ++i) {
> +        int r = vs.ub[i] - vt.ub[i];
> +        vs.ub[i] = SATUB(r);
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_psubsh(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned int i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 4; ++i) {
> +        int r = vs.sh[i] - vt.sh[i];
> +        vs.sh[i] = SATSH(r);
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_psubush(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned int i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 4; ++i) {
> +        int r = vs.uh[i] - vt.uh[i];
> +        vs.uh[i] = SATUH(r);
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_psubb(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned int i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 8; ++i) {
> +        vs.ub[i] -= vt.ub[i];
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_psubh(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned int i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 4; ++i) {
> +        vs.uh[i] -= vt.uh[i];
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_psubw(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned int i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 2; ++i) {
> +        vs.uw[i] -= vt.uw[i];
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_pshufh(uint64_t fs, uint64_t ft)
> +{
> +    unsigned host = BYTE_ORDER_XOR(3);
> +    LMIValue vd, vs;
> +    unsigned i;
> +
> +    vs.d = fs;
> +    vd.d = 0;
> +    for (i = 0; i < 4; i++, ft >>= 2) {
> +        vd.uh[i ^ host] = vs.uh[(ft & 3) ^ host];
> +    }
> +    return vd.d;
> +}
> +
> +uint64_t helper_packsswh(uint64_t fs, uint64_t ft)
> +{
> +    uint64_t fd = 0;
> +    int64_t tmp;
> +
> +    tmp = (int32_t)(fs >> 0);
> +    tmp = SATSH(tmp);
> +    fd |= (tmp & 0xffff) << 0;
> +
> +    tmp = (int32_t)(fs >> 32);
> +    tmp = SATSH(tmp);
> +    fd |= (tmp & 0xffff) << 16;
> +
> +    tmp = (int32_t)(ft >> 0);
> +    tmp = SATSH(tmp);
> +    fd |= (tmp & 0xffff) << 32;
> +
> +    tmp = (int32_t)(ft >> 32);
> +    tmp = SATSH(tmp);
> +    fd |= (tmp & 0xffff) << 48;
> +
> +    return fd;
> +}
> +
> +uint64_t helper_packsshb(uint64_t fs, uint64_t ft)
> +{
> +    uint64_t fd = 0;
> +    unsigned int i;
> +
> +    for (i = 0; i < 4; ++i) {
> +        int16_t tmp = fs >> (i * 16);
> +        tmp = SATSB(tmp);
> +        fd |= (uint64_t)(tmp & 0xff) << (i * 8);
> +    }
> +    for (i = 0; i < 4; ++i) {
> +        int16_t tmp = ft >> (i * 16);
> +        tmp = SATSB(tmp);
> +        fd |= (uint64_t)(tmp & 0xff) << (i * 8 + 32);
> +    }
> +
> +    return fd;
> +}
> +
> +uint64_t helper_packushb(uint64_t fs, uint64_t ft)
> +{
> +    uint64_t fd = 0;
> +    unsigned int i;
> +
> +    for (i = 0; i < 4; ++i) {
> +        int16_t tmp = fs >> (i * 16);
> +        tmp = SATUB(tmp);
> +        fd |= (uint64_t)(tmp & 0xff) << (i * 8);
> +    }
> +    for (i = 0; i < 4; ++i) {
> +        int16_t tmp = ft >> (i * 16);
> +        tmp = SATUB(tmp);
> +        fd |= (uint64_t)(tmp & 0xff) << (i * 8 + 32);
> +    }
> +
> +    return fd;
> +}
> +
> +uint64_t helper_punpcklwd(uint64_t fs, uint64_t ft)
> +{
> +    return (fs & 0xffffffff) | (ft << 32);
> +}
> +
> +uint64_t helper_punpckhwd(uint64_t fs, uint64_t ft)
> +{
> +    return (fs >> 32) | (ft & ~0xffffffffull);
> +}
> +
> +uint64_t helper_punpcklhw(uint64_t fs, uint64_t ft)
> +{
> +    unsigned host = BYTE_ORDER_XOR(3);
> +    LMIValue vd, vs, vt;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    vd.uh[0 ^ host] = vs.uh[0 ^ host];
> +    vd.uh[1 ^ host] = vt.uh[0 ^ host];
> +    vd.uh[2 ^ host] = vs.uh[1 ^ host];
> +    vd.uh[3 ^ host] = vt.uh[1 ^ host];
> +
> +    return vd.d;
> +}
> +
> +uint64_t helper_punpckhhw(uint64_t fs, uint64_t ft)
> +{
> +    unsigned host = BYTE_ORDER_XOR(3);
> +    LMIValue vd, vs, vt;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    vd.uh[0 ^ host] = vs.uh[2 ^ host];
> +    vd.uh[1 ^ host] = vt.uh[2 ^ host];
> +    vd.uh[2 ^ host] = vs.uh[3 ^ host];
> +    vd.uh[3 ^ host] = vt.uh[3 ^ host];
> +
> +    return vd.d;
> +}
> +
> +uint64_t helper_punpcklbh(uint64_t fs, uint64_t ft)
> +{
> +    unsigned host = BYTE_ORDER_XOR(7);
> +    LMIValue vd, vs, vt;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    vd.ub[0 ^ host] = vs.ub[0 ^ host];
> +    vd.ub[1 ^ host] = vt.ub[0 ^ host];
> +    vd.ub[2 ^ host] = vs.ub[1 ^ host];
> +    vd.ub[3 ^ host] = vt.ub[1 ^ host];
> +    vd.ub[4 ^ host] = vs.ub[2 ^ host];
> +    vd.ub[5 ^ host] = vt.ub[2 ^ host];
> +    vd.ub[6 ^ host] = vs.ub[3 ^ host];
> +    vd.ub[7 ^ host] = vt.ub[3 ^ host];
> +
> +    return vd.d;
> +}
> +
> +uint64_t helper_punpckhbh(uint64_t fs, uint64_t ft)
> +{
> +    unsigned host = BYTE_ORDER_XOR(7);
> +    LMIValue vd, vs, vt;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    vd.ub[0 ^ host] = vs.ub[4 ^ host];
> +    vd.ub[1 ^ host] = vt.ub[4 ^ host];
> +    vd.ub[2 ^ host] = vs.ub[5 ^ host];
> +    vd.ub[3 ^ host] = vt.ub[5 ^ host];
> +    vd.ub[4 ^ host] = vs.ub[6 ^ host];
> +    vd.ub[5 ^ host] = vt.ub[6 ^ host];
> +    vd.ub[6 ^ host] = vs.ub[7 ^ host];
> +    vd.ub[7 ^ host] = vt.ub[7 ^ host];
> +
> +    return vd.d;
> +}
> +
> +uint64_t helper_pavgh(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 4; i++) {
> +        vs.uh[i] = (vs.uh[i] + vt.uh[i] + 1) >> 1;
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_pavgb(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 8; i++) {
> +        vs.ub[i] = (vs.ub[i] + vt.ub[i] + 1) >> 1;
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_pmaxsh(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 4; i++) {
> +        vs.sh[i] = (vs.sh[i] >= vt.sh[i] ? vs.sh[i] : vt.sh[i]);
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_pminsh(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 4; i++) {
> +        vs.sh[i] = (vs.sh[i] <= vt.sh[i] ? vs.sh[i] : vt.sh[i]);
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_pmaxub(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 8; i++) {
> +        vs.ub[i] = (vs.ub[i] >= vt.ub[i] ? vs.ub[i] : vt.ub[i]);
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_pminub(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 8; i++) {
> +        vs.ub[i] = (vs.ub[i] <= vt.ub[i] ? vs.ub[i] : vt.ub[i]);
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_pcmpeqw(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 2; i++) {
> +        vs.uw[i] = -(vs.uw[i] == vt.uw[i]);
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_pcmpgtw(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 2; i++) {
> +        vs.uw[i] = -(vs.uw[i] > vt.uw[i]);
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_pcmpeqh(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 4; i++) {
> +        vs.uh[i] = -(vs.uh[i] == vt.uh[i]);
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_pcmpgth(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 4; i++) {
> +        vs.uh[i] = -(vs.uh[i] > vt.uh[i]);
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_pcmpeqb(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 8; i++) {
> +        vs.ub[i] = -(vs.ub[i] == vt.ub[i]);
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_pcmpgtb(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 8; i++) {
> +        vs.ub[i] = -(vs.ub[i] > vt.ub[i]);
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_psllw(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs;
> +    unsigned i;
> +
> +    ft &= 0x7f;
> +    if (ft > 31) {
> +        return 0;
> +    }
> +    vs.d = fs;
> +    for (i = 0; i < 2; ++i) {
> +        vs.uw[i] <<= ft;
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_psrlw(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs;
> +    unsigned i;
> +
> +    ft &= 0x7f;
> +    if (ft > 31) {
> +        return 0;
> +    }
> +    vs.d = fs;
> +    for (i = 0; i < 2; ++i) {
> +        vs.uw[i] >>= ft;
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_psraw(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs;
> +    unsigned i;
> +
> +    ft &= 0x7f;
> +    if (ft > 31) {
> +        ft = 31;
> +    }
> +    vs.d = fs;
> +    for (i = 0; i < 2; ++i) {
> +        vs.sw[i] >>= ft;
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_psllh(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs;
> +    unsigned i;
> +
> +    ft &= 0x7f;
> +    if (ft > 15) {
> +        return 0;
> +    }
> +    vs.d = fs;
> +    for (i = 0; i < 4; ++i) {
> +        vs.uh[i] <<= ft;
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_psrlh(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs;
> +    unsigned i;
> +
> +    ft &= 0x7f;
> +    if (ft > 15) {
> +        return 0;
> +    }
> +    vs.d = fs;
> +    for (i = 0; i < 4; ++i) {
> +        vs.uh[i] >>= ft;
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_psrah(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs;
> +    unsigned i;
> +
> +    ft &= 0x7f;
> +    if (ft > 15) {
> +        ft = 15;
> +    }
> +    vs.d = fs;
> +    for (i = 0; i < 4; ++i) {
> +        vs.sh[i] >>= ft;
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_pmullh(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 4; ++i) {
> +        vs.sh[i] *= vt.sh[i];
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_pmulhh(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 4; ++i) {
> +        int32_t r = vs.sh[i] * vt.sh[i];
> +        vs.sh[i] = r >> 16;
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_pmulhuh(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 4; ++i) {
> +        uint32_t r = vs.uh[i] * vt.uh[i];
> +        vs.uh[i] = r >> 16;
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_pmaddhw(uint64_t fs, uint64_t ft)
> +{
> +    unsigned host = BYTE_ORDER_XOR(3);
> +    LMIValue vs, vt;
> +    uint32_t p0, p1;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    p0  = vs.sh[0 ^ host] * vt.sh[0 ^ host];
> +    p0 += vs.sh[1 ^ host] * vt.sh[1 ^ host];
> +    p1  = vs.sh[2 ^ host] * vt.sh[2 ^ host];
> +    p1 += vs.sh[3 ^ host] * vt.sh[3 ^ host];
> +
> +    return ((uint64_t)p1 << 32) | p0;
> +}
> +
> +uint64_t helper_pasubub(uint64_t fs, uint64_t ft)
> +{
> +    LMIValue vs, vt;
> +    unsigned i;
> +
> +    vs.d = fs;
> +    vt.d = ft;
> +    for (i = 0; i < 8; ++i) {
> +        int r = vs.ub[i] - vt.ub[i];
> +        vs.ub[i] = (r < 0 ? -r : r);
> +    }
> +    return vs.d;
> +}
> +
> +uint64_t helper_biadd(uint64_t fs)
> +{
> +    unsigned i, fd;
> +
> +    for (i = fd = 0; i < 8; ++i) {
> +        fd += (fs >> (i * 8)) & 0xff;
> +    }
> +    return fd & 0xffff;
> +}
> +
> +uint64_t helper_pmovmskb(uint64_t fs)
> +{
> +    unsigned fd = 0;
> +
> +    fd |= ((fs >>  7) & 1) << 0;
> +    fd |= ((fs >> 15) & 1) << 1;
> +    fd |= ((fs >> 23) & 1) << 2;
> +    fd |= ((fs >> 31) & 1) << 3;
> +    fd |= ((fs >> 39) & 1) << 4;
> +    fd |= ((fs >> 47) & 1) << 5;
> +    fd |= ((fs >> 55) & 1) << 6;
> +    fd |= ((fs >> 63) & 1) << 7;
> +
> +    return fd & 0xff;
> +}
> diff --git a/target-mips/translate.c b/target-mips/translate.c
> index 57454f0..ac941e6 100644
> --- a/target-mips/translate.c
> +++ b/target-mips/translate.c
> @@ -446,6 +446,103 @@ enum {
>      OPC_BC2     = (0x08 << 21) | OPC_CP2,
>  };
>  
> +#define MASK_LMI(op)  (MASK_OP_MAJOR(op) | (op & (0x1F << 21)) | (op & 0x1F))
> +
> +enum {
> +    OPC_PADDSH  = (24 << 21) | (0x00) | OPC_CP2,
> +    OPC_PADDUSH = (25 << 21) | (0x00) | OPC_CP2,
> +    OPC_PADDH   = (26 << 21) | (0x00) | OPC_CP2,
> +    OPC_PADDW   = (27 << 21) | (0x00) | OPC_CP2,
> +    OPC_PADDSB  = (28 << 21) | (0x00) | OPC_CP2,
> +    OPC_PADDUSB = (29 << 21) | (0x00) | OPC_CP2,
> +    OPC_PADDB   = (30 << 21) | (0x00) | OPC_CP2,
> +    OPC_PADDD   = (31 << 21) | (0x00) | OPC_CP2,
> +
> +    OPC_PSUBSH  = (24 << 21) | (0x01) | OPC_CP2,
> +    OPC_PSUBUSH = (25 << 21) | (0x01) | OPC_CP2,
> +    OPC_PSUBH   = (26 << 21) | (0x01) | OPC_CP2,
> +    OPC_PSUBW   = (27 << 21) | (0x01) | OPC_CP2,
> +    OPC_PSUBSB  = (28 << 21) | (0x01) | OPC_CP2,
> +    OPC_PSUBUSB = (29 << 21) | (0x01) | OPC_CP2,
> +    OPC_PSUBB   = (30 << 21) | (0x01) | OPC_CP2,
> +    OPC_PSUBD   = (31 << 21) | (0x01) | OPC_CP2,
> +
> +    OPC_PSHUFH   = (24 << 21) | (0x02) | OPC_CP2,
> +    OPC_PACKSSWH = (25 << 21) | (0x02) | OPC_CP2,
> +    OPC_PACKSSHB = (26 << 21) | (0x02) | OPC_CP2,
> +    OPC_PACKUSHB = (27 << 21) | (0x02) | OPC_CP2,
> +    OPC_XOR_CP2  = (28 << 21) | (0x02) | OPC_CP2,
> +    OPC_NOR_CP2  = (29 << 21) | (0x02) | OPC_CP2,
> +    OPC_AND_CP2  = (30 << 21) | (0x02) | OPC_CP2,
> +    OPC_PANDN    = (31 << 21) | (0x02) | OPC_CP2,
> +
> +    OPC_PUNPCKLHW = (24 << 21) | (0x03) | OPC_CP2,
> +    OPC_PUNPCKHHW = (25 << 21) | (0x03) | OPC_CP2,
> +    OPC_PUNPCKLBH = (26 << 21) | (0x03) | OPC_CP2,
> +    OPC_PUNPCKHBH = (27 << 21) | (0x03) | OPC_CP2,
> +    OPC_PINSRH_0  = (28 << 21) | (0x03) | OPC_CP2,
> +    OPC_PINSRH_1  = (29 << 21) | (0x03) | OPC_CP2,
> +    OPC_PINSRH_2  = (30 << 21) | (0x03) | OPC_CP2,
> +    OPC_PINSRH_3  = (31 << 21) | (0x03) | OPC_CP2,
> +
> +    OPC_PAVGH   = (24 << 21) | (0x08) | OPC_CP2,
> +    OPC_PAVGB   = (25 << 21) | (0x08) | OPC_CP2,
> +    OPC_PMAXSH  = (26 << 21) | (0x08) | OPC_CP2,
> +    OPC_PMINSH  = (27 << 21) | (0x08) | OPC_CP2,
> +    OPC_PMAXUB  = (28 << 21) | (0x08) | OPC_CP2,
> +    OPC_PMINUB  = (29 << 21) | (0x08) | OPC_CP2,
> +
> +    OPC_PCMPEQW = (24 << 21) | (0x09) | OPC_CP2,
> +    OPC_PCMPGTW = (25 << 21) | (0x09) | OPC_CP2,
> +    OPC_PCMPEQH = (26 << 21) | (0x09) | OPC_CP2,
> +    OPC_PCMPGTH = (27 << 21) | (0x09) | OPC_CP2,
> +    OPC_PCMPEQB = (28 << 21) | (0x09) | OPC_CP2,
> +    OPC_PCMPGTB = (29 << 21) | (0x09) | OPC_CP2,
> +
> +    OPC_PSLLW   = (24 << 21) | (0x0A) | OPC_CP2,
> +    OPC_PSLLH   = (25 << 21) | (0x0A) | OPC_CP2,
> +    OPC_PMULLH  = (26 << 21) | (0x0A) | OPC_CP2,
> +    OPC_PMULHH  = (27 << 21) | (0x0A) | OPC_CP2,
> +    OPC_PMULUW  = (28 << 21) | (0x0A) | OPC_CP2,
> +    OPC_PMULHUH = (29 << 21) | (0x0A) | OPC_CP2,
> +
> +    OPC_PSRLW     = (24 << 21) | (0x0B) | OPC_CP2,
> +    OPC_PSRLH     = (25 << 21) | (0x0B) | OPC_CP2,
> +    OPC_PSRAW     = (26 << 21) | (0x0B) | OPC_CP2,
> +    OPC_PSRAH     = (27 << 21) | (0x0B) | OPC_CP2,
> +    OPC_PUNPCKLWD = (28 << 21) | (0x0B) | OPC_CP2,
> +    OPC_PUNPCKHWD = (29 << 21) | (0x0B) | OPC_CP2,
> +
> +    OPC_ADDU_CP2 = (24 << 21) | (0x0C) | OPC_CP2,
> +    OPC_OR_CP2   = (25 << 21) | (0x0C) | OPC_CP2,
> +    OPC_ADD_CP2  = (26 << 21) | (0x0C) | OPC_CP2,
> +    OPC_DADD_CP2 = (27 << 21) | (0x0C) | OPC_CP2,
> +    OPC_SEQU_CP2 = (28 << 21) | (0x0C) | OPC_CP2,
> +    OPC_SEQ_CP2  = (29 << 21) | (0x0C) | OPC_CP2,
> +
> +    OPC_SUBU_CP2 = (24 << 21) | (0x0D) | OPC_CP2,
> +    OPC_PASUBUB  = (25 << 21) | (0x0D) | OPC_CP2,
> +    OPC_SUB_CP2  = (26 << 21) | (0x0D) | OPC_CP2,
> +    OPC_DSUB_CP2 = (27 << 21) | (0x0D) | OPC_CP2,
> +    OPC_SLTU_CP2 = (28 << 21) | (0x0D) | OPC_CP2,
> +    OPC_SLT_CP2  = (29 << 21) | (0x0D) | OPC_CP2,
> +
> +    OPC_SLL_CP2  = (24 << 21) | (0x0E) | OPC_CP2,
> +    OPC_DSLL_CP2 = (25 << 21) | (0x0E) | OPC_CP2,
> +    OPC_PEXTRH   = (26 << 21) | (0x0E) | OPC_CP2,
> +    OPC_PMADDHW  = (27 << 21) | (0x0E) | OPC_CP2,
> +    OPC_SLEU_CP2 = (28 << 21) | (0x0E) | OPC_CP2,
> +    OPC_SLE_CP2  = (29 << 21) | (0x0E) | OPC_CP2,
> +
> +    OPC_SRL_CP2  = (24 << 21) | (0x0F) | OPC_CP2,
> +    OPC_DSRL_CP2 = (25 << 21) | (0x0F) | OPC_CP2,
> +    OPC_SRA_CP2  = (26 << 21) | (0x0F) | OPC_CP2,
> +    OPC_DSRA_CP2 = (27 << 21) | (0x0F) | OPC_CP2,
> +    OPC_BIADD    = (28 << 21) | (0x0F) | OPC_CP2,
> +    OPC_PMOVMSKB = (29 << 21) | (0x0F) | OPC_CP2,
> +};
> +
> +
>  #define MASK_CP3(op)       MASK_OP_MAJOR(op) | (op & 0x3F)
>  
>  enum {
> @@ -2424,8 +2521,8 @@ static void gen_cl (DisasContext *ctx, uint32_t opc,
>  }
>  
>  /* Godson integer instructions */
> -static void gen_loongson_integer (DisasContext *ctx, uint32_t opc,
> -                                int rd, int rs, int rt)
> +static void gen_loongson_integer(DisasContext *ctx, uint32_t opc,
> +                                 int rd, int rs, int rt)
>  {
>      const char *opn = "loongson";
>      TCGv t0, t1;
> @@ -2637,6 +2734,278 @@ static void gen_loongson_integer (DisasContext *ctx, uint32_t opc,
>      tcg_temp_free(t1);
>  }
>  
> +/* Loongson multimedia instructions */
> +static void gen_loongson_multimedia(DisasContext *ctx, int rd, int rs, int rt)
> +{
> +    const char *opn = "loongson_cp2";
> +    uint32_t opc, shift_max;
> +    TCGv_i64 t0, t1;
> +
> +    opc = MASK_LMI(ctx->opcode);
> +    switch (opc) {
> +    case OPC_ADD_CP2:
> +    case OPC_SUB_CP2:
> +    case OPC_DADD_CP2:
> +    case OPC_DSUB_CP2:
> +        t0 = tcg_temp_local_new_i64();
> +        t1 = tcg_temp_local_new_i64();
> +        break;
> +    default:
> +        t0 = tcg_temp_new_i64();
> +        t1 = tcg_temp_new_i64();
> +        break;
> +    }
> +
> +    gen_load_fpr64(ctx, t0, rs);
> +    gen_load_fpr64(ctx, t1, rt);
> +
> +#define LMI_HELPER(UP, LO) \
> +    case OPC_##UP: gen_helper_##LO(t0, t0, t1); opn = #LO; break
> +#define LMI_HELPER_1(UP, LO) \
> +    case OPC_##UP: gen_helper_##LO(t0, t0); opn = #LO; break
> +#define LMI_DIRECT(UP, LO, OP) \
> +    case OPC_##UP: tcg_gen_##OP##_i64(t0, t0, t1); opn = #LO; break
> +
> +    switch (opc) {
> +    LMI_HELPER(PADDSH, paddsh);
> +    LMI_HELPER(PADDUSH, paddush);
> +    LMI_HELPER(PADDH, paddh);
> +    LMI_HELPER(PADDW, paddw);
> +    LMI_HELPER(PADDSB, paddsb);
> +    LMI_HELPER(PADDUSB, paddusb);
> +    LMI_HELPER(PADDB, paddb);
> +
> +    LMI_HELPER(PSUBSH, psubsh);
> +    LMI_HELPER(PSUBUSH, psubush);
> +    LMI_HELPER(PSUBH, psubh);
> +    LMI_HELPER(PSUBW, psubw);
> +    LMI_HELPER(PSUBSB, psubsb);
> +    LMI_HELPER(PSUBUSB, psubusb);
> +    LMI_HELPER(PSUBB, psubb);
> +
> +    LMI_HELPER(PSHUFH, pshufh);
> +    LMI_HELPER(PACKSSWH, packsswh);
> +    LMI_HELPER(PACKSSHB, packsshb);
> +    LMI_HELPER(PACKUSHB, packushb);
> +
> +    LMI_HELPER(PUNPCKLHW, punpcklhw);
> +    LMI_HELPER(PUNPCKHHW, punpckhhw);
> +    LMI_HELPER(PUNPCKLBH, punpcklbh);
> +    LMI_HELPER(PUNPCKHBH, punpckhbh);
> +    LMI_HELPER(PUNPCKLWD, punpcklwd);
> +    LMI_HELPER(PUNPCKHWD, punpckhwd);
> +
> +    LMI_HELPER(PAVGH, pavgh);
> +    LMI_HELPER(PAVGB, pavgb);
> +    LMI_HELPER(PMAXSH, pmaxsh);
> +    LMI_HELPER(PMINSH, pminsh);
> +    LMI_HELPER(PMAXUB, pmaxub);
> +    LMI_HELPER(PMINUB, pminub);
> +
> +    LMI_HELPER(PCMPEQW, pcmpeqw);
> +    LMI_HELPER(PCMPGTW, pcmpgtw);
> +    LMI_HELPER(PCMPEQH, pcmpeqh);
> +    LMI_HELPER(PCMPGTH, pcmpgth);
> +    LMI_HELPER(PCMPEQB, pcmpeqb);
> +    LMI_HELPER(PCMPGTB, pcmpgtb);
> +
> +    LMI_HELPER(PSLLW, psllw);
> +    LMI_HELPER(PSLLH, psllh);
> +    LMI_HELPER(PSRLW, psrlw);
> +    LMI_HELPER(PSRLH, psrlh);
> +    LMI_HELPER(PSRAW, psraw);
> +    LMI_HELPER(PSRAH, psrah);
> +
> +    LMI_HELPER(PMULLH, pmullh);
> +    LMI_HELPER(PMULHH, pmulhh);
> +    LMI_HELPER(PMULHUH, pmulhuh);
> +    LMI_HELPER(PMADDHW, pmaddhw);
> +
> +    LMI_HELPER(PASUBUB, pasubub);
> +    LMI_HELPER_1(BIADD, biadd);
> +    LMI_HELPER_1(PMOVMSKB, pmovmskb);
> +
> +    LMI_DIRECT(PADDD, paddd, add);
> +    LMI_DIRECT(PSUBD, psubd, sub);
> +    LMI_DIRECT(XOR_CP2, xor, xor);
> +    LMI_DIRECT(NOR_CP2, nor, nor);
> +    LMI_DIRECT(AND_CP2, and, and);
> +    LMI_DIRECT(PANDN, pandn, andc);
> +    LMI_DIRECT(OR_CP2, or, or);
> +
> +    case OPC_PINSRH_0:
> +        tcg_gen_deposit_i64(t0, t0, t1, 0, 16);
> +        opn = "pinsrh_0";
> +        break;
> +    case OPC_PINSRH_1:
> +        tcg_gen_deposit_i64(t0, t0, t1, 16, 16);
> +        opn = "pinsrh_1";
> +        break;
> +    case OPC_PINSRH_2:
> +        tcg_gen_deposit_i64(t0, t0, t1, 32, 16);
> +        opn = "pinsrh_2";
> +        break;
> +    case OPC_PINSRH_3:
> +        tcg_gen_deposit_i64(t0, t0, t1, 48, 16);
> +        opn = "pinsrh_3";
> +        break;
> +
> +    case OPC_PEXTRH:
> +        tcg_gen_andi_i64(t1, t1, 3);
> +        tcg_gen_shli_i64(t1, t1, 4);
> +        tcg_gen_shr_i64(t0, t0, t1);
> +        tcg_gen_ext16u_i64(t0, t0);
> +        opn = "pextrh";
> +        break;
> +
> +    case OPC_ADDU_CP2:
> +        tcg_gen_add_i64(t0, t0, t1);
> +        tcg_gen_ext32s_i64(t0, t0);
> +        opn = "addu";
> +        break;
> +    case OPC_SUBU_CP2:
> +        tcg_gen_sub_i64(t0, t0, t1);
> +        tcg_gen_ext32s_i64(t0, t0);
> +        opn = "subu";
> +        break;
> +
> +    case OPC_SLL_CP2:
> +        opn = "sll";
> +        shift_max = 32;
> +        goto do_shift;
> +    case OPC_SRL_CP2:
> +        opn = "srl";
> +        shift_max = 32;
> +        goto do_shift;
> +    case OPC_SRA_CP2:
> +        opn = "sra";
> +        shift_max = 32;
> +        goto do_shift;
> +    case OPC_DSLL_CP2:
> +        opn = "dsll";
> +        shift_max = 64;
> +        goto do_shift;
> +    case OPC_DSRL_CP2:
> +        opn = "dsrl";
> +        shift_max = 64;
> +        goto do_shift;
> +    case OPC_DSRA_CP2:
> +        opn = "dsra";
> +        shift_max = 64;
> +        goto do_shift;
> +    do_shift:
> +        /* Make sure shift count isn't TCG undefined behaviour.  */
> +        tcg_gen_andi_i64(t1, t1, shift_max - 1);
> +
> +        switch (opc) {
> +        case OPC_SLL_CP2:
> +        case OPC_DSLL_CP2:
> +            tcg_gen_shl_i64(t0, t0, t1);
> +            break;
> +        case OPC_SRA_CP2:
> +        case OPC_DSRA_CP2:
> +            /* Since SRA is UndefinedResult without sign-extended inputs,
> +               we can treat SRA and DSRA the same.  */
> +            tcg_gen_sar_i64(t0, t0, t1);
> +            break;
> +        case OPC_SRL_CP2:
> +            /* We want to shift in zeros for SRL; zero-extend first.  */
> +            tcg_gen_ext32u_i64(t0, t0);
> +            /* FALLTHRU */
> +        case OPC_DSRL_CP2:
> +            tcg_gen_shr_i64(t0, t0, t1);
> +            break;
> +        }
> +
> +        if (shift_max == 32) {
> +            tcg_gen_ext32s_i64(t0, t0);
> +        }
> +
> +        /* Shifts larger than MAX produce zero.  */
> +        tcg_gen_setcondi_i64(TCG_COND_LTU, t1, t1, shift_max);
> +        tcg_gen_neg_i64(t1, t1);
> +        tcg_gen_and_i64(t0, t0, t1);
> +        break;
> +
> +    case OPC_ADD_CP2:
> +    case OPC_DADD_CP2:
> +        {
> +            TCGv_i64 t2 = tcg_temp_new_i64();
> +            int lab = gen_new_label();
> +
> +            tcg_gen_mov_i64(t2, t0);
> +            tcg_gen_add_i64(t0, t1, t2);
> +            if (opc == OPC_ADD_CP2) {
> +                tcg_gen_ext32s_i64(t0, t0);
> +            }
> +            tcg_gen_xor_i64(t1, t1, t2);
> +            tcg_gen_xor_i64(t2, t2, t0);
> +            tcg_gen_andc_i64(t1, t2, t1);
> +            tcg_temp_free_i64(t2);
> +            tcg_gen_brcondi_i64(TCG_COND_GE, t1, 0, lab);
> +            generate_exception(ctx, EXCP_OVERFLOW);
> +            gen_set_label(lab);
> +
> +            opn = (opc == OPC_ADD_CP2 ? "add" : "dadd");
> +            break;
> +        }
> +
> +    case OPC_SUB_CP2:
> +    case OPC_DSUB_CP2:
> +        {
> +            TCGv_i64 t2 = tcg_temp_new_i64();
> +            int lab = gen_new_label();
> +
> +            tcg_gen_mov_i64(t2, t0);
> +            tcg_gen_sub_i64(t0, t1, t2);
> +            if (opc == OPC_SUB_CP2) {
> +                tcg_gen_ext32s_i64(t0, t0);
> +            }
> +            tcg_gen_xor_i64(t1, t1, t2);
> +            tcg_gen_xor_i64(t2, t2, t0);
> +            tcg_gen_and_i64(t1, t1, t2);
> +            tcg_temp_free_i64(t2);
> +            tcg_gen_brcondi_i64(TCG_COND_GE, t1, 0, lab);
> +            generate_exception(ctx, EXCP_OVERFLOW);
> +            gen_set_label(lab);
> +
> +            opn = (opc == OPC_SUB_CP2 ? "sub" : "dsub");
> +            break;
> +        }
> +
> +    case OPC_PMULUW:
> +        tcg_gen_ext32u_i64(t0, t0);
> +        tcg_gen_ext32u_i64(t1, t1);
> +        tcg_gen_mul_i64(t0, t0, t1);
> +        opn = "pmuluw";
> +        break;
> +
> +    case OPC_SEQU_CP2:
> +    case OPC_SEQ_CP2:
> +    case OPC_SLTU_CP2:
> +    case OPC_SLT_CP2:
> +    case OPC_SLEU_CP2:
> +    case OPC_SLE_CP2:
> +        /* ??? Document is unclear: Set FCC[CC].  Does that mean the
> +           FD field is the CC field?  */
> +    default:
> +        MIPS_INVAL(opn);
> +        generate_exception(ctx, EXCP_RI);
> +        return;
> +    }
> +
> +#undef LMI_HELPER
> +#undef LMI_DIRECT
> +
> +    gen_store_fpr64(ctx, t0, rd);
> +
> +    (void)opn; /* avoid a compiler warning */
> +    MIPS_DEBUG("%s %s, %s, %s", opn,
> +               fregnames[rd], fregnames[rs], fregnames[rt]);
> +    tcg_temp_free_i64(t0);
> +    tcg_temp_free_i64(t1);
> +}
> +
>  /* Traps */
>  static void gen_trap (DisasContext *ctx, uint32_t opc,
>                        int rs, int rt, int16_t imm)
> @@ -12344,10 +12713,14 @@ static void decode_opc (CPUMIPSState *env, DisasContext *ctx, int *is_branch)
>      case OPC_LDC2:
>      case OPC_SWC2:
>      case OPC_SDC2:
> -    case OPC_CP2:
>          /* COP2: Not implemented. */
>          generate_exception_err(ctx, EXCP_CpU, 2);
>          break;
> +    case OPC_CP2:
> +        check_insn(env, ctx, INSN_LOONGSON2F);
> +        /* Note that these instructions use different fields.  */
> +        gen_loongson_multimedia(ctx, sa, rd, rt);
> +        break;
>  
>      case OPC_CP3:
>          if (env->CP0_Config1 & (1 << CP0C1_FP)) {
> -- 
> 1.7.11.4
> 
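
For readers following the quoted hunk, the "Shifts larger than MAX produce zero" step (setcond, neg, and) can be sketched in plain C. This is only an illustration: the helper name shl_or_zero is hypothetical, and masking the count before the C shift is an assumption made here to keep the C well defined, not something taken from the hunk above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: left shift where out-of-range counts yield zero.
 * Mirrors the setcond/neg/and sequence from the quoted patch:
 *   in_range = (count < shift_max)   -> 0 or 1
 *   mask     = -in_range             -> 0 or all ones
 *   result   = shifted & mask
 */
static uint64_t shl_or_zero(uint64_t value, uint64_t count, unsigned shift_max)
{
    assert(shift_max == 32 || shift_max == 64);

    /* Assumption for this sketch: clamp the count so the C shift itself
       never has undefined behaviour. */
    uint64_t shifted = value << (count & (shift_max - 1));

    /* 1 if the count is in range, else 0; negating gives an all-ones
       or all-zero mask without a branch. */
    uint64_t mask = -(uint64_t)(count < shift_max);

    return shifted & mask;
}
```

The mask form avoids a branch in the generated code, which is presumably why the patch uses it instead of a conditional.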

I haven't looked at all the instructions in detail, but it looks fine to me.
Would it be possible to repost this patch so that it does not depend on the
FPR TCG patches being applied first, so it can be merged separately?
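
As a reading aid for the OPC_ADD_CP2/OPC_DADD_CP2 case quoted above, the xor/andc overflow test can be written in plain C roughly as follows. This is a sketch of the 64-bit case only; the function name add64_overflows is hypothetical, and the final unsigned-to-signed conversion assumes the usual two's-complement behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true if a + b overflows as a signed
 * 64-bit addition, and stores the wrapped sum in *sum.
 *
 * Same idea as the quoted TCG sequence:
 *   operands_differ = a ^ b         (sign bit set if signs differ)
 *   result_differs  = a ^ result    (sign bit set if the sign changed)
 *   overflow        = result_differs & ~operands_differ, sign bit set
 */
static bool add64_overflows(int64_t a, int64_t b, int64_t *sum)
{
    assert(sum != NULL);

    uint64_t ua = (uint64_t)a;
    uint64_t ub = (uint64_t)b;
    uint64_t ur = ua + ub;                 /* unsigned add wraps, no UB */

    uint64_t operands_differ = ua ^ ub;
    uint64_t result_differs = ua ^ ur;

    *sum = (int64_t)ur;                    /* assumes two's complement */

    /* Overflow only when both operands share a sign and the result's
       sign differs from it. */
    return (int64_t)(result_differs & ~operands_differ) < 0;
}
```

The patch branches over generate_exception() when the combined value is non-negative, which corresponds to this function returning false.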

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net


end of thread, other threads:[~2012-09-18 16:39 UTC | newest]

Thread overview: 14+ messages
2012-09-17 21:35 [Qemu-devel] [PATCH v2 0/7] target-mips improvements Richard Henderson
2012-09-17 21:35 ` [Qemu-devel] [PATCH 1/7] target-mips: Set opn in gen_ldst_multiple Richard Henderson
2012-09-18 16:38   ` Aurelien Jarno
2012-09-17 21:35 ` [Qemu-devel] [PATCH 2/7] target-mips: Fix MIPS_DEBUG Richard Henderson
2012-09-18 16:38   ` Aurelien Jarno
2012-09-17 21:35 ` [Qemu-devel] [PATCH 3/7] target-mips: Always evaluate debugging macro arguments Richard Henderson
2012-09-18 16:38   ` Aurelien Jarno
2012-09-17 21:35 ` [Qemu-devel] [PATCH 4/7] target-mips: Pass DisasContext to fpr32 load/store routines Richard Henderson
2012-09-18 16:39   ` Aurelien Jarno
2012-09-17 21:35 ` [Qemu-devel] [PATCH 5/7] target-mips: Use TCG registers for the FPU Richard Henderson
2012-09-18 16:39   ` Aurelien Jarno
2012-09-17 21:35 ` [Qemu-devel] [PATCH 6/7] target-mips: Add accessors for the two 32-bit halves of a 64-bit FPR Richard Henderson
2012-09-17 21:35 ` [Qemu-devel] [PATCH 7/7] target-mips: Implement Loongson Multimedia Instructions Richard Henderson
2012-09-18 16:39   ` Aurelien Jarno
