qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets
@ 2014-01-26 19:24 Peter Maydell
  2014-01-26 19:24 ` [Qemu-devel] [PATCH 01/21] target-arm: A64: Add SIMD three-different multiply accumulate insns Peter Maydell
                   ` (21 more replies)
  0 siblings, 22 replies; 39+ messages in thread
From: Peter Maydell @ 2014-01-26 19:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson

This patch series is kind of in two parts. The first 8 patches
are the "Neon second set" that's already been pretty much reviewed;
I'm resending them just because a few minor nits came up in the
last round, which have now been fixed:
 * patch 6 added the missing SQDMULH/SQRDMULH unallocated/unimplemented
   case in the SIMD 3-same initial decode
 * patch 7 now uses the _i32 vector element accessors
 * patch 8 fixed a few codestyle nits
(RTH: these all seemed trivial enough that I've left your
reviewed-by tags in place).

Patches 9..21 are new, and fill in a number of gaps that bring
us up to (and past) parity with the SuSE tree for coverage:
 * more SIMD 3-same ops, including basically all the integer ones
 * the "scalar pairwise" instruction group
 * more SIMD scalar-3-same ops, including all the integer ones
 * some (but not all) of the 2-reg misc and scalar 2-reg misc groups
   (by the same rationale as with 3-same, we aim for "anything
   implemented in the SuSE tree" plus enough to make it reasonably
   likely we've got the general function structure correct)

thanks
-- PMM

Alex Bennée (2):
  target-arm: A64: Add SIMD shift by immediate
  target-arm: A64: Add 2-reg-misc REV* instructions

Peter Maydell (19):
  target-arm: A64: Add SIMD three-different multiply accumulate insns
  target-arm: A64: Add SIMD three-different ABDL instructions
  target-arm: A64: Add SIMD scalar 3 same add, sub and compare ops
  target-arm: A64: Add top level decode for SIMD 3-same group
  target-arm: A64: Add logic ops from SIMD 3 same group
  target-arm: A64: Add integer ops from SIMD 3-same group
  target-arm: A64: Add simple SIMD 3-same floating point ops
  target-arm: A64: Implement SIMD 3-reg-same shift and saturate insns
  target-arm: A64: Implement remaining non-pairwise int SIMD 3-reg-same
    insns
  target-arm: A64: Implement pairwise integer ops from 3-reg-same SIMD
  tcg: Add TCGV_UNUSED_PTR, TCGV_IS_UNUSED_PTR, TCGV_EQUAL_PTR
  target-arm: A64: Implement scalar pairwise ops
  target-arm: A64: Implement remaining integer scalar-3-same insns
  target-arm: A64: Add SIMD simple 64 bit insns from scalar 2-reg misc
  target-arm: A64: Add skeleton decode for SIMD 2-reg misc group
  target-arm: A64: Implement 2-register misc compares, ABS, NEG
  target-arm: A64: Implement 2-reg-misc CNT, NOT and RBIT
  target-arm: A64: Add narrowing 2-reg-misc instructions
  target-arm: A64: Add FNEG and FABS to the SIMD 2-reg-misc group

 target-arm/helper.h        |    1 +
 target-arm/neon_helper.c   |   12 +
 target-arm/translate-a64.c | 2341 ++++++++++++++++++++++++++++++++++++++++++--
 tcg/tcg.h                  |    3 +
 4 files changed, 2299 insertions(+), 58 deletions(-)

-- 
1.8.5


* [Qemu-devel] [PATCH 01/21] target-arm: A64: Add SIMD three-different multiply accumulate insns
  2014-01-26 19:24 [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets Peter Maydell
@ 2014-01-26 19:24 ` Peter Maydell
  2014-01-26 19:24 ` [Qemu-devel] [PATCH 02/21] target-arm: A64: Add SIMD three-different ABDL instructions Peter Maydell
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Peter Maydell @ 2014-01-26 19:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson

Add support for the multiply-accumulate instructions from the
SIMD three-different instructions group (C3.6.15):
 * skeleton decode of unallocated encodings and split of
   the group into its three sub-parts
 * framework for handling the 64x64->128 widening subpart
 * implementation of the multiply-accumulate instructions
   SMLAL, SMLAL2, UMLAL, UMLAL2, SMLSL, SMLSL2, UMLSL, UMLSL2,
   UMULL, UMULL2, SMULL, SMULL2
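
As a purely illustrative sketch (not the translator code itself),
the per-element semantics of the size==2 (32x32->64) SMLAL case
boil down to:

    #include <stdint.h>

    /* Sign-extend both 32 bit operands, multiply, and accumulate
     * into the corresponding 64 bit destination element.
     */
    static int64_t smlal_elt(int64_t acc, int32_t n, int32_t m)
    {
        return acc + (int64_t)n * (int64_t)m;
    }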

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate-a64.c | 233 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 232 insertions(+), 1 deletion(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 7cfb55b..924a539 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -700,6 +700,9 @@ static void do_fp_ld(DisasContext *s, int destidx, TCGv_i64 tcg_addr, int size)
  * zero extend as we are filling a partial chunk of the vector register.
  * These functions don't support 128 bit loads/stores, which would be
  * normal load/store operations.
+ *
+ * The _i32 versions are useful when operating on 32 bit quantities
+ * (eg for floating point single or using Neon helper functions).
  */
 
 /* Get value of an element within a vector register */
@@ -735,6 +738,32 @@ static void read_vec_element(DisasContext *s, TCGv_i64 tcg_dest, int srcidx,
     }
 }
 
+static void read_vec_element_i32(DisasContext *s, TCGv_i32 tcg_dest, int srcidx,
+                                 int element, TCGMemOp memop)
+{
+    int vect_off = vec_reg_offset(srcidx, element, memop & MO_SIZE);
+    switch (memop) {
+    case MO_8:
+        tcg_gen_ld8u_i32(tcg_dest, cpu_env, vect_off);
+        break;
+    case MO_16:
+        tcg_gen_ld16u_i32(tcg_dest, cpu_env, vect_off);
+        break;
+    case MO_8|MO_SIGN:
+        tcg_gen_ld8s_i32(tcg_dest, cpu_env, vect_off);
+        break;
+    case MO_16|MO_SIGN:
+        tcg_gen_ld16s_i32(tcg_dest, cpu_env, vect_off);
+        break;
+    case MO_32:
+    case MO_32|MO_SIGN:
+        tcg_gen_ld_i32(tcg_dest, cpu_env, vect_off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 /* Set value of an element within a vector register */
 static void write_vec_element(DisasContext *s, TCGv_i64 tcg_src, int destidx,
                               int element, TCGMemOp memop)
@@ -5546,6 +5575,150 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn)
     unsupported_encoding(s, insn);
 }
 
+static void handle_3rd_widening(DisasContext *s, int is_q, int is_u, int size,
+                                int opcode, int rd, int rn, int rm)
+{
+    /* 3-reg-different widening insns: 64 x 64 -> 128 */
+    TCGv_i64 tcg_res[2];
+    int pass, accop;
+
+    tcg_res[0] = tcg_temp_new_i64();
+    tcg_res[1] = tcg_temp_new_i64();
+
+    /* Does this op do an adding accumulate, a subtracting accumulate,
+     * or no accumulate at all?
+     */
+    switch (opcode) {
+    case 5:
+    case 8:
+    case 9:
+        accop = 1;
+        break;
+    case 10:
+    case 11:
+        accop = -1;
+        break;
+    default:
+        accop = 0;
+        break;
+    }
+
+    if (accop != 0) {
+        read_vec_element(s, tcg_res[0], rd, 0, MO_64);
+        read_vec_element(s, tcg_res[1], rd, 1, MO_64);
+    }
+
+    /* size == 2 means two 32x32->64 operations; this is worth special
+     * casing because we can generally handle it inline.
+     */
+    if (size == 2) {
+        for (pass = 0; pass < 2; pass++) {
+            TCGv_i64 tcg_op1 = tcg_temp_new_i64();
+            TCGv_i64 tcg_op2 = tcg_temp_new_i64();
+            TCGv_i64 tcg_passres;
+            TCGMemOp memop = MO_32 | (is_u ? 0 : MO_SIGN);
+
+            int elt = pass + is_q * 2;
+
+            read_vec_element(s, tcg_op1, rn, elt, memop);
+            read_vec_element(s, tcg_op2, rm, elt, memop);
+
+            if (accop == 0) {
+                tcg_passres = tcg_res[pass];
+            } else {
+                tcg_passres = tcg_temp_new_i64();
+            }
+
+            switch (opcode) {
+            case 8: /* SMLAL, SMLAL2, UMLAL, UMLAL2 */
+            case 10: /* SMLSL, SMLSL2, UMLSL, UMLSL2 */
+            case 12: /* UMULL, UMULL2, SMULL, SMULL2 */
+                tcg_gen_mul_i64(tcg_passres, tcg_op1, tcg_op2);
+                break;
+            default:
+                g_assert_not_reached();
+            }
+
+            if (accop > 0) {
+                tcg_gen_add_i64(tcg_res[pass], tcg_res[pass], tcg_passres);
+                tcg_temp_free_i64(tcg_passres);
+            } else if (accop < 0) {
+                tcg_gen_sub_i64(tcg_res[pass], tcg_res[pass], tcg_passres);
+                tcg_temp_free_i64(tcg_passres);
+            }
+
+            tcg_temp_free_i64(tcg_op1);
+            tcg_temp_free_i64(tcg_op2);
+        }
+    } else {
+        /* size 0 or 1, generally helper functions */
+        for (pass = 0; pass < 2; pass++) {
+            TCGv_i32 tcg_op1 = tcg_temp_new_i32();
+            TCGv_i32 tcg_op2 = tcg_temp_new_i32();
+            TCGv_i64 tcg_passres;
+            int elt = pass + is_q * 2;
+
+            read_vec_element_i32(s, tcg_op1, rn, elt, MO_32);
+            read_vec_element_i32(s, tcg_op2, rm, elt, MO_32);
+
+            if (accop == 0) {
+                tcg_passres = tcg_res[pass];
+            } else {
+                tcg_passres = tcg_temp_new_i64();
+            }
+
+            switch (opcode) {
+            case 8: /* SMLAL, SMLAL2, UMLAL, UMLAL2 */
+            case 10: /* SMLSL, SMLSL2, UMLSL, UMLSL2 */
+            case 12: /* UMULL, UMULL2, SMULL, SMULL2 */
+                if (size == 0) {
+                    if (is_u) {
+                        gen_helper_neon_mull_u8(tcg_passres, tcg_op1, tcg_op2);
+                    } else {
+                        gen_helper_neon_mull_s8(tcg_passres, tcg_op1, tcg_op2);
+                    }
+                } else {
+                    if (is_u) {
+                        gen_helper_neon_mull_u16(tcg_passres, tcg_op1, tcg_op2);
+                    } else {
+                        gen_helper_neon_mull_s16(tcg_passres, tcg_op1, tcg_op2);
+                    }
+                }
+                break;
+            default:
+                g_assert_not_reached();
+            }
+            tcg_temp_free_i32(tcg_op1);
+            tcg_temp_free_i32(tcg_op2);
+
+            if (accop > 0) {
+                if (size == 0) {
+                    gen_helper_neon_addl_u16(tcg_res[pass], tcg_res[pass],
+                                             tcg_passres);
+                } else {
+                    gen_helper_neon_addl_u32(tcg_res[pass], tcg_res[pass],
+                                             tcg_passres);
+                }
+                tcg_temp_free_i64(tcg_passres);
+            } else if (accop < 0) {
+                if (size == 0) {
+                    gen_helper_neon_subl_u16(tcg_res[pass], tcg_res[pass],
+                                             tcg_passres);
+                } else {
+                    gen_helper_neon_subl_u32(tcg_res[pass], tcg_res[pass],
+                                             tcg_passres);
+                }
+                tcg_temp_free_i64(tcg_passres);
+            }
+        }
+    }
+
+    write_vec_element(s, tcg_res[0], rd, 0, MO_64);
+    write_vec_element(s, tcg_res[1], rd, 1, MO_64);
+    tcg_temp_free_i64(tcg_res[0]);
+    tcg_temp_free_i64(tcg_res[1]);
+}
+
 /* C3.6.15 AdvSIMD three different
  *   31  30  29 28       24 23  22  21 20  16 15    12 11 10 9    5 4    0
  * +---+---+---+-----------+------+---+------+--------+-----+------+------+
@@ -5554,7 +5727,65 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn)
  */
 static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn)
 {
-    unsupported_encoding(s, insn);
+    /* Instructions in this group fall into three basic classes
+     * (in each case with the operation working on each element in
+     * the input vectors):
+     * (1) widening 64 x 64 -> 128 (with possibly Vd as an extra
+     *     128 bit input)
+     * (2) wide 64 x 128 -> 128
+     * (3) narrowing 128 x 128 -> 64
+     * Here we do initial decode, catch unallocated cases and
+     * dispatch to separate functions for each class.
+     */
+    int is_q = extract32(insn, 30, 1);
+    int is_u = extract32(insn, 29, 1);
+    int size = extract32(insn, 22, 2);
+    int opcode = extract32(insn, 12, 4);
+    int rm = extract32(insn, 16, 5);
+    int rn = extract32(insn, 5, 5);
+    int rd = extract32(insn, 0, 5);
+
+    switch (opcode) {
+    case 1: /* SADDW, SADDW2, UADDW, UADDW2 */
+    case 3: /* SSUBW, SSUBW2, USUBW, USUBW2 */
+        /* 64 x 128 -> 128 */
+        unsupported_encoding(s, insn);
+        break;
+    case 4: /* ADDHN, ADDHN2, RADDHN, RADDHN2 */
+    case 6: /* SUBHN, SUBHN2, RSUBHN, RSUBHN2 */
+        /* 128 x 128 -> 64 */
+        unsupported_encoding(s, insn);
+        break;
+    case 9:
+    case 11:
+    case 13:
+    case 14:
+        if (is_u) {
+            unallocated_encoding(s);
+            return;
+        }
+        /* fall through */
+    case 0:
+    case 2:
+    case 5:
+    case 7:
+        unsupported_encoding(s, insn);
+        break;
+    case 8:
+    case 10:
+    case 12:
+        /* 64 x 64 -> 128 */
+        if (size == 3) {
+            unallocated_encoding(s);
+            return;
+        }
+        handle_3rd_widening(s, is_q, is_u, size, opcode, rd, rn, rm);
+        break;
+    default:
+        /* opcode 15 not allocated */
+        unallocated_encoding(s);
+        break;
+    }
 }
 
 /* C3.6.16 AdvSIMD three same
-- 
1.8.5


* [Qemu-devel] [PATCH 02/21] target-arm: A64: Add SIMD three-different ABDL instructions
  2014-01-26 19:24 [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets Peter Maydell
  2014-01-26 19:24 ` [Qemu-devel] [PATCH 01/21] target-arm: A64: Add SIMD three-different multiply accumulate insns Peter Maydell
@ 2014-01-26 19:24 ` Peter Maydell
  2014-01-28 18:59   ` Peter Maydell
  2014-01-26 19:24 ` [Qemu-devel] [PATCH 03/21] target-arm: A64: Add SIMD scalar 3 same add, sub and compare ops Peter Maydell
                   ` (19 subsequent siblings)
  21 siblings, 1 reply; 39+ messages in thread
From: Peter Maydell @ 2014-01-26 19:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson

Implement the absolute-difference instructions in the SIMD
three-different group: SABAL, SABAL2, UABAL, UABAL2, SABDL,
SABDL2, UABDL, UABDL2.
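
As an illustrative sketch of the per-element semantics (not the
translator code), the unsigned accumulating variant for 32 bit
elements is simply:

    #include <stdint.h>

    /* UABAL-style semantics: accumulate the absolute difference of
     * the zero-extended operands into the 64 bit destination element.
     */
    static uint64_t uabal_elt(uint64_t acc, uint32_t n, uint32_t m)
    {
        return acc + (n > m ? (uint64_t)n - m : (uint64_t)m - n);
    }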

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate-a64.c | 35 +++++++++++++++++++++++++++++++++--
 1 file changed, 33 insertions(+), 2 deletions(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 924a539..145125e 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -5630,6 +5630,21 @@ static void handle_3rd_widening(DisasContext *s, int is_q, int is_u, int size,
             }
 
             switch (opcode) {
+            case 5: /* SABAL, SABAL2, UABAL, UABAL2 */
+            case 7: /* SABDL, SABDL2, UABDL, UABDL2 */
+            {
+                TCGv_i64 tcg_tmp1 = tcg_temp_new_i64();
+                TCGv_i64 tcg_tmp2 = tcg_temp_new_i64();
+
+                tcg_gen_sub_i64(tcg_tmp1, tcg_op1, tcg_op2);
+                tcg_gen_sub_i64(tcg_tmp2, tcg_op2, tcg_op1);
+                tcg_gen_movcond_i64(is_u ? TCG_COND_GEU : TCG_COND_GE,
+                                    tcg_passres,
+                                    tcg_op1, tcg_op2, tcg_tmp1, tcg_tmp2);
+                tcg_temp_free_i64(tcg_tmp1);
+                tcg_temp_free_i64(tcg_tmp2);
+                break;
+            }
             case 8: /* SMLAL, SMLAL2, UMLAL, UMLAL2 */
             case 10: /* SMLSL, SMLSL2, UMLSL, UMLSL2 */
             case 12: /* UMULL, UMULL2, SMULL, SMULL2 */
@@ -5668,6 +5683,22 @@ static void handle_3rd_widening(DisasContext *s, int is_q, int is_u, int size,
             }
 
             switch (opcode) {
+            case 5: /* SABAL, SABAL2, UABAL, UABAL2 */
+            case 7: /* SABDL, SABDL2, UABDL, UABDL2 */
+                if (size == 0) {
+                    if (is_u) {
+                        gen_helper_neon_abdl_u16(tcg_passres, tcg_op1, tcg_op2);
+                    } else {
+                        gen_helper_neon_abdl_s16(tcg_passres, tcg_op1, tcg_op2);
+                    }
+                } else {
+                    if (is_u) {
+                        gen_helper_neon_abdl_u32(tcg_passres, tcg_op1, tcg_op2);
+                    } else {
+                        gen_helper_neon_abdl_s32(tcg_passres, tcg_op1, tcg_op2);
+                    }
+                }
+                break;
             case 8: /* SMLAL, SMLAL2, UMLAL, UMLAL2 */
             case 10: /* SMLSL, SMLSL2, UMLSL, UMLSL2 */
             case 12: /* UMULL, UMULL2, SMULL, SMULL2 */
@@ -5767,10 +5798,10 @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn)
         /* fall through */
     case 0:
     case 2:
-    case 5:
-    case 7:
         unsupported_encoding(s, insn);
         break;
+    case 5:
+    case 7:
     case 8:
     case 10:
     case 12:
-- 
1.8.5


* [Qemu-devel] [PATCH 03/21] target-arm: A64: Add SIMD scalar 3 same add, sub and compare ops
  2014-01-26 19:24 [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets Peter Maydell
  2014-01-26 19:24 ` [Qemu-devel] [PATCH 01/21] target-arm: A64: Add SIMD three-different multiply accumulate insns Peter Maydell
  2014-01-26 19:24 ` [Qemu-devel] [PATCH 02/21] target-arm: A64: Add SIMD three-different ABDL instructions Peter Maydell
@ 2014-01-26 19:24 ` Peter Maydell
  2014-01-26 19:24 ` [Qemu-devel] [PATCH 04/21] target-arm: A64: Add top level decode for SIMD 3-same group Peter Maydell
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Peter Maydell @ 2014-01-26 19:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson

Implement the add, sub and compare ops from the SIMD "scalar three same"
group.
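
The comparisons set every bit of the result element when the test
passes; a minimal standalone C sketch of the setcond-then-negate
trick used for the 64 bit case (illustration only):

    #include <stdint.h>

    /* CMGT semantics: all-ones if the signed test passes, all-zeros
     * otherwise; negating the 0/1 test result gives the mask.
     */
    static uint64_t cmgt64(int64_t n, int64_t m)
    {
        return -(uint64_t)(n > m);
    }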

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate-a64.c | 131 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 130 insertions(+), 1 deletion(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 145125e..6ff3e43 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -5531,6 +5531,58 @@ static void disas_simd_scalar_three_reg_diff(DisasContext *s, uint32_t insn)
     unsupported_encoding(s, insn);
 }
 
+static void handle_3same_64(DisasContext *s, int opcode, bool u,
+                            TCGv_i64 tcg_rd, TCGv_i64 tcg_rn, TCGv_i64 tcg_rm)
+{
+    /* Handle 64x64->64 opcodes which are shared between the scalar
+     * and vector 3-same groups. We cover every opcode where size == 3
+     * is valid in either the three-reg-same (integer, not pairwise)
+     * or scalar-three-reg-same groups. (Some opcodes are not yet
+     * implemented.)
+     */
+    TCGCond cond;
+
+    switch (opcode) {
+    case 0x6: /* CMGT, CMHI */
+        /* 64 bit integer comparison, result = test ? (2^64 - 1) : 0.
+         * We implement this using setcond (test) and then negating.
+         */
+        cond = u ? TCG_COND_GTU : TCG_COND_GT;
+    do_cmop:
+        tcg_gen_setcond_i64(cond, tcg_rd, tcg_rn, tcg_rm);
+        tcg_gen_neg_i64(tcg_rd, tcg_rd);
+        break;
+    case 0x7: /* CMGE, CMHS */
+        cond = u ? TCG_COND_GEU : TCG_COND_GE;
+        goto do_cmop;
+    case 0x11: /* CMTST, CMEQ */
+        if (u) {
+            cond = TCG_COND_EQ;
+            goto do_cmop;
+        }
+        /* CMTST : test is "if (X & Y != 0)". */
+        tcg_gen_and_i64(tcg_rd, tcg_rn, tcg_rm);
+        tcg_gen_setcondi_i64(TCG_COND_NE, tcg_rd, tcg_rd, 0);
+        tcg_gen_neg_i64(tcg_rd, tcg_rd);
+        break;
+    case 0x10: /* ADD, SUB */
+        if (u) {
+            tcg_gen_sub_i64(tcg_rd, tcg_rn, tcg_rm);
+        } else {
+            tcg_gen_add_i64(tcg_rd, tcg_rn, tcg_rm);
+        }
+        break;
+    case 0x1: /* SQADD */
+    case 0x5: /* SQSUB */
+    case 0x8: /* SSHL, USHL */
+    case 0x9: /* SQSHL, UQSHL */
+    case 0xa: /* SRSHL, URSHL */
+    case 0xb: /* SQRSHL, UQRSHL */
+    default:
+        g_assert_not_reached();
+    }
+}
+
 /* C3.6.11 AdvSIMD scalar three same
  *  31 30  29 28       24 23  22  21 20  16 15    11  10 9    5 4    0
  * +-----+---+-----------+------+---+------+--------+---+------+------+
@@ -5539,7 +5591,84 @@ static void disas_simd_scalar_three_reg_diff(DisasContext *s, uint32_t insn)
  */
 static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
 {
-    unsupported_encoding(s, insn);
+    int rd = extract32(insn, 0, 5);
+    int rn = extract32(insn, 5, 5);
+    int opcode = extract32(insn, 11, 5);
+    int rm = extract32(insn, 16, 5);
+    int size = extract32(insn, 22, 2);
+    bool u = extract32(insn, 29, 1);
+    TCGv_i64 tcg_rn;
+    TCGv_i64 tcg_rm;
+    TCGv_i64 tcg_rd;
+
+    if (opcode >= 0x18) {
+        /* Floating point: U, size[1] and opcode indicate operation */
+        int fpopcode = opcode | (extract32(size, 1, 1) << 5) | (u << 6);
+        switch (fpopcode) {
+        case 0x1b: /* FMULX */
+        case 0x1c: /* FCMEQ */
+        case 0x1f: /* FRECPS */
+        case 0x3f: /* FRSQRTS */
+        case 0x5c: /* FCMGE */
+        case 0x5d: /* FACGE */
+        case 0x7a: /* FABD  */
+        case 0x7c: /* FCMGT */
+        case 0x7d: /* FACGT */
+            unsupported_encoding(s, insn);
+            return;
+        default:
+            unallocated_encoding(s);
+            return;
+        }
+    }
+
+    switch (opcode) {
+    case 0x1: /* SQADD, UQADD */
+    case 0x5: /* SQSUB, UQSUB */
+    case 0x8: /* SSHL, USHL */
+    case 0xa: /* SRSHL, URSHL */
+        unsupported_encoding(s, insn);
+        return;
+    case 0x6: /* CMGT, CMHI */
+    case 0x7: /* CMGE, CMHS */
+    case 0x11: /* CMTST, CMEQ */
+    case 0x10: /* ADD, SUB (vector) */
+        if (size != 3) {
+            unallocated_encoding(s);
+            return;
+        }
+        break;
+    case 0x9: /* SQSHL, UQSHL */
+    case 0xb: /* SQRSHL, UQRSHL */
+        unsupported_encoding(s, insn);
+        return;
+    case 0x16: /* SQDMULH, SQRDMULH (vector) */
+        if (size != 1 && size != 2) {
+            unallocated_encoding(s);
+            return;
+        }
+        unsupported_encoding(s, insn);
+        return;
+    default:
+        unallocated_encoding(s);
+        return;
+    }
+
+    tcg_rn = read_fp_dreg(s, rn);       /* op1 */
+    tcg_rm = read_fp_dreg(s, rm);       /* op2 */
+    tcg_rd = tcg_temp_new_i64();
+
+    /* For the moment we only support the opcodes which are
+     * 64-bit-width only. The size != 3 cases will
+     * be handled later when the relevant ops are implemented.
+     */
+    handle_3same_64(s, opcode, u, tcg_rd, tcg_rn, tcg_rm);
+
+    write_fp_dreg(s, rd, tcg_rd);
+
+    tcg_temp_free_i64(tcg_rn);
+    tcg_temp_free_i64(tcg_rm);
+    tcg_temp_free_i64(tcg_rd);
 }
 
 /* C3.6.12 AdvSIMD scalar two reg misc
-- 
1.8.5


* [Qemu-devel] [PATCH 04/21] target-arm: A64: Add top level decode for SIMD 3-same group
  2014-01-26 19:24 [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets Peter Maydell
                   ` (2 preceding siblings ...)
  2014-01-26 19:24 ` [Qemu-devel] [PATCH 03/21] target-arm: A64: Add SIMD scalar 3 same add, sub and compare ops Peter Maydell
@ 2014-01-26 19:24 ` Peter Maydell
  2014-01-26 19:24 ` [Qemu-devel] [PATCH 05/21] target-arm: A64: Add logic ops from SIMD 3 same group Peter Maydell
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Peter Maydell @ 2014-01-26 19:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson

Add top level decode for the A64 SIMD three regs same group
(C3.6.16), splitting it into the pairwise, logical, float and
integer subgroups.
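
The subgroup is selected purely on the opcode field in insn[15:11];
as a trivial standalone illustration of that field extraction:

    #include <stdint.h>

    /* Pull out the 5 bit opcode field which picks the logic,
     * pairwise, float or integer subgroup.
     */
    static uint32_t simd_3same_opcode(uint32_t insn)
    {
        return (insn >> 11) & 0x1f;
    }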

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate-a64.c | 45 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 6ff3e43..2079c96 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -5948,6 +5948,30 @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn)
     }
 }
 
+/* Logic op (opcode == 3) subgroup of C3.6.16. */
+static void disas_simd_3same_logic(DisasContext *s, uint32_t insn)
+{
+    unsupported_encoding(s, insn);
+}
+
+/* Pairwise op subgroup of C3.6.16. */
+static void disas_simd_3same_pair(DisasContext *s, uint32_t insn)
+{
+    unsupported_encoding(s, insn);
+}
+
+/* Floating point op subgroup of C3.6.16. */
+static void disas_simd_3same_float(DisasContext *s, uint32_t insn)
+{
+    unsupported_encoding(s, insn);
+}
+
+/* Integer op subgroup of C3.6.16. */
+static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
+{
+    unsupported_encoding(s, insn);
+}
+
 /* C3.6.16 AdvSIMD three same
  *  31  30  29  28       24 23  22  21 20  16 15    11  10 9    5 4    0
  * +---+---+---+-----------+------+---+------+--------+---+------+------+
@@ -5956,7 +5980,26 @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn)
  */
 static void disas_simd_three_reg_same(DisasContext *s, uint32_t insn)
 {
-    unsupported_encoding(s, insn);
+    int opcode = extract32(insn, 11, 5);
+
+    switch (opcode) {
+    case 0x3: /* logic ops */
+        disas_simd_3same_logic(s, insn);
+        break;
+    case 0x17: /* ADDP */
+    case 0x14: /* SMAXP, UMAXP */
+    case 0x15: /* SMINP, UMINP */
+        /* Pairwise operations */
+        disas_simd_3same_pair(s, insn);
+        break;
+    case 0x18 ... 0x31:
+        /* floating point ops, sz[1] and U are part of opcode */
+        disas_simd_3same_float(s, insn);
+        break;
+    default:
+        disas_simd_3same_int(s, insn);
+        break;
+    }
 }
 
 /* C3.6.17 AdvSIMD two reg misc
-- 
1.8.5


* [Qemu-devel] [PATCH 05/21] target-arm: A64: Add logic ops from SIMD 3 same group
  2014-01-26 19:24 [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets Peter Maydell
                   ` (3 preceding siblings ...)
  2014-01-26 19:24 ` [Qemu-devel] [PATCH 04/21] target-arm: A64: Add top level decode for SIMD 3-same group Peter Maydell
@ 2014-01-26 19:24 ` Peter Maydell
  2014-01-26 19:24 ` [Qemu-devel] [PATCH 06/21] target-arm: A64: Add integer ops from SIMD 3-same group Peter Maydell
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Peter Maydell @ 2014-01-26 19:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson

Add support for the logical operations (ORR, AND, BIC, ORN, EOR, BSL,
BIT and BIF) from the SIMD 3 register same group (C3.6.16).
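
BSL, BIT and BIF are all a select-by-mask; as an illustrative
standalone sketch (not the TCG code), the xor/and/xor identity used
for BSL is:

    #include <stdint.h>

    /* BSL semantics: take op1 where the mask bit is set, op2 where
     * it is clear, computed as ((op1 ^ op2) & mask) ^ op2 so no
     * bitwise NOT is needed.
     */
    static uint64_t bsl64(uint64_t mask, uint64_t op1, uint64_t op2)
    {
        return ((op1 ^ op2) & mask) ^ op2;
    }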

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate-a64.c | 73 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 72 insertions(+), 1 deletion(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 2079c96..4767cbf 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -5951,7 +5951,78 @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn)
 /* Logic op (opcode == 3) subgroup of C3.6.16. */
 static void disas_simd_3same_logic(DisasContext *s, uint32_t insn)
 {
-    unsupported_encoding(s, insn);
+    int rd = extract32(insn, 0, 5);
+    int rn = extract32(insn, 5, 5);
+    int rm = extract32(insn, 16, 5);
+    int size = extract32(insn, 22, 2);
+    bool is_u = extract32(insn, 29, 1);
+    bool is_q = extract32(insn, 30, 1);
+    TCGv_i64 tcg_op1 = tcg_temp_new_i64();
+    TCGv_i64 tcg_op2 = tcg_temp_new_i64();
+    TCGv_i64 tcg_res[2];
+    int pass;
+
+    tcg_res[0] = tcg_temp_new_i64();
+    tcg_res[1] = tcg_temp_new_i64();
+
+    for (pass = 0; pass < (is_q ? 2 : 1); pass++) {
+        read_vec_element(s, tcg_op1, rn, pass, MO_64);
+        read_vec_element(s, tcg_op2, rm, pass, MO_64);
+
+        if (!is_u) {
+            switch (size) {
+            case 0: /* AND */
+                tcg_gen_and_i64(tcg_res[pass], tcg_op1, tcg_op2);
+                break;
+            case 1: /* BIC */
+                tcg_gen_andc_i64(tcg_res[pass], tcg_op1, tcg_op2);
+                break;
+            case 2: /* ORR */
+                tcg_gen_or_i64(tcg_res[pass], tcg_op1, tcg_op2);
+                break;
+            case 3: /* ORN */
+                tcg_gen_orc_i64(tcg_res[pass], tcg_op1, tcg_op2);
+                break;
+            }
+        } else {
+            if (size != 0) {
+                /* B* ops need res loaded to operate on */
+                read_vec_element(s, tcg_res[pass], rd, pass, MO_64);
+            }
+
+            switch (size) {
+            case 0: /* EOR */
+                tcg_gen_xor_i64(tcg_res[pass], tcg_op1, tcg_op2);
+                break;
+            case 1: /* BSL bitwise select */
+                tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_op2);
+                tcg_gen_and_i64(tcg_op1, tcg_op1, tcg_res[pass]);
+                tcg_gen_xor_i64(tcg_res[pass], tcg_op2, tcg_op1);
+                break;
+            case 2: /* BIT, bitwise insert if true */
+                tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_res[pass]);
+                tcg_gen_and_i64(tcg_op1, tcg_op1, tcg_op2);
+                tcg_gen_xor_i64(tcg_res[pass], tcg_res[pass], tcg_op1);
+                break;
+            case 3: /* BIF, bitwise insert if false */
+                tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_res[pass]);
+                tcg_gen_andc_i64(tcg_op1, tcg_op1, tcg_op2);
+                tcg_gen_xor_i64(tcg_res[pass], tcg_res[pass], tcg_op1);
+                break;
+            }
+        }
+    }
+
+    write_vec_element(s, tcg_res[0], rd, 0, MO_64);
+    if (!is_q) {
+        tcg_gen_movi_i64(tcg_res[1], 0);
+    }
+    write_vec_element(s, tcg_res[1], rd, 1, MO_64);
+
+    tcg_temp_free_i64(tcg_op1);
+    tcg_temp_free_i64(tcg_op2);
+    tcg_temp_free_i64(tcg_res[0]);
+    tcg_temp_free_i64(tcg_res[1]);
 }
 
 /* Pairwise op subgroup of C3.6.16. */
-- 
1.8.5


* [Qemu-devel] [PATCH 06/21] target-arm: A64: Add integer ops from SIMD 3-same group
  2014-01-26 19:24 [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets Peter Maydell
                   ` (4 preceding siblings ...)
  2014-01-26 19:24 ` [Qemu-devel] [PATCH 05/21] target-arm: A64: Add logic ops from SIMD 3 same group Peter Maydell
@ 2014-01-26 19:24 ` Peter Maydell
  2014-01-26 19:24 ` [Qemu-devel] [PATCH 07/21] target-arm: A64: Add simple SIMD 3-same floating point ops Peter Maydell
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Peter Maydell @ 2014-01-26 19:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson

Add some of the integer operations in the SIMD 3-same group:
specifically, the comparisons, addition and subtraction.
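
For the sub-64-bit element sizes the work is dispatched through small
constant tables of per-element helpers indexed by size and the U bit;
a standalone sketch of that pattern, with made-up add/sub functions
standing in for the real Neon helpers:

    #include <stdint.h>

    typedef uint32_t (*two_op_fn)(uint32_t, uint32_t);

    /* Hypothetical stand-ins for the per-element helper functions */
    static uint32_t add_u32(uint32_t a, uint32_t b) { return a + b; }
    static uint32_t sub_u32(uint32_t a, uint32_t b) { return a - b; }

    /* ADD/SUB selection: index a const table with the U bit
     * (0 = ADD, 1 = SUB), then call through the pointer.
     */
    static uint32_t addsub_elt(int u, uint32_t a, uint32_t b)
    {
        static two_op_fn const fns[2] = { add_u32, sub_u32 };
        return fns[u](a, b);
    }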

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate-a64.c | 165 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 164 insertions(+), 1 deletion(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 4767cbf..3934bce 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -72,6 +72,9 @@ typedef struct AArch64DecodeTable {
     AArch64DecodeFn *disas_fn;
 } AArch64DecodeTable;
 
+/* Function prototype for gen_ functions for calling Neon helpers */
+typedef void NeonGenTwoOpFn(TCGv_i32, TCGv_i32, TCGv_i32);
+
 /* initialize TCG globals.  */
 void a64_translate_init(void)
 {
@@ -787,6 +790,25 @@ static void write_vec_element(DisasContext *s, TCGv_i64 tcg_src, int destidx,
     }
 }
 
+static void write_vec_element_i32(DisasContext *s, TCGv_i32 tcg_src,
+                                  int destidx, int element, TCGMemOp memop)
+{
+    int vect_off = vec_reg_offset(destidx, element, memop & MO_SIZE);
+    switch (memop) {
+    case MO_8:
+        tcg_gen_st8_i32(tcg_src, cpu_env, vect_off);
+        break;
+    case MO_16:
+        tcg_gen_st16_i32(tcg_src, cpu_env, vect_off);
+        break;
+    case MO_32:
+        tcg_gen_st_i32(tcg_src, cpu_env, vect_off);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 /* Clear the high 64 bits of a 128 bit vector (in general non-quad
  * vector ops all need to do this).
  */
@@ -6040,7 +6062,148 @@ static void disas_simd_3same_float(DisasContext *s, uint32_t insn)
 /* Integer op subgroup of C3.6.16. */
 static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
 {
-    unsupported_encoding(s, insn);
+    int is_q = extract32(insn, 30, 1);
+    int u = extract32(insn, 29, 1);
+    int size = extract32(insn, 22, 2);
+    int opcode = extract32(insn, 11, 5);
+    int rm = extract32(insn, 16, 5);
+    int rn = extract32(insn, 5, 5);
+    int rd = extract32(insn, 0, 5);
+    int pass;
+
+    switch (opcode) {
+    case 0x13: /* MUL, PMUL */
+        if (u && size != 0) {
+            unallocated_encoding(s);
+            return;
+        }
+        /* fall through */
+    case 0x0: /* SHADD, UHADD */
+    case 0x2: /* SRHADD, URHADD */
+    case 0x4: /* SHSUB, UHSUB */
+    case 0xc: /* SMAX, UMAX */
+    case 0xd: /* SMIN, UMIN */
+    case 0xe: /* SABD, UABD */
+    case 0xf: /* SABA, UABA */
+    case 0x12: /* MLA, MLS */
+        if (size == 3) {
+            unallocated_encoding(s);
+            return;
+        }
+        unsupported_encoding(s, insn);
+        return;
+    case 0x1: /* SQADD */
+    case 0x5: /* SQSUB */
+    case 0x8: /* SSHL, USHL */
+    case 0x9: /* SQSHL, UQSHL */
+    case 0xa: /* SRSHL, URSHL */
+    case 0xb: /* SQRSHL, UQRSHL */
+        if (size == 3 && !is_q) {
+            unallocated_encoding(s);
+            return;
+        }
+        unsupported_encoding(s, insn);
+        return;
+    case 0x16: /* SQDMULH, SQRDMULH */
+        if (size == 0 || size == 3) {
+            unallocated_encoding(s);
+            return;
+        }
+        unsupported_encoding(s, insn);
+        return;
+    default:
+        if (size == 3 && !is_q) {
+            unallocated_encoding(s);
+            return;
+        }
+        break;
+    }
+
+    if (size == 3) {
+        for (pass = 0; pass < (is_q ? 2 : 1); pass++) {
+            TCGv_i64 tcg_op1 = tcg_temp_new_i64();
+            TCGv_i64 tcg_op2 = tcg_temp_new_i64();
+            TCGv_i64 tcg_res = tcg_temp_new_i64();
+
+            read_vec_element(s, tcg_op1, rn, pass, MO_64);
+            read_vec_element(s, tcg_op2, rm, pass, MO_64);
+
+            handle_3same_64(s, opcode, u, tcg_res, tcg_op1, tcg_op2);
+
+            write_vec_element(s, tcg_res, rd, pass, MO_64);
+
+            tcg_temp_free_i64(tcg_res);
+            tcg_temp_free_i64(tcg_op1);
+            tcg_temp_free_i64(tcg_op2);
+        }
+    } else {
+        for (pass = 0; pass < (is_q ? 4 : 2); pass++) {
+            TCGv_i32 tcg_op1 = tcg_temp_new_i32();
+            TCGv_i32 tcg_op2 = tcg_temp_new_i32();
+            TCGv_i32 tcg_res = tcg_temp_new_i32();
+            NeonGenTwoOpFn *genfn;
+
+            read_vec_element_i32(s, tcg_op1, rn, pass, MO_32);
+            read_vec_element_i32(s, tcg_op2, rm, pass, MO_32);
+
+            switch (opcode) {
+            case 0x6: /* CMGT, CMHI */
+            {
+                static NeonGenTwoOpFn * const fns[3][2] = {
+                    { gen_helper_neon_cgt_s8, gen_helper_neon_cgt_u8 },
+                    { gen_helper_neon_cgt_s16, gen_helper_neon_cgt_u16 },
+                    { gen_helper_neon_cgt_s32, gen_helper_neon_cgt_u32 },
+                };
+                genfn = fns[size][u];
+                break;
+            }
+            case 0x7: /* CMGE, CMHS */
+            {
+                static NeonGenTwoOpFn * const fns[3][2] = {
+                    { gen_helper_neon_cge_s8, gen_helper_neon_cge_u8 },
+                    { gen_helper_neon_cge_s16, gen_helper_neon_cge_u16 },
+                    { gen_helper_neon_cge_s32, gen_helper_neon_cge_u32 },
+                };
+                genfn = fns[size][u];
+                break;
+            }
+            case 0x10: /* ADD, SUB */
+            {
+                static NeonGenTwoOpFn * const fns[3][2] = {
+                    { gen_helper_neon_add_u8, gen_helper_neon_sub_u8 },
+                    { gen_helper_neon_add_u16, gen_helper_neon_sub_u16 },
+                    { tcg_gen_add_i32, tcg_gen_sub_i32 },
+                };
+                genfn = fns[size][u];
+                break;
+            }
+            case 0x11: /* CMTST, CMEQ */
+            {
+                static NeonGenTwoOpFn * const fns[3][2] = {
+                    { gen_helper_neon_tst_u8, gen_helper_neon_ceq_u8 },
+                    { gen_helper_neon_tst_u16, gen_helper_neon_ceq_u16 },
+                    { gen_helper_neon_tst_u32, gen_helper_neon_ceq_u32 },
+                };
+                genfn = fns[size][u];
+                break;
+            }
+            default:
+                g_assert_not_reached();
+            }
+
+            genfn(tcg_res, tcg_op1, tcg_op2);
+
+            write_vec_element_i32(s, tcg_res, rd, pass, MO_32);
+
+            tcg_temp_free_i32(tcg_res);
+            tcg_temp_free_i32(tcg_op1);
+            tcg_temp_free_i32(tcg_op2);
+        }
+    }
+
+    if (!is_q) {
+        clear_vec_high(s, rd);
+    }
 }
 
 /* C3.6.16 AdvSIMD three same
-- 
1.8.5


* [Qemu-devel] [PATCH 07/21] target-arm: A64: Add simple SIMD 3-same floating point ops
  2014-01-26 19:24 [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets Peter Maydell
                   ` (5 preceding siblings ...)
  2014-01-26 19:24 ` [Qemu-devel] [PATCH 06/21] target-arm: A64: Add integer ops from SIMD 3-same group Peter Maydell
@ 2014-01-26 19:24 ` Peter Maydell
  2014-01-26 19:24 ` [Qemu-devel] [PATCH 08/21] target-arm: A64: Add SIMD shift by immediate Peter Maydell
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Peter Maydell @ 2014-01-26 19:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson

Implement a simple subset of the SIMD 3-same floating point
operations. This includes a common helper function used for both
scalar and vector ops; FABD is the only currently implemented
shared op.
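
Semantically FABD is just the absolute value of the difference; as a
rough standalone sketch for the double-precision case (ignoring the
FP status handling the real helpers carry around):

    #include <math.h>

    /* FABD (double) semantics: |a - b|, done in the translator as
     * an FP subtract followed by an FP abs.
     */
    static double fabd_d(double a, double b)
    {
        return fabs(a - b);
    }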

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate-a64.c | 190 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 188 insertions(+), 2 deletions(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 3934bce..966e502 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -5605,6 +5605,131 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
     }
 }
 
+/* Handle the 3-same-operands float operations; shared by the scalar
+ * and vector encodings. The caller must filter out any encodings
+ * not allocated for the encoding it is dealing with.
+ */
+static void handle_3same_float(DisasContext *s, int size, int elements,
+                               int fpopcode, int rd, int rn, int rm)
+{
+    int pass;
+    TCGv_ptr fpst = get_fpstatus_ptr();
+
+    for (pass = 0; pass < elements; pass++) {
+        if (size) {
+            /* Double */
+            TCGv_i64 tcg_op1 = tcg_temp_new_i64();
+            TCGv_i64 tcg_op2 = tcg_temp_new_i64();
+            TCGv_i64 tcg_res = tcg_temp_new_i64();
+
+            read_vec_element(s, tcg_op1, rn, pass, MO_64);
+            read_vec_element(s, tcg_op2, rm, pass, MO_64);
+
+            switch (fpopcode) {
+            case 0x18: /* FMAXNM */
+                gen_helper_vfp_maxnumd(tcg_res, tcg_op1, tcg_op2, fpst);
+                break;
+            case 0x1a: /* FADD */
+                gen_helper_vfp_addd(tcg_res, tcg_op1, tcg_op2, fpst);
+                break;
+            case 0x1e: /* FMAX */
+                gen_helper_vfp_maxd(tcg_res, tcg_op1, tcg_op2, fpst);
+                break;
+            case 0x38: /* FMINNM */
+                gen_helper_vfp_minnumd(tcg_res, tcg_op1, tcg_op2, fpst);
+                break;
+            case 0x3a: /* FSUB */
+                gen_helper_vfp_subd(tcg_res, tcg_op1, tcg_op2, fpst);
+                break;
+            case 0x3e: /* FMIN */
+                gen_helper_vfp_mind(tcg_res, tcg_op1, tcg_op2, fpst);
+                break;
+            case 0x5b: /* FMUL */
+                gen_helper_vfp_muld(tcg_res, tcg_op1, tcg_op2, fpst);
+                break;
+            case 0x5f: /* FDIV */
+                gen_helper_vfp_divd(tcg_res, tcg_op1, tcg_op2, fpst);
+                break;
+            case 0x7a: /* FABD */
+                gen_helper_vfp_subd(tcg_res, tcg_op1, tcg_op2, fpst);
+                gen_helper_vfp_absd(tcg_res, tcg_res);
+                break;
+            default:
+                g_assert_not_reached();
+            }
+
+            write_vec_element(s, tcg_res, rd, pass, MO_64);
+
+            tcg_temp_free_i64(tcg_res);
+            tcg_temp_free_i64(tcg_op1);
+            tcg_temp_free_i64(tcg_op2);
+        } else {
+            /* Single */
+            TCGv_i32 tcg_op1 = tcg_temp_new_i32();
+            TCGv_i32 tcg_op2 = tcg_temp_new_i32();
+            TCGv_i32 tcg_res = tcg_temp_new_i32();
+
+            read_vec_element_i32(s, tcg_op1, rn, pass, MO_32);
+            read_vec_element_i32(s, tcg_op2, rm, pass, MO_32);
+
+            switch (fpopcode) {
+            case 0x1a: /* FADD */
+                gen_helper_vfp_adds(tcg_res, tcg_op1, tcg_op2, fpst);
+                break;
+            case 0x1e: /* FMAX */
+                gen_helper_vfp_maxs(tcg_res, tcg_op1, tcg_op2, fpst);
+                break;
+            case 0x18: /* FMAXNM */
+                gen_helper_vfp_maxnums(tcg_res, tcg_op1, tcg_op2, fpst);
+                break;
+            case 0x38: /* FMINNM */
+                gen_helper_vfp_minnums(tcg_res, tcg_op1, tcg_op2, fpst);
+                break;
+            case 0x3a: /* FSUB */
+                gen_helper_vfp_subs(tcg_res, tcg_op1, tcg_op2, fpst);
+                break;
+            case 0x3e: /* FMIN */
+                gen_helper_vfp_mins(tcg_res, tcg_op1, tcg_op2, fpst);
+                break;
+            case 0x5b: /* FMUL */
+                gen_helper_vfp_muls(tcg_res, tcg_op1, tcg_op2, fpst);
+                break;
+            case 0x5f: /* FDIV */
+                gen_helper_vfp_divs(tcg_res, tcg_op1, tcg_op2, fpst);
+                break;
+            case 0x7a: /* FABD */
+                gen_helper_vfp_subs(tcg_res, tcg_op1, tcg_op2, fpst);
+                gen_helper_vfp_abss(tcg_res, tcg_res);
+                break;
+            default:
+                g_assert_not_reached();
+            }
+
+            if (elements == 1) {
+                /* scalar single so clear high part */
+                TCGv_i64 tcg_tmp = tcg_temp_new_i64();
+
+                tcg_gen_extu_i32_i64(tcg_tmp, tcg_res);
+                write_vec_element(s, tcg_tmp, rd, pass, MO_64);
+                tcg_temp_free_i64(tcg_tmp);
+            } else {
+                write_vec_element_i32(s, tcg_res, rd, pass, MO_32);
+            }
+
+            tcg_temp_free_i32(tcg_res);
+            tcg_temp_free_i32(tcg_op1);
+            tcg_temp_free_i32(tcg_op2);
+        }
+    }
+
+    tcg_temp_free_ptr(fpst);
+
+    if ((elements << size) < 4) {
+        /* scalar, or non-quad vector op */
+        clear_vec_high(s, rd);
+    }
+}
+
 /* C3.6.11 AdvSIMD scalar three same
  *  31 30  29 28       24 23  22  21 20  16 15    11  10 9    5 4    0
  * +-----+---+-----------+------+---+------+--------+---+------+------+
@@ -5633,15 +5758,19 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
         case 0x3f: /* FRSQRTS */
         case 0x5c: /* FCMGE */
         case 0x5d: /* FACGE */
-        case 0x7a: /* FABD  */
         case 0x7c: /* FCMGT */
         case 0x7d: /* FACGT */
             unsupported_encoding(s, insn);
             return;
+        case 0x7a: /* FABD */
+            break;
         default:
             unallocated_encoding(s);
             return;
         }
+
+        handle_3same_float(s, extract32(size, 0, 1), 1, fpopcode, rd, rn, rm);
+        return;
     }
 
     switch (opcode) {
@@ -6056,7 +6185,64 @@ static void disas_simd_3same_pair(DisasContext *s, uint32_t insn)
 /* Floating point op subgroup of C3.6.16. */
 static void disas_simd_3same_float(DisasContext *s, uint32_t insn)
 {
-    unsupported_encoding(s, insn);
+    /* For floating point ops, the U, size[1] and opcode bits
+     * together indicate the operation. size[0] indicates single
+     * or double.
+     */
+    int fpopcode = extract32(insn, 11, 5)
+        | (extract32(insn, 23, 1) << 5)
+        | (extract32(insn, 29, 1) << 6);
+    int is_q = extract32(insn, 30, 1);
+    int size = extract32(insn, 22, 1);
+    int rm = extract32(insn, 16, 5);
+    int rn = extract32(insn, 5, 5);
+    int rd = extract32(insn, 0, 5);
+
+    int datasize = is_q ? 128 : 64;
+    int esize = 32 << size;
+    int elements = datasize / esize;
+
+    if (size == 1 && !is_q) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    switch (fpopcode) {
+    case 0x58: /* FMAXNMP */
+    case 0x5a: /* FADDP */
+    case 0x5e: /* FMAXP */
+    case 0x78: /* FMINNMP */
+    case 0x7e: /* FMINP */
+        /* pairwise ops */
+        unsupported_encoding(s, insn);
+        return;
+    case 0x1b: /* FMULX */
+    case 0x1c: /* FCMEQ */
+    case 0x1f: /* FRECPS */
+    case 0x3f: /* FRSQRTS */
+    case 0x5c: /* FCMGE */
+    case 0x5d: /* FACGE */
+    case 0x7c: /* FCMGT */
+    case 0x7d: /* FACGT */
+    case 0x19: /* FMLA */
+    case 0x39: /* FMLS */
+        unsupported_encoding(s, insn);
+        return;
+    case 0x18: /* FMAXNM */
+    case 0x1a: /* FADD */
+    case 0x1e: /* FMAX */
+    case 0x38: /* FMINNM */
+    case 0x3a: /* FSUB */
+    case 0x3e: /* FMIN */
+    case 0x5b: /* FMUL */
+    case 0x5f: /* FDIV */
+    case 0x7a: /* FABD */
+        handle_3same_float(s, size, elements, fpopcode, rd, rn, rm);
+        return;
+    default:
+        unallocated_encoding(s);
+        return;
+    }
 }
 
 /* Integer op subgroup of C3.6.16. */
-- 
1.8.5


* [Qemu-devel] [PATCH 08/21] target-arm: A64: Add SIMD shift by immediate
  2014-01-26 19:24 [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets Peter Maydell
                   ` (6 preceding siblings ...)
  2014-01-26 19:24 ` [Qemu-devel] [PATCH 07/21] target-arm: A64: Add simple SIMD 3-same floating point ops Peter Maydell
@ 2014-01-26 19:24 ` Peter Maydell
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 09/21] target-arm: A64: Implement SIMD 3-reg-same shift and saturate insns Peter Maydell
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 39+ messages in thread
From: Peter Maydell @ 2014-01-26 19:24 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson

From: Alex Bennée <alex.bennee@linaro.org>

This implements a subset of the AdvSIMD shift operations (namely all the
non-saturating, non-narrowing ones). The actual shift generation code is
shared between the scalar and vector cases, wrapped in either vector
element iteration or FP register access as appropriate.

The rounding operations need to take special care to correctly reflect
the carry out of the top bit when the rounding constant is added, since
the architectural intermediate result is not truncated to 64 bits.
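
For element sizes below 64 bits the rounding reduces to adding
1 << (shift - 1) before shifting; a minimal standalone sketch of the
unsigned rounded shift (the 64 bit element case instead needs a
wider-than-64-bit intermediate, handled with add2 in the real code):

    #include <stdint.h>

    /* URSHR-style semantics for an element narrower than 64 bits:
     * the rounding constant plus the source still fits in 64 bits,
     * so a plain add and shift is enough.
     */
    static uint64_t urshr(uint64_t src, int shift)
    {
        return (src + (1ULL << (shift - 1))) >> shift;
    }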

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate-a64.c | 375 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 373 insertions(+), 2 deletions(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 966e502..6cdb8fc 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -5531,15 +5531,216 @@ static void disas_simd_scalar_pairwise(DisasContext *s, uint32_t insn)
     unsupported_encoding(s, insn);
 }
 
+/*
+ * Common SSHR[RA]/USHR[RA] - Shift right (optional rounding/accumulate)
+ *
+ * This handles the common shift logic and is used by both
+ * the vector and scalar code.
+ */
+static void handle_shri_with_rndacc(TCGv_i64 tcg_res, TCGv_i64 tcg_src,
+                                    TCGv_i64 tcg_rnd, bool accumulate,
+                                    bool is_u, int size, int shift)
+{
+    bool extended_result = false;
+    bool round = !TCGV_IS_UNUSED_I64(tcg_rnd);
+    int ext_lshift = 0;
+    TCGv_i64 tcg_src_hi;
+
+    if (round && size == 3) {
+        extended_result = true;
+        ext_lshift = 64 - shift;
+        tcg_src_hi = tcg_temp_new_i64();
+    } else if (shift == 64) {
+        if (!accumulate && is_u) {
+            /* result is zero */
+            tcg_gen_movi_i64(tcg_res, 0);
+            return;
+        }
+    }
+
+    /* Deal with the rounding step */
+    if (round) {
+        if (extended_result) {
+            TCGv_i64 tcg_zero = tcg_const_i64(0);
+            if (!is_u) {
+                /* take care of sign extending tcg_res */
+                tcg_gen_sari_i64(tcg_src_hi, tcg_src, 63);
+                tcg_gen_add2_i64(tcg_src, tcg_src_hi,
+                                 tcg_src, tcg_src_hi,
+                                 tcg_rnd, tcg_zero);
+            } else {
+                tcg_gen_add2_i64(tcg_src, tcg_src_hi,
+                                 tcg_src, tcg_zero,
+                                 tcg_rnd, tcg_zero);
+            }
+            tcg_temp_free_i64(tcg_zero);
+        } else {
+            tcg_gen_add_i64(tcg_src, tcg_src, tcg_rnd);
+        }
+    }
+
+    /* Now do the shift right */
+    if (round && extended_result) {
+        /* extended case, >64 bit precision required */
+        if (ext_lshift == 0) {
+            /* special case, only high bits matter */
+            tcg_gen_mov_i64(tcg_src, tcg_src_hi);
+        } else {
+            tcg_gen_shri_i64(tcg_src, tcg_src, shift);
+            tcg_gen_shli_i64(tcg_src_hi, tcg_src_hi, ext_lshift);
+            tcg_gen_or_i64(tcg_src, tcg_src, tcg_src_hi);
+        }
+    } else {
+        if (is_u) {
+            if (shift == 64) {
+                /* essentially shifting in 64 zeros */
+                tcg_gen_movi_i64(tcg_src, 0);
+            } else {
+                tcg_gen_shri_i64(tcg_src, tcg_src, shift);
+            }
+        } else {
+            if (shift == 64) {
+                /* effectively extending the sign-bit */
+                tcg_gen_sari_i64(tcg_src, tcg_src, 63);
+            } else {
+                tcg_gen_sari_i64(tcg_src, tcg_src, shift);
+            }
+        }
+    }
+
+    if (accumulate) {
+        tcg_gen_add_i64(tcg_res, tcg_res, tcg_src);
+    } else {
+        tcg_gen_mov_i64(tcg_res, tcg_src);
+    }
+
+    if (extended_result) {
+        tcg_temp_free_i64(tcg_src_hi);
+    }
+}
+
+/* Common SHL/SLI - Shift left with an optional insert */
+static void handle_shli_with_ins(TCGv_i64 tcg_res, TCGv_i64 tcg_src,
+                                 bool insert, int shift)
+{
+    if (insert) { /* SLI */
+        tcg_gen_deposit_i64(tcg_res, tcg_res, tcg_src, shift, 64 - shift);
+    } else { /* SHL */
+        tcg_gen_shli_i64(tcg_res, tcg_src, shift);
+    }
+}
+
+/* SSHR[RA]/USHR[RA] - Scalar shift right (optional rounding/accumulate) */
+static void handle_scalar_simd_shri(DisasContext *s,
+                                    bool is_u, int immh, int immb,
+                                    int opcode, int rn, int rd)
+{
+    const int size = 3;
+    int immhb = immh << 3 | immb;
+    int shift = 2 * (8 << size) - immhb;
+    bool accumulate = false;
+    bool round = false;
+    TCGv_i64 tcg_rn;
+    TCGv_i64 tcg_rd;
+    TCGv_i64 tcg_round;
+
+    if (!extract32(immh, 3, 1)) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    switch (opcode) {
+    case 0x02: /* SSRA / USRA (accumulate) */
+        accumulate = true;
+        break;
+    case 0x04: /* SRSHR / URSHR (rounding) */
+        round = true;
+        break;
+    case 0x06: /* SRSRA / URSRA (accum + rounding) */
+        accumulate = round = true;
+        break;
+    }
+
+    if (round) {
+        uint64_t round_const = 1ULL << (shift - 1);
+        tcg_round = tcg_const_i64(round_const);
+    } else {
+        TCGV_UNUSED_I64(tcg_round);
+    }
+
+    tcg_rn = read_fp_dreg(s, rn);
+    tcg_rd = accumulate ? read_fp_dreg(s, rd) : tcg_temp_new_i64();
+
+    handle_shri_with_rndacc(tcg_rd, tcg_rn, tcg_round,
+                               accumulate, is_u, size, shift);
+
+    write_fp_dreg(s, rd, tcg_rd);
+
+    tcg_temp_free_i64(tcg_rn);
+    tcg_temp_free_i64(tcg_rd);
+    if (round) {
+        tcg_temp_free_i64(tcg_round);
+    }
+}
+
+/* SHL/SLI - Scalar shift left */
+static void handle_scalar_simd_shli(DisasContext *s, bool insert,
+                                    int immh, int immb, int opcode,
+                                    int rn, int rd)
+{
+    int size = 32 - clz32(immh) - 1;
+    int immhb = immh << 3 | immb;
+    int shift = immhb - (8 << size);
+    TCGv_i64 tcg_rn = new_tmp_a64(s);
+    TCGv_i64 tcg_rd = new_tmp_a64(s);
+
+    if (!extract32(immh, 3, 1)) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    tcg_rn = read_fp_dreg(s, rn);
+    tcg_rd = insert ? read_fp_dreg(s, rd) : tcg_temp_new_i64();
+
+    handle_shli_with_ins(tcg_rd, tcg_rn, insert, shift);
+
+    write_fp_dreg(s, rd, tcg_rd);
+
+    tcg_temp_free_i64(tcg_rn);
+    tcg_temp_free_i64(tcg_rd);
+}
+
 /* C3.6.9 AdvSIMD scalar shift by immediate
  *  31 30  29 28         23 22  19 18  16 15    11  10 9    5 4    0
  * +-----+---+-------------+------+------+--------+---+------+------+
  * | 0 1 | U | 1 1 1 1 1 0 | immh | immb | opcode | 1 |  Rn  |  Rd  |
  * +-----+---+-------------+------+------+--------+---+------+------+
+ *
+ * This is the scalar version so it works on a fixed size register
  */
 static void disas_simd_scalar_shift_imm(DisasContext *s, uint32_t insn)
 {
-    unsupported_encoding(s, insn);
+    int rd = extract32(insn, 0, 5);
+    int rn = extract32(insn, 5, 5);
+    int opcode = extract32(insn, 11, 5);
+    int immb = extract32(insn, 16, 3);
+    int immh = extract32(insn, 19, 4);
+    bool is_u = extract32(insn, 29, 1);
+
+    switch (opcode) {
+    case 0x00: /* SSHR / USHR */
+    case 0x02: /* SSRA / USRA */
+    case 0x04: /* SRSHR / URSHR */
+    case 0x06: /* SRSRA / URSRA */
+        handle_scalar_simd_shri(s, is_u, immh, immb, opcode, rn, rd);
+        break;
+    case 0x0a: /* SHL / SLI */
+        handle_scalar_simd_shli(s, is_u, immh, immb, opcode, rn, rd);
+        break;
+    default:
+        unsupported_encoding(s, insn);
+        break;
+    }
 }
 
 /* C3.6.10 AdvSIMD scalar three different
@@ -5844,6 +6045,148 @@ static void disas_simd_scalar_indexed(DisasContext *s, uint32_t insn)
     unsupported_encoding(s, insn);
 }
 
+/* SSHR[RA]/USHR[RA] - Vector shift right (optional rounding/accumulate) */
+static void handle_vec_simd_shri(DisasContext *s, bool is_q, bool is_u,
+                                 int immh, int immb, int opcode, int rn, int rd)
+{
+    int size = 32 - clz32(immh) - 1;
+    int immhb = immh << 3 | immb;
+    int shift = 2 * (8 << size) - immhb;
+    bool accumulate = false;
+    bool round = false;
+    int dsize = is_q ? 128 : 64;
+    int esize = 8 << size;
+    int elements = dsize/esize;
+    TCGMemOp memop = size | (is_u ? 0 : MO_SIGN);
+    TCGv_i64 tcg_rn = new_tmp_a64(s);
+    TCGv_i64 tcg_rd = new_tmp_a64(s);
+    TCGv_i64 tcg_round;
+    int i;
+
+    if (extract32(immh, 3, 1) && !is_q) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    if (size > 3 && !is_q) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    switch (opcode) {
+    case 0x02: /* SSRA / USRA (accumulate) */
+        accumulate = true;
+        break;
+    case 0x04: /* SRSHR / URSHR (rounding) */
+        round = true;
+        break;
+    case 0x06: /* SRSRA / URSRA (accum + rounding) */
+        accumulate = round = true;
+        break;
+    }
+
+    if (round) {
+        uint64_t round_const = 1ULL << (shift - 1);
+        tcg_round = tcg_const_i64(round_const);
+    } else {
+        TCGV_UNUSED_I64(tcg_round);
+    }
+
+    for (i = 0; i < elements; i++) {
+        read_vec_element(s, tcg_rn, rn, i, memop);
+        if (accumulate) {
+            read_vec_element(s, tcg_rd, rd, i, memop);
+        }
+
+        handle_shri_with_rndacc(tcg_rd, tcg_rn, tcg_round,
+                                accumulate, is_u, size, shift);
+
+        write_vec_element(s, tcg_rd, rd, i, size);
+    }
+
+    if (!is_q) {
+        clear_vec_high(s, rd);
+    }
+
+    if (round) {
+        tcg_temp_free_i64(tcg_round);
+    }
+}
+
+/* SHL/SLI - Vector shift left */
+static void handle_vec_simd_shli(DisasContext *s, bool is_q, bool insert,
+                                int immh, int immb, int opcode, int rn, int rd)
+{
+    int size = 32 - clz32(immh) - 1;
+    int immhb = immh << 3 | immb;
+    int shift = immhb - (8 << size);
+    int dsize = is_q ? 128 : 64;
+    int esize = 8 << size;
+    int elements = dsize/esize;
+    TCGv_i64 tcg_rn = new_tmp_a64(s);
+    TCGv_i64 tcg_rd = new_tmp_a64(s);
+    int i;
+
+    if (extract32(immh, 3, 1) && !is_q) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    if (size > 3 && !is_q) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    for (i = 0; i < elements; i++) {
+        read_vec_element(s, tcg_rn, rn, i, size);
+        if (insert) {
+            read_vec_element(s, tcg_rd, rd, i, size);
+        }
+
+        handle_shli_with_ins(tcg_rd, tcg_rn, insert, shift);
+
+        write_vec_element(s, tcg_rd, rd, i, size);
+    }
+
+    if (!is_q) {
+        clear_vec_high(s, rd);
+    }
+}
+
+/* USHLL/SHLL - Vector shift left with widening */
+static void handle_vec_simd_wshli(DisasContext *s, bool is_q, bool is_u,
+                                 int immh, int immb, int opcode, int rn, int rd)
+{
+    int size = 32 - clz32(immh) - 1;
+    int immhb = immh << 3 | immb;
+    int shift = immhb - (8 << size);
+    int dsize = 64;
+    int esize = 8 << size;
+    int elements = dsize/esize;
+    TCGv_i64 tcg_rn = new_tmp_a64(s);
+    TCGv_i64 tcg_rd = new_tmp_a64(s);
+    int i;
+
+    if (size >= 3) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    /* For the LL variants the store is larger than the load,
+     * so if rd == rn we would overwrite parts of our input.
+     * Load everything up front instead and use shifts in the main loop.
+     */
+    read_vec_element(s, tcg_rn, rn, is_q ? 1 : 0, MO_64);
+
+    for (i = 0; i < elements; i++) {
+        tcg_gen_shri_i64(tcg_rd, tcg_rn, i * esize);
+        ext_and_shift_reg(tcg_rd, tcg_rd, size | (!is_u << 2), 0);
+        tcg_gen_shli_i64(tcg_rd, tcg_rd, shift);
+        write_vec_element(s, tcg_rd, rd, i, size + 1);
+    }
+}
+
+
 /* C3.6.14 AdvSIMD shift by immediate
  *  31  30   29 28         23 22  19 18  16 15    11  10 9    5 4    0
  * +---+---+---+-------------+------+------+--------+---+------+------+
@@ -5852,7 +6195,35 @@ static void disas_simd_scalar_indexed(DisasContext *s, uint32_t insn)
  */
 static void disas_simd_shift_imm(DisasContext *s, uint32_t insn)
 {
-    unsupported_encoding(s, insn);
+    int rd = extract32(insn, 0, 5);
+    int rn = extract32(insn, 5, 5);
+    int opcode = extract32(insn, 11, 5);
+    int immb = extract32(insn, 16, 3);
+    int immh = extract32(insn, 19, 4);
+    bool is_u = extract32(insn, 29, 1);
+    bool is_q = extract32(insn, 30, 1);
+
+    switch (opcode) {
+    case 0x00: /* SSHR / USHR */
+    case 0x02: /* SSRA / USRA (accumulate) */
+    case 0x04: /* SRSHR / URSHR (rounding) */
+    case 0x06: /* SRSRA / URSRA (accum + rounding) */
+        handle_vec_simd_shri(s, is_q, is_u, immh, immb, opcode, rn, rd);
+        break;
+    case 0x0a: /* SHL / SLI */
+        handle_vec_simd_shli(s, is_q, is_u, immh, immb, opcode, rn, rd);
+        break;
+    case 0x14: /* SSHLL / USHLL */
+        handle_vec_simd_wshli(s, is_q, is_u, immh, immb, opcode, rn, rd);
+        break;
+    default:
+        /* We don't currently implement any of the narrowing or saturating shifts;
+         * nor do we implement the fixed-point conversions in this
+         * encoding group (SCVTF, FCVTZS, UCVTF, FCVTZU).
+         */
+        unsupported_encoding(s, insn);
+        return;
+    }
 }
 
 static void handle_3rd_widening(DisasContext *s, int is_q, int is_u, int size,
-- 
1.8.5
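
For reference, the immh:immb encoding that the shift-by-immediate handlers
above decode can be sketched standalone in plain C: the highest set bit of
immh selects the element size, right shifts are stored as (2 * esize - shift)
and left shifts as (esize + shift). This is an illustration only, not QEMU
code; highest_bit() is a stand-in for the clz32-based size derivation.

#include <stdio.h>

static int highest_bit(unsigned v)
{
    int n = -1;
    while (v) {
        v >>= 1;
        n++;
    }
    return n;
}

int main(void)
{
    int immh = 0x1, immb = 0x5;            /* example encoding: 8-bit elements */
    int size = highest_bit(immh);          /* 0, i.e. esize 8 */
    int esize = 8 << size;
    int immhb = immh << 3 | immb;          /* 13 */
    printf("right shift %d, left shift %d\n",
           2 * esize - immhb,              /* SSHR-style: 16 - 13 = 3 */
           immhb - esize);                 /* SHL-style:  13 - 8  = 5 */
    return 0;
}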

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 09/21] target-arm: A64: Implement SIMD 3-reg-same shift and saturate insns
  2014-01-26 19:24 [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets Peter Maydell
                   ` (7 preceding siblings ...)
  2014-01-26 19:24 ` [Qemu-devel] [PATCH 08/21] target-arm: A64: Add SIMD shift by immediate Peter Maydell
@ 2014-01-26 19:25 ` Peter Maydell
  2014-01-28 17:05   ` Richard Henderson
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 10/21] target-arm: A64: Implement remaining non-pairwise int SIMD 3-reg-same insns Peter Maydell
                   ` (12 subsequent siblings)
  21 siblings, 1 reply; 39+ messages in thread
From: Peter Maydell @ 2014-01-26 19:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson

Implement the SIMD 3-reg-same instructions SQADD, UQADD,
SQSUB, UQSUB, SSHL, USHL, SQSHl, UQSHL, SRSHL, URSHL,
SQRSHL, UQRSHL; these are all simple calls to existing
Neon helpers. We also enable SSHL, USHL, SRSHL and URSHL
for the 3-reg-same-scalar category (but not the others
because they can have non-size-64 operands and the
scalar_3reg_same function doesn't support that yet).
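
For reference, the per-element dispatch amounts to indexing a small helper
table by element size and the U (unsigned) bit and applying the chosen
function per pass. A minimal standalone sketch of that idiom follows; the
sqadd_s8/uqadd_u8 functions are illustrative stand-ins, not the real
gen_helper_neon_qadd_* helpers, and none of this is QEMU code.

#include <stdint.h>
#include <stdio.h>

typedef uint32_t (*two_op_fn)(uint32_t, uint32_t);

static uint32_t sqadd_s8(uint32_t a, uint32_t b)
{
    int sum = (int8_t)a + (int8_t)b;
    if (sum > INT8_MAX) {
        sum = INT8_MAX;
    } else if (sum < INT8_MIN) {
        sum = INT8_MIN;
    }
    return (uint8_t)sum;
}

static uint32_t uqadd_u8(uint32_t a, uint32_t b)
{
    uint32_t sum = (uint8_t)a + (uint8_t)b;
    return sum > UINT8_MAX ? UINT8_MAX : sum;
}

int main(void)
{
    /* index 0 = signed (SQADD), 1 = unsigned (UQADD), as selected by U */
    static two_op_fn const fns[2] = { sqadd_s8, uqadd_u8 };
    int u = 0;
    printf("0x%x\n", (unsigned)fns[u](0x70, 0x30)); /* SQADD saturates: 0x7f */
    return 0;
}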

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/translate-a64.c | 134 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 112 insertions(+), 22 deletions(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 6cdb8fc..4a6886d 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -74,6 +74,7 @@ typedef struct AArch64DecodeTable {
 
 /* Function prototype for gen_ functions for calling Neon helpers */
 typedef void NeonGenTwoOpFn(TCGv_i32, TCGv_i32, TCGv_i32);
+typedef void NeonGenTwoOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32);
 
 /* initialize TCG globals.  */
 void a64_translate_init(void)
@@ -5766,6 +5767,20 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
     TCGCond cond;
 
     switch (opcode) {
+    case 0x1: /* SQADD */
+        if (u) {
+            gen_helper_neon_qadd_u64(tcg_rd, cpu_env, tcg_rn, tcg_rm);
+        } else {
+            gen_helper_neon_qadd_s64(tcg_rd, cpu_env, tcg_rn, tcg_rm);
+        }
+        break;
+    case 0x5: /* SQSUB */
+        if (u) {
+            gen_helper_neon_qsub_u64(tcg_rd, cpu_env, tcg_rn, tcg_rm);
+        } else {
+            gen_helper_neon_qsub_s64(tcg_rd, cpu_env, tcg_rn, tcg_rm);
+        }
+        break;
     case 0x6: /* CMGT, CMHI */
         /* 64 bit integer comparison, result = test ? (2^64 - 1) : 0.
          * We implement this using setcond (test) and then negating.
@@ -5788,19 +5803,41 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
         tcg_gen_setcondi_i64(TCG_COND_NE, tcg_rd, tcg_rd, 0);
         tcg_gen_neg_i64(tcg_rd, tcg_rd);
         break;
-    case 0x10: /* ADD, SUB */
+    case 0x8: /* SSHL, USHL */
         if (u) {
-            tcg_gen_sub_i64(tcg_rd, tcg_rn, tcg_rm);
+            gen_helper_neon_shl_u64(tcg_rd, tcg_rn, tcg_rm);
         } else {
-            tcg_gen_add_i64(tcg_rd, tcg_rn, tcg_rm);
+            gen_helper_neon_shl_s64(tcg_rd, tcg_rn, tcg_rm);
         }
         break;
-    case 0x1: /* SQADD */
-    case 0x5: /* SQSUB */
-    case 0x8: /* SSHL, USHL */
     case 0x9: /* SQSHL, UQSHL */
+        if (u) {
+            gen_helper_neon_qshl_u64(tcg_rd, cpu_env, tcg_rn, tcg_rm);
+        } else {
+            gen_helper_neon_qshl_s64(tcg_rd, cpu_env, tcg_rn, tcg_rm);
+        }
+        break;
     case 0xa: /* SRSHL, URSHL */
+        if (u) {
+            gen_helper_neon_rshl_u64(tcg_rd, tcg_rn, tcg_rm);
+        } else {
+            gen_helper_neon_rshl_s64(tcg_rd, tcg_rn, tcg_rm);
+        }
+        break;
     case 0xb: /* SQRSHL, UQRSHL */
+        if (u) {
+            gen_helper_neon_qrshl_u64(tcg_rd, cpu_env, tcg_rn, tcg_rm);
+        } else {
+            gen_helper_neon_qrshl_s64(tcg_rd, cpu_env, tcg_rn, tcg_rm);
+        }
+        break;
+    case 0x10: /* ADD, SUB */
+        if (u) {
+            tcg_gen_sub_i64(tcg_rd, tcg_rn, tcg_rm);
+        } else {
+            tcg_gen_add_i64(tcg_rd, tcg_rn, tcg_rm);
+        }
+        break;
     default:
         g_assert_not_reached();
     }
@@ -5977,10 +6014,10 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
     switch (opcode) {
     case 0x1: /* SQADD, UQADD */
     case 0x5: /* SQSUB, UQSUB */
-    case 0x8: /* SSHL, USHL */
-    case 0xa: /* SRSHL, URSHL */
         unsupported_encoding(s, insn);
         return;
+    case 0x8: /* SSHL, USHL */
+    case 0xa: /* SRSHL, URSHL */
     case 0x6: /* CMGT, CMHI */
     case 0x7: /* CMGE, CMHS */
     case 0x11: /* CMTST, CMEQ */
@@ -6649,18 +6686,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
         }
         unsupported_encoding(s, insn);
         return;
-    case 0x1: /* SQADD */
-    case 0x5: /* SQSUB */
-    case 0x8: /* SSHL, USHL */
-    case 0x9: /* SQSHL, UQSHL */
-    case 0xa: /* SRSHL, URSHL */
-    case 0xb: /* SQRSHL, UQRSHL */
-        if (size == 3 && !is_q) {
-            unallocated_encoding(s);
-            return;
-        }
-        unsupported_encoding(s, insn);
-        return;
     case 0x16: /* SQDMULH, SQRDMULH */
         if (size == 0 || size == 3) {
             unallocated_encoding(s);
@@ -6698,12 +6723,33 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
             TCGv_i32 tcg_op1 = tcg_temp_new_i32();
             TCGv_i32 tcg_op2 = tcg_temp_new_i32();
             TCGv_i32 tcg_res = tcg_temp_new_i32();
-            NeonGenTwoOpFn *genfn;
+            NeonGenTwoOpFn *genfn = NULL;
+            NeonGenTwoOpEnvFn *genenvfn = NULL;
 
             read_vec_element_i32(s, tcg_op1, rn, pass, MO_32);
             read_vec_element_i32(s, tcg_op2, rm, pass, MO_32);
 
             switch (opcode) {
+            case 0x1: /* SQADD, UQADD */
+            {
+                static NeonGenTwoOpEnvFn * const fns[3][2] = {
+                    { gen_helper_neon_qadd_s8, gen_helper_neon_qadd_u8 },
+                    { gen_helper_neon_qadd_s16, gen_helper_neon_qadd_u16 },
+                    { gen_helper_neon_qadd_s32, gen_helper_neon_qadd_u32 },
+                };
+                genenvfn = fns[size][u];
+                break;
+            }
+            case 0x5: /* SQSUB, UQSUB */
+            {
+                static NeonGenTwoOpEnvFn * const fns[3][2] = {
+                    { gen_helper_neon_qsub_s8, gen_helper_neon_qsub_u8 },
+                    { gen_helper_neon_qsub_s16, gen_helper_neon_qsub_u16 },
+                    { gen_helper_neon_qsub_s32, gen_helper_neon_qsub_u32 },
+                };
+                genenvfn = fns[size][u];
+                break;
+            }
             case 0x6: /* CMGT, CMHI */
             {
                 static NeonGenTwoOpFn * const fns[3][2] = {
@@ -6724,6 +6770,46 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                 genfn = fns[size][u];
                 break;
             }
+            case 0x8: /* SSHL, USHL */
+            {
+                static NeonGenTwoOpFn * const fns[3][2] = {
+                    { gen_helper_neon_shl_u8, gen_helper_neon_shl_s8 },
+                    { gen_helper_neon_shl_u16, gen_helper_neon_shl_s16 },
+                    { gen_helper_neon_shl_u32, gen_helper_neon_shl_s32 },
+                };
+                genfn = fns[size][u];
+                break;
+            }
+            case 0x9: /* SQSHL, UQSHL */
+            {
+                static NeonGenTwoOpEnvFn * const fns[3][2] = {
+                    { gen_helper_neon_qshl_u8, gen_helper_neon_qshl_s8 },
+                    { gen_helper_neon_qshl_u16, gen_helper_neon_qshl_s16 },
+                    { gen_helper_neon_qshl_u32, gen_helper_neon_qshl_s32 },
+                };
+                genenvfn = fns[size][u];
+                break;
+            }
+            case 0xa: /* SRSHL, URSHL */
+            {
+                static NeonGenTwoOpFn * const fns[3][2] = {
+                    { gen_helper_neon_rshl_u8, gen_helper_neon_rshl_s8 },
+                    { gen_helper_neon_rshl_u16, gen_helper_neon_rshl_s16 },
+                    { gen_helper_neon_rshl_u32, gen_helper_neon_rshl_s32 },
+                };
+                genfn = fns[size][u];
+                break;
+            }
+            case 0xb: /* SQRSHL, UQRSHL */
+            {
+                static NeonGenTwoOpEnvFn * const fns[3][2] = {
+                    { gen_helper_neon_qrshl_u8, gen_helper_neon_qrshl_s8 },
+                    { gen_helper_neon_qrshl_u16, gen_helper_neon_qrshl_s16 },
+                    { gen_helper_neon_qrshl_u32, gen_helper_neon_qrshl_s32 },
+                };
+                genenvfn = fns[size][u];
+                break;
+            }
             case 0x10: /* ADD, SUB */
             {
                 static NeonGenTwoOpFn * const fns[3][2] = {
@@ -6748,7 +6834,11 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                 g_assert_not_reached();
             }
 
-            genfn(tcg_res, tcg_op1, tcg_op2);
+            if (genenvfn) {
+                genenvfn(tcg_res, cpu_env, tcg_op1, tcg_op2);
+            } else {
+                genfn(tcg_res, tcg_op1, tcg_op2);
+            }
 
             write_vec_element_i32(s, tcg_res, rd, pass, MO_32);
 
-- 
1.8.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 10/21] target-arm: A64: Implement remaining non-pairwise int SIMD 3-reg-same insns
  2014-01-26 19:24 [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets Peter Maydell
                   ` (8 preceding siblings ...)
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 09/21] target-arm: A64: Implement SIMD 3-reg-same shift and saturate insns Peter Maydell
@ 2014-01-26 19:25 ` Peter Maydell
  2014-01-28 17:13   ` Richard Henderson
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 11/21] target-arm: A64: Implement pairwise integer ops from 3-reg-same SIMD Peter Maydell
                   ` (11 subsequent siblings)
  21 siblings, 1 reply; 39+ messages in thread
From: Peter Maydell @ 2014-01-26 19:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson

Implement the SIMD 3-reg-same instructions where the size == 3 case
is reserved: SHADD, UHADD, SRHADD, URHADD, SHSUB, UHSUB, SMAX,
UMAX, SMIN, UMIN, SABD, UABD, SABA, UABA, MLA, MLS, MUL, PMUL,
SQRDMULH, SQDMULH. (Of these, only SQDMULH and SQRDMULH have
scalar-3-same forms, which a later patch handles.)
This completes the non-pairwise integer instructions in this category.
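
For reference, the halving and rounding-halving adds in this group compute
their result in a wider intermediate so neither the +1 rounding bias nor the
final >>1 can overflow. A reference sketch of the signed byte forms (the
architectural semantics only, not the neon_helper.c implementations):

#include <stdint.h>
#include <stdio.h>

static int8_t shadd_s8(int8_t a, int8_t b)
{
    return (int8_t)(((int16_t)a + b) >> 1);
}

static int8_t srhadd_s8(int8_t a, int8_t b)
{
    return (int8_t)(((int16_t)a + b + 1) >> 1);
}

int main(void)
{
    printf("%d %d\n", shadd_s8(3, 4), srhadd_s8(3, 4)); /* prints: 3 4 */
    return 0;
}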

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/translate-a64.c | 134 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 118 insertions(+), 16 deletions(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 4a6886d..515c72b 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -6684,15 +6684,13 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
             unallocated_encoding(s);
             return;
         }
-        unsupported_encoding(s, insn);
-        return;
+        break;
     case 0x16: /* SQDMULH, SQRDMULH */
         if (size == 0 || size == 3) {
             unallocated_encoding(s);
             return;
         }
-        unsupported_encoding(s, insn);
-        return;
+        break;
     default:
         if (size == 3 && !is_q) {
             unallocated_encoding(s);
@@ -6730,6 +6728,16 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
             read_vec_element_i32(s, tcg_op2, rm, pass, MO_32);
 
             switch (opcode) {
+            case 0x0: /* SHADD, UHADD */
+            {
+                static NeonGenTwoOpFn * const fns[3][2] = {
+                    { gen_helper_neon_hadd_s8, gen_helper_neon_hadd_u8 },
+                    { gen_helper_neon_hadd_s16, gen_helper_neon_hadd_u16 },
+                    { gen_helper_neon_hadd_s32, gen_helper_neon_hadd_u32 },
+                };
+                genfn = fns[size][u];
+                break;
+            }
             case 0x1: /* SQADD, UQADD */
             {
                 static NeonGenTwoOpEnvFn * const fns[3][2] = {
@@ -6740,6 +6748,26 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                 genenvfn = fns[size][u];
                 break;
             }
+            case 0x2: /* SRHADD, URHADD */
+            {
+                static NeonGenTwoOpFn * const fns[3][2] = {
+                    { gen_helper_neon_rhadd_s8, gen_helper_neon_rhadd_u8 },
+                    { gen_helper_neon_rhadd_s16, gen_helper_neon_rhadd_u16 },
+                    { gen_helper_neon_rhadd_s32, gen_helper_neon_rhadd_u32 },
+                };
+                genfn = fns[size][u];
+                break;
+            }
+            case 0x4: /* SHSUB, UHSUB */
+            {
+                static NeonGenTwoOpFn * const fns[3][2] = {
+                    { gen_helper_neon_hsub_s8, gen_helper_neon_hsub_u8 },
+                    { gen_helper_neon_hsub_s16, gen_helper_neon_hsub_u16 },
+                    { gen_helper_neon_hsub_s32, gen_helper_neon_hsub_u32 },
+                };
+                genfn = fns[size][u];
+                break;
+            }
             case 0x5: /* SQSUB, UQSUB */
             {
                 static NeonGenTwoOpEnvFn * const fns[3][2] = {
@@ -6773,9 +6801,9 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
             case 0x8: /* SSHL, USHL */
             {
                 static NeonGenTwoOpFn * const fns[3][2] = {
-                    { gen_helper_neon_shl_u8, gen_helper_neon_shl_s8 },
-                    { gen_helper_neon_shl_u16, gen_helper_neon_shl_s16 },
-                    { gen_helper_neon_shl_u32, gen_helper_neon_shl_s32 },
+                    { gen_helper_neon_shl_s8, gen_helper_neon_shl_u8 },
+                    { gen_helper_neon_shl_s16, gen_helper_neon_shl_u16 },
+                    { gen_helper_neon_shl_s32, gen_helper_neon_shl_u32 },
                 };
                 genfn = fns[size][u];
                 break;
@@ -6783,9 +6811,9 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
             case 0x9: /* SQSHL, UQSHL */
             {
                 static NeonGenTwoOpEnvFn * const fns[3][2] = {
-                    { gen_helper_neon_qshl_u8, gen_helper_neon_qshl_s8 },
-                    { gen_helper_neon_qshl_u16, gen_helper_neon_qshl_s16 },
-                    { gen_helper_neon_qshl_u32, gen_helper_neon_qshl_s32 },
+                    { gen_helper_neon_qshl_s8, gen_helper_neon_qshl_u8 },
+                    { gen_helper_neon_qshl_s16, gen_helper_neon_qshl_u16 },
+                    { gen_helper_neon_qshl_s32, gen_helper_neon_qshl_u32 },
                 };
                 genenvfn = fns[size][u];
                 break;
@@ -6793,9 +6821,9 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
             case 0xa: /* SRSHL, URSHL */
             {
                 static NeonGenTwoOpFn * const fns[3][2] = {
-                    { gen_helper_neon_rshl_u8, gen_helper_neon_rshl_s8 },
-                    { gen_helper_neon_rshl_u16, gen_helper_neon_rshl_s16 },
-                    { gen_helper_neon_rshl_u32, gen_helper_neon_rshl_s32 },
+                    { gen_helper_neon_rshl_s8, gen_helper_neon_rshl_u8 },
+                    { gen_helper_neon_rshl_s16, gen_helper_neon_rshl_u16 },
+                    { gen_helper_neon_rshl_s32, gen_helper_neon_rshl_u32 },
                 };
                 genfn = fns[size][u];
                 break;
@@ -6803,13 +6831,45 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
             case 0xb: /* SQRSHL, UQRSHL */
             {
                 static NeonGenTwoOpEnvFn * const fns[3][2] = {
-                    { gen_helper_neon_qrshl_u8, gen_helper_neon_qrshl_s8 },
-                    { gen_helper_neon_qrshl_u16, gen_helper_neon_qrshl_s16 },
-                    { gen_helper_neon_qrshl_u32, gen_helper_neon_qrshl_s32 },
+                    { gen_helper_neon_qrshl_s8, gen_helper_neon_qrshl_u8 },
+                    { gen_helper_neon_qrshl_s16, gen_helper_neon_qrshl_u16 },
+                    { gen_helper_neon_qrshl_s32, gen_helper_neon_qrshl_u32 },
                 };
                 genenvfn = fns[size][u];
                 break;
             }
+            case 0xc: /* SMAX, UMAX */
+            {
+                static NeonGenTwoOpFn * const fns[3][2] = {
+                    { gen_helper_neon_max_s8, gen_helper_neon_max_u8 },
+                    { gen_helper_neon_max_s16, gen_helper_neon_max_u16 },
+                    { gen_helper_neon_max_s32, gen_helper_neon_max_u32 },
+                };
+                genfn = fns[size][u];
+                break;
+            }
+
+            case 0xd: /* SMIN, UMIN */
+            {
+                static NeonGenTwoOpFn * const fns[3][2] = {
+                    { gen_helper_neon_min_s8, gen_helper_neon_min_u8 },
+                    { gen_helper_neon_min_s16, gen_helper_neon_min_u16 },
+                    { gen_helper_neon_min_s32, gen_helper_neon_min_u32 },
+                };
+                genfn = fns[size][u];
+                break;
+            }
+            case 0xe: /* SABD, UABD */
+            case 0xf: /* SABA, UABA */
+            {
+                static NeonGenTwoOpFn * const fns[3][2] = {
+                    { gen_helper_neon_abd_s8, gen_helper_neon_abd_u8 },
+                    { gen_helper_neon_abd_s16, gen_helper_neon_abd_u16 },
+                    { gen_helper_neon_abd_s32, gen_helper_neon_abd_u32 },
+                };
+                genfn = fns[size][u];
+                break;
+            }
             case 0x10: /* ADD, SUB */
             {
                 static NeonGenTwoOpFn * const fns[3][2] = {
@@ -6830,6 +6890,34 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                 genfn = fns[size][u];
                 break;
             }
+            case 0x13: /* MUL, PMUL */
+                if (u) {
+                    /* PMUL */
+                    assert(size == 0);
+                    genfn = gen_helper_neon_mul_p8;
+                    break;
+                }
+                /* fall through : MUL */
+            case 0x12: /* MLA, MLS */
+            {
+                static NeonGenTwoOpFn * const fns[3] = {
+                    gen_helper_neon_mul_u8,
+                    gen_helper_neon_mul_u16,
+                    tcg_gen_mul_i32,
+                };
+                genfn = fns[size];
+                break;
+            }
+            case 0x16: /* SQDMULH, SQRDMULH */
+            {
+                static NeonGenTwoOpEnvFn * const fns[2][2] = {
+                    { gen_helper_neon_qdmulh_s16, gen_helper_neon_qrdmulh_s16 },
+                    { gen_helper_neon_qdmulh_s32, gen_helper_neon_qrdmulh_s32 },
+                };
+                assert(size == 1 || size == 2);
+                genenvfn = fns[size - 1][u];
+                break;
+            }
             default:
                 g_assert_not_reached();
             }
@@ -6840,6 +6928,20 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
                 genfn(tcg_res, tcg_op1, tcg_op2);
             }
 
+            if (opcode == 0xf || opcode == 0x12) {
+                /* SABA, UABA, MLA, MLS: accumulating ops */
+                static NeonGenTwoOpFn * const fns[3][2] = {
+                    { gen_helper_neon_add_u8, gen_helper_neon_sub_u8 },
+                    { gen_helper_neon_add_u16, gen_helper_neon_sub_u16 },
+                    { tcg_gen_add_i32, tcg_gen_sub_i32 },
+                };
+                bool is_sub = (opcode == 0x12 && u); /* MLS */
+
+                genfn = fns[size][is_sub];
+                read_vec_element_i32(s, tcg_op1, rd, pass, MO_32);
+                genfn(tcg_res, tcg_res, tcg_op1);
+            }
+
             write_vec_element_i32(s, tcg_res, rd, pass, MO_32);
 
             tcg_temp_free_i32(tcg_res);
-- 
1.8.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 11/21] target-arm: A64: Implement pairwise integer ops from 3-reg-same SIMD
  2014-01-26 19:24 [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets Peter Maydell
                   ` (9 preceding siblings ...)
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 10/21] target-arm: A64: Implement remaining non-pairwise int SIMD 3-reg-same insns Peter Maydell
@ 2014-01-26 19:25 ` Peter Maydell
  2014-01-28 17:21   ` Richard Henderson
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 12/21] tcg: Add TCGV_UNUSED_PTR, TCGV_IS_UNUSED_PTR, TCGV_EQUAL_PTR Peter Maydell
                   ` (10 subsequent siblings)
  21 siblings, 1 reply; 39+ messages in thread
From: Peter Maydell @ 2014-01-26 19:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson

Implement the pairwise integer operations in the 3-reg-same SIMD group:
ADDP, SMAXP, SMINP, UMAXP and UMINP.
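
"Pairwise" here means the two source registers act as one concatenated
vector, with Rn supplying the low half and Rm the high half, and adjacent
element pairs being combined into Rd. A reference sketch for ADDP on four
32-bit lanes (plain C illustration, not the translator code):

#include <stdint.h>
#include <stdio.h>

static void addp_4s(const int32_t rn[4], const int32_t rm[4], int32_t rd[4])
{
    int32_t concat[8];
    int i;

    for (i = 0; i < 4; i++) {
        concat[i] = rn[i];
        concat[4 + i] = rm[i];
    }
    for (i = 0; i < 4; i++) {
        rd[i] = concat[2 * i] + concat[2 * i + 1];
    }
}

int main(void)
{
    int32_t rn[4] = { 1, 2, 3, 4 }, rm[4] = { 10, 20, 30, 40 }, rd[4];

    addp_4s(rn, rm, rd);
    printf("%d %d %d %d\n", rd[0], rd[1], rd[2], rd[3]); /* 3 7 30 70 */
    return 0;
}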

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/translate-a64.c | 145 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 144 insertions(+), 1 deletion(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 515c72b..81119a0 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -6584,10 +6584,153 @@ static void disas_simd_3same_logic(DisasContext *s, uint32_t insn)
     tcg_temp_free_i64(tcg_res[1]);
 }
 
+/* Helper functions for pairwise 32 bit comparisons */
+static void gen_pmax_s32(TCGv_i32 res, TCGv_i32 op1, TCGv_i32 op2)
+{
+    tcg_gen_movcond_i32(TCG_COND_GE, res, op1, op2, op1, op2);
+}
+
+static void gen_pmax_u32(TCGv_i32 res, TCGv_i32 op1, TCGv_i32 op2)
+{
+    tcg_gen_movcond_i32(TCG_COND_GEU, res, op1, op2, op1, op2);
+}
+
+static void gen_pmin_s32(TCGv_i32 res, TCGv_i32 op1, TCGv_i32 op2)
+{
+    tcg_gen_movcond_i32(TCG_COND_LE, res, op1, op2, op1, op2);
+}
+
+static void gen_pmin_u32(TCGv_i32 res, TCGv_i32 op1, TCGv_i32 op2)
+{
+    tcg_gen_movcond_i32(TCG_COND_LEU, res, op1, op2, op1, op2);
+}
+
 /* Pairwise op subgroup of C3.6.16. */
 static void disas_simd_3same_pair(DisasContext *s, uint32_t insn)
 {
-    unsupported_encoding(s, insn);
+    int is_q = extract32(insn, 30, 1);
+    int u = extract32(insn, 29, 1);
+    int size = extract32(insn, 22, 2);
+    int opcode = extract32(insn, 11, 5);
+    int rm = extract32(insn, 16, 5);
+    int rn = extract32(insn, 5, 5);
+    int rd = extract32(insn, 0, 5);
+    int pass;
+
+    if (size == 3 && !is_q) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    switch (opcode) {
+    case 0x14: /* SMAXP, UMAXP */
+    case 0x15: /* SMINP, UMINP */
+        if (size == 3) {
+            unallocated_encoding(s);
+            return;
+        }
+        break;
+    case 0x17: /* ADDP */

+        if (u) {
+            unallocated_encoding(s);
+            return;
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    /* These operations work on the concatenated rm:rn, with each pair of
+     * adjacent elements being operated on to produce an element in the result.
+     */
+    if (size == 3) {
+        TCGv_i64 tcg_res[2];
+
+        for (pass = 0; pass < 2; pass++) {
+            TCGv_i64 tcg_op1 = tcg_temp_new_i64();
+            TCGv_i64 tcg_op2 = tcg_temp_new_i64();
+            int passreg = (pass == 0) ? rn : rm;
+
+            read_vec_element(s, tcg_op1, passreg, 0, MO_64);
+            read_vec_element(s, tcg_op2, passreg, 1, MO_64);
+            tcg_res[pass] = tcg_temp_new_i64();
+
+            /* The only 64 bit pairwise integer op is ADDP */
+            assert(opcode == 0x17);
+            tcg_gen_add_i64(tcg_res[pass], tcg_op1, tcg_op2);
+
+            tcg_temp_free_i64(tcg_op1);
+            tcg_temp_free_i64(tcg_op2);
+        }
+
+        for (pass = 0; pass < 2; pass++) {
+            write_vec_element(s, tcg_res[pass], rd, pass, MO_64);
+            tcg_temp_free_i64(tcg_res[pass]);
+        }
+    } else {
+        int maxpass = is_q ? 4 : 2;
+        TCGv_i32 tcg_res[4];
+
+        for (pass = 0; pass < maxpass; pass++) {
+            TCGv_i32 tcg_op1 = tcg_temp_new_i32();
+            TCGv_i32 tcg_op2 = tcg_temp_new_i32();
+            NeonGenTwoOpFn *genfn;
+            int passreg = pass < (maxpass / 2) ? rn : rm;
+            int passelt = (is_q && (pass & 1)) ? 2 : 0;
+
+            read_vec_element_i32(s, tcg_op1, passreg, passelt, MO_32);
+            read_vec_element_i32(s, tcg_op2, passreg, passelt + 1, MO_32);
+            tcg_res[pass] = tcg_temp_new_i32();
+
+            switch (opcode) {
+            case 0x17: /* ADDP */
+            {
+                static NeonGenTwoOpFn * const fns[3] = {
+                    gen_helper_neon_padd_u8,
+                    gen_helper_neon_padd_u16,
+                    tcg_gen_add_i32,
+                };
+                genfn = fns[size];
+                break;
+            }
+            case 0x14: /* SMAXP, UMAXP */
+            {
+                static NeonGenTwoOpFn * const fns[3][2] = {
+                    { gen_helper_neon_pmax_s8, gen_helper_neon_pmax_u8 },
+                    { gen_helper_neon_pmax_s16, gen_helper_neon_pmax_u16 },
+                    { gen_pmax_s32, gen_pmax_u32 },
+                };
+                genfn = fns[size][u];
+                break;
+            }
+            case 0x15: /* SMINP, UMINP */
+            {
+                static NeonGenTwoOpFn * const fns[3][2] = {
+                    { gen_helper_neon_pmin_s8, gen_helper_neon_pmin_u8 },
+                    { gen_helper_neon_pmin_s16, gen_helper_neon_pmin_u16 },
+                    { gen_pmin_s32, gen_pmin_u32 },
+                };
+                genfn = fns[size][u];
+                break;
+            }
+            default:
+                g_assert_not_reached();
+            }
+
+            genfn(tcg_res[pass], tcg_op1, tcg_op2);
+
+            tcg_temp_free_i32(tcg_op1);
+            tcg_temp_free_i32(tcg_op2);
+        }
+
+        for (pass = 0; pass < maxpass; pass++) {
+            write_vec_element_i32(s, tcg_res[pass], rd, pass, MO_32);
+            tcg_temp_free_i32(tcg_res[pass]);
+        }
+        if (!is_q) {
+            clear_vec_high(s, rd);
+        }
+    }
 }
 
 /* Floating point op subgroup of C3.6.16. */
-- 
1.8.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 12/21] tcg: Add TCGV_UNUSED_PTR, TCGV_IS_UNUSED_PTR, TCGV_EQUAL_PTR
  2014-01-26 19:24 [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets Peter Maydell
                   ` (10 preceding siblings ...)
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 11/21] target-arm: A64: Implement pairwise integer ops from 3-reg-same SIMD Peter Maydell
@ 2014-01-26 19:25 ` Peter Maydell
  2014-01-28 17:23   ` Richard Henderson
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 13/21] target-arm: A64: Implement scalar pairwise ops Peter Maydell
                   ` (9 subsequent siblings)
  21 siblings, 1 reply; 39+ messages in thread
From: Peter Maydell @ 2014-01-26 19:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson

We have macros for marking TCGv values as unused, checking if they
are unused and comparing them to each other. However these only exist
for TCGv_i32 and TCGv_i64; add them for TCGv_ptr as well.
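
A condensed sketch of the intended use, mirroring the pattern the scalar
pairwise patch adopts later in the series (is_fp_op is a placeholder for
whatever condition makes the FP status pointer necessary; this is a fragment
of translator code, not a complete function):

    TCGv_ptr fpst;

    if (is_fp_op) {
        fpst = get_fpstatus_ptr();
    } else {
        TCGV_UNUSED_PTR(fpst);
    }

    /* ... emit the ops, passing fpst only to the FP helpers ... */

    if (!TCGV_IS_UNUSED_PTR(fpst)) {
        tcg_temp_free_ptr(fpst);
    }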

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 tcg/tcg.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index c72af6c..f7efcb4 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -324,13 +324,16 @@ typedef int TCGv_i64;
 
 #define TCGV_EQUAL_I32(a, b) (GET_TCGV_I32(a) == GET_TCGV_I32(b))
 #define TCGV_EQUAL_I64(a, b) (GET_TCGV_I64(a) == GET_TCGV_I64(b))
+#define TCGV_EQUAL_PTR(a, b) (GET_TCGV_PTR(a) == GET_TCGV_PTR(b))
 
 /* Dummy definition to avoid compiler warnings.  */
 #define TCGV_UNUSED_I32(x) x = MAKE_TCGV_I32(-1)
 #define TCGV_UNUSED_I64(x) x = MAKE_TCGV_I64(-1)
+#define TCGV_UNUSED_PTR(x) x = MAKE_TCGV_PTR(-1)
 
 #define TCGV_IS_UNUSED_I32(x) (GET_TCGV_I32(x) == -1)
 #define TCGV_IS_UNUSED_I64(x) (GET_TCGV_I64(x) == -1)
+#define TCGV_IS_UNUSED_PTR(x) (GET_TCGV_PTR(x) == -1)
 
 /* call flags */
 /* Helper does not read globals (either directly or through an exception). It
-- 
1.8.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 13/21] target-arm: A64: Implement scalar pairwise ops
  2014-01-26 19:24 [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets Peter Maydell
                   ` (11 preceding siblings ...)
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 12/21] tcg: Add TCGV_UNUSED_PTR, TCGV_IS_UNUSED_PTR, TCGV_EQUAL_PTR Peter Maydell
@ 2014-01-26 19:25 ` Peter Maydell
  2014-01-28 17:28   ` Richard Henderson
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 14/21] target-arm: A64: Implement remaining integer scalar-3-same insns Peter Maydell
                   ` (8 subsequent siblings)
  21 siblings, 1 reply; 39+ messages in thread
From: Peter Maydell @ 2014-01-26 19:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson

Implement the instructions in the scalar pairwise group (C3.6.8).
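
The group carries both FMAXP and FMAXNMP (and the MIN equivalents) because
of the NaN rule: the "NM" forms follow IEEE 754 maxNum, where a quiet NaN in
one operand is ignored, while the plain forms return NaN. A standalone
sketch of the difference using C's fmax(), which has maxNum semantics;
fmax_propagating() is an illustrative stand-in for the plain-FMAX behaviour,
not QEMU code:

#include <math.h>
#include <stdio.h>

static double fmax_propagating(double a, double b)
{
    if (isnan(a) || isnan(b)) {
        return NAN;
    }
    return a > b ? a : b;
}

int main(void)
{
    printf("%g %g\n", fmax(NAN, 1.0), fmax_propagating(NAN, 1.0)); /* 1 nan */
    return 0;
}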

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/translate-a64.c | 114 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 113 insertions(+), 1 deletion(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 81119a0..881e9eb 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -5529,7 +5529,119 @@ static void disas_simd_scalar_copy(DisasContext *s, uint32_t insn)
  */
 static void disas_simd_scalar_pairwise(DisasContext *s, uint32_t insn)
 {
-    unsupported_encoding(s, insn);
+    int u = extract32(insn, 29, 1);
+    int size = extract32(insn, 22, 2);
+    int opcode = extract32(insn, 12, 5);
+    int rn = extract32(insn, 5, 5);
+    int rd = extract32(insn, 0, 5);
+    TCGv_ptr fpst;
+
+    /* For some ops (the FP ones), size[1] is part of the encoding.
+     * For ADDP it strictly is not, but size[1] is always 1 for all
+     * valid encodings.
+     */
+    opcode |= (extract32(size, 1, 1) << 5);
+
+    switch (opcode) {
+    case 0x3b: /* ADDP */
+        if (u || size != 3) {
+            unallocated_encoding(s);
+            return;
+        }
+        TCGV_UNUSED_PTR(fpst);
+        break;
+    case 0xc: /* FMAXNMP */
+    case 0xd: /* FADDP */
+    case 0xf: /* FMAXP */
+    case 0x2c: /* FMINNMP */
+    case 0x2f: /* FMINP */
+        /* FP op, size[0] is 32 or 64 bit */
+        if (!u) {
+            unallocated_encoding(s);
+            return;
+        }
+        size = extract32(size, 0, 1) ? 3 : 2;
+        fpst = get_fpstatus_ptr();
+        break;
+    default:
+        unallocated_encoding(s);
+        return;
+    }
+
+    if (size == 3) {
+        TCGv_i64 tcg_op1 = tcg_temp_new_i64();
+        TCGv_i64 tcg_op2 = tcg_temp_new_i64();
+        TCGv_i64 tcg_res = tcg_temp_new_i64();
+
+        read_vec_element(s, tcg_op1, rn, 0, MO_64);
+        read_vec_element(s, tcg_op2, rn, 1, MO_64);
+
+        switch (opcode) {
+        case 0x3b: /* ADDP */
+            tcg_gen_add_i64(tcg_res, tcg_op1, tcg_op2);
+            break;
+        case 0xc: /* FMAXNMP */
+            gen_helper_vfp_maxnumd(tcg_res, tcg_op1, tcg_op2, fpst);
+            break;
+        case 0xd: /* FADDP */
+            gen_helper_vfp_addd(tcg_res, tcg_op1, tcg_op2, fpst);
+            break;
+        case 0xf: /* FMAXP */
+            gen_helper_vfp_maxd(tcg_res, tcg_op1, tcg_op2, fpst);
+            break;
+        case 0x2c: /* FMINNMP */
+            gen_helper_vfp_minnumd(tcg_res, tcg_op1, tcg_op2, fpst);
+            break;
+        case 0x2f: /* FMINP */
+            gen_helper_vfp_mind(tcg_res, tcg_op1, tcg_op2, fpst);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+
+        write_fp_dreg(s, rd, tcg_res);
+
+        tcg_temp_free_i64(tcg_op1);
+        tcg_temp_free_i64(tcg_op2);
+        tcg_temp_free_i64(tcg_res);
+    } else {
+        TCGv_i32 tcg_op1 = tcg_temp_new_i32();
+        TCGv_i32 tcg_op2 = tcg_temp_new_i32();
+        TCGv_i32 tcg_res = tcg_temp_new_i32();
+
+        read_vec_element_i32(s, tcg_op1, rn, 0, MO_32);
+        read_vec_element_i32(s, tcg_op2, rn, 1, MO_32);
+
+        switch (opcode) {
+        case 0xc: /* FMAXNMP */
+            gen_helper_vfp_maxnums(tcg_res, tcg_op1, tcg_op2, fpst);
+            break;
+        case 0xd: /* FADDP */
+            gen_helper_vfp_adds(tcg_res, tcg_op1, tcg_op2, fpst);
+            break;
+        case 0xf: /* FMAXP */
+            gen_helper_vfp_maxs(tcg_res, tcg_op1, tcg_op2, fpst);
+            break;
+        case 0x2c: /* FMINNMP */
+            gen_helper_vfp_minnums(tcg_res, tcg_op1, tcg_op2, fpst);
+            break;
+        case 0x2f: /* FMINP */
+            gen_helper_vfp_mins(tcg_res, tcg_op1, tcg_op2, fpst);
+            break;
+        default:
+            g_assert_not_reached();
+        }
+
+        write_fp_sreg(s, rd, tcg_res);
+
+        tcg_temp_free_i32(tcg_op1);
+        tcg_temp_free_i32(tcg_op2);
+        tcg_temp_free_i32(tcg_res);
+    }
+
+    if (!TCGV_IS_UNUSED_PTR(fpst)) {
+        tcg_temp_free_ptr(fpst);
+    }
 }
 
 /*
-- 
1.8.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 14/21] target-arm: A64: Implement remaining integer scalar-3-same insns
  2014-01-26 19:24 [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets Peter Maydell
                   ` (12 preceding siblings ...)
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 13/21] target-arm: A64: Implement scalar pairwise ops Peter Maydell
@ 2014-01-26 19:25 ` Peter Maydell
  2014-01-28 21:01   ` Richard Henderson
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 15/21] target-arm: A64: Add SIMD simple 64 bit insns from scalar 2-reg misc Peter Maydell
                   ` (7 subsequent siblings)
  21 siblings, 1 reply; 39+ messages in thread
From: Peter Maydell @ 2014-01-26 19:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson

Implement the remaining integer instructions in the scalar-three-reg-same
group: SQADD, UQADD, SQSUB, UQSUB, SQSHL, UQSHL, SQRSHL, UQRSHL,
SQDMULH, SQRDMULH.
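
For reference, the scalar forms saturate and then write only the low element
of the destination, clearing the rest of the register. A reference sketch of
the 32-bit SQSUB case (architectural semantics only, not the
gen_helper_neon_qsub_s32 implementation):

#include <stdint.h>
#include <stdio.h>

static uint64_t scalar_sqsub_s32(int32_t a, int32_t b)
{
    int64_t res = (int64_t)a - b;

    if (res > INT32_MAX) {
        res = INT32_MAX;
    } else if (res < INT32_MIN) {
        res = INT32_MIN;
    }
    return (uint32_t)res;   /* zero-extend into the 64-bit destination */
}

int main(void)
{
    printf("0x%llx\n",
           (unsigned long long)scalar_sqsub_s32(INT32_MIN, 1)); /* 0x80000000 */
    return 0;
}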

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/translate-a64.c | 106 +++++++++++++++++++++++++++++++++++++--------
 1 file changed, 87 insertions(+), 19 deletions(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 881e9eb..8a2f311 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -6094,8 +6094,6 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
     int rm = extract32(insn, 16, 5);
     int size = extract32(insn, 22, 2);
     bool u = extract32(insn, 29, 1);
-    TCGv_i64 tcg_rn;
-    TCGv_i64 tcg_rm;
     TCGv_i64 tcg_rd;
 
     if (opcode >= 0x18) {
@@ -6126,8 +6124,9 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
     switch (opcode) {
     case 0x1: /* SQADD, UQADD */
     case 0x5: /* SQSUB, UQSUB */
-        unsupported_encoding(s, insn);
-        return;
+    case 0x9: /* SQSHL, UQSHL */
+    case 0xb: /* SQRSHL, UQRSHL */
+        break;
     case 0x8: /* SSHL, USHL */
     case 0xa: /* SRSHL, URSHL */
     case 0x6: /* CMGT, CMHI */
@@ -6139,36 +6138,105 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
             return;
         }
         break;
-    case 0x9: /* SQSHL, UQSHL */
-    case 0xb: /* SQRSHL, UQRSHL */
-        unsupported_encoding(s, insn);
-        return;
     case 0x16: /* SQDMULH, SQRDMULH (vector) */
         if (size != 1 && size != 2) {
             unallocated_encoding(s);
             return;
         }
-        unsupported_encoding(s, insn);
-        return;
+        break;
     default:
         unallocated_encoding(s);
         return;
     }
 
-    tcg_rn = read_fp_dreg(s, rn);       /* op1 */
-    tcg_rm = read_fp_dreg(s, rm);       /* op2 */
     tcg_rd = tcg_temp_new_i64();
 
-    /* For the moment we only support the opcodes which are
-     * 64-bit-width only. The size != 3 cases will
-     * be handled later when the relevant ops are implemented.
-     */
-    handle_3same_64(s, opcode, u, tcg_rd, tcg_rn, tcg_rm);
+    if (size == 3) {
+        TCGv_i64 tcg_rn = read_fp_dreg(s, rn);
+        TCGv_i64 tcg_rm = read_fp_dreg(s, rm);
+
+        handle_3same_64(s, opcode, u, tcg_rd, tcg_rn, tcg_rm);
+        tcg_temp_free_i64(tcg_rn);
+        tcg_temp_free_i64(tcg_rm);
+    } else {
+        /* Do a single operation on the lowest element in the vector.
+         * We use the standard Neon helpers and rely on 0 OP 0 == 0 with
+         * no side effects for all these operations.
+         * OPTME: special-purpose helpers would avoid doing some
+         * unnecessary work in the helper for the 8 and 16 bit cases.
+         */
+        NeonGenTwoOpEnvFn *genenvfn;
+        TCGv_i32 tcg_rn = tcg_temp_new_i32();
+        TCGv_i32 tcg_rm = tcg_temp_new_i32();
+        TCGv_i32 tcg_rd32 = tcg_temp_new_i32();
+
+        read_vec_element_i32(s, tcg_rn, rn, 0, size);
+        read_vec_element_i32(s, tcg_rm, rm, 0, size);
+
+        switch (opcode) {
+        case 0x1: /* SQADD, UQADD */
+        {
+            static NeonGenTwoOpEnvFn * const fns[3][2] = {
+                { gen_helper_neon_qadd_s8, gen_helper_neon_qadd_u8 },
+                { gen_helper_neon_qadd_s16, gen_helper_neon_qadd_u16 },
+                { gen_helper_neon_qadd_s32, gen_helper_neon_qadd_u32 },
+            };
+            genenvfn = fns[size][u];
+            break;
+        }
+        case 0x5: /* SQSUB, UQSUB */
+        {
+            static NeonGenTwoOpEnvFn * const fns[3][2] = {
+                { gen_helper_neon_qsub_s8, gen_helper_neon_qsub_u8 },
+                { gen_helper_neon_qsub_s16, gen_helper_neon_qsub_u16 },
+                { gen_helper_neon_qsub_s32, gen_helper_neon_qsub_u32 },
+            };
+            genenvfn = fns[size][u];
+            break;
+        }
+        case 0x9: /* SQSHL, UQSHL */
+        {
+            static NeonGenTwoOpEnvFn * const fns[3][2] = {
+                { gen_helper_neon_qshl_s8, gen_helper_neon_qshl_u8 },
+                { gen_helper_neon_qshl_s16, gen_helper_neon_qshl_u16 },
+                { gen_helper_neon_qshl_s32, gen_helper_neon_qshl_u32 },
+            };
+            genenvfn = fns[size][u];
+            break;
+        }
+        case 0xb: /* SQRSHL, UQRSHL */
+        {
+            static NeonGenTwoOpEnvFn * const fns[3][2] = {
+                { gen_helper_neon_qrshl_s8, gen_helper_neon_qrshl_u8 },
+                { gen_helper_neon_qrshl_s16, gen_helper_neon_qrshl_u16 },
+                { gen_helper_neon_qrshl_s32, gen_helper_neon_qrshl_u32 },
+            };
+            genenvfn = fns[size][u];
+            break;
+        }
+        case 0x16: /* SQDMULH, SQRDMULH */
+        {
+            static NeonGenTwoOpEnvFn * const fns[2][2] = {
+                { gen_helper_neon_qdmulh_s16, gen_helper_neon_qrdmulh_s16 },
+                { gen_helper_neon_qdmulh_s32, gen_helper_neon_qrdmulh_s32 },
+            };
+            assert(size == 1 || size == 2);
+            genenvfn = fns[size - 1][u];
+            break;
+        }
+        default:
+            g_assert_not_reached();
+        }
+
+        genenvfn(tcg_rd32, cpu_env, tcg_rn, tcg_rm);
+        tcg_gen_extu_i32_i64(tcg_rd, tcg_rd32);
+        tcg_temp_free_i32(tcg_rd32);
+        tcg_temp_free_i32(tcg_rn);
+        tcg_temp_free_i32(tcg_rm);
+    }
 
     write_fp_dreg(s, rd, tcg_rd);
 
-    tcg_temp_free_i64(tcg_rn);
-    tcg_temp_free_i64(tcg_rm);
     tcg_temp_free_i64(tcg_rd);
 }
 
-- 
1.8.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 15/21] target-arm: A64: Add SIMD simple 64 bit insns from scalar 2-reg misc
  2014-01-26 19:24 [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets Peter Maydell
                   ` (13 preceding siblings ...)
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 14/21] target-arm: A64: Implement remaining integer scalar-3-same insns Peter Maydell
@ 2014-01-26 19:25 ` Peter Maydell
  2014-01-28 17:34   ` Richard Henderson
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 16/21] target-arm: A64: Add skeleton decode for SIMD 2-reg misc group Peter Maydell
                   ` (6 subsequent siblings)
  21 siblings, 1 reply; 39+ messages in thread
From: Peter Maydell @ 2014-01-26 19:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson

Implement the simple 64 bit integer operations from the SIMD
scalar 2-register misc group (C3.6.12): the comparisons against
zero, plus ABS and NEG.
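
The comparisons need an all-ones/all-zeroes result, which is produced
branch-free by doing setcond on the inverted condition and then subtracting
1. The same trick in plain C (cmlt_zero is a stand-in name, not the TCG
code itself):

#include <stdint.h>
#include <stdio.h>

static uint64_t cmlt_zero(int64_t a)
{
    uint64_t not_lt = (a < 0) ? 0 : 1;   /* setcond(inverted condition) */
    return not_lt - 1;                   /* 0 - 1 wraps to all-ones */
}

int main(void)
{
    printf("%llx %llx\n",
           (unsigned long long)cmlt_zero(-5),   /* ffffffffffffffff */
           (unsigned long long)cmlt_zero(5));   /* 0 */
    return 0;
}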

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/translate-a64.c | 87 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 86 insertions(+), 1 deletion(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 8a2f311..f784602 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -6240,6 +6240,48 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
     tcg_temp_free_i64(tcg_rd);
 }
 
+static void handle_2misc_64(DisasContext *s, int opcode, bool u,
+                            TCGv_i64 tcg_rd, TCGv_i64 tcg_rn)
+{
+    /* Handle 64->64 opcodes which are shared between the scalar and
+     * vector 2-reg-misc groups. We cover every integer opcode where size == 3
+     * is valid in either group.
+     */
+    TCGCond cond;
+
+    switch (opcode) {
+    case 0xa: /* CMLT */
+        /* 64 bit integer comparison against zero, result is
+         * test ? (2^64 - 1) : 0. We implement via setcond(!test) and
+         * subtracting 1.
+         */
+        cond = TCG_COND_LT;
+    do_cmop:
+        tcg_gen_setcondi_i64(tcg_invert_cond(cond), tcg_rd, tcg_rn, 0);
+        tcg_gen_subi_i64(tcg_rd, tcg_rd, 1);
+        break;
+    case 0x8: /* CMGT, CMGE */
+        cond = u ? TCG_COND_GE : TCG_COND_GT;
+        goto do_cmop;
+    case 0x9: /* CMEQ, CMLE */
+        cond = u ? TCG_COND_LE : TCG_COND_EQ;
+        goto do_cmop;
+    case 0xb: /* ABS, NEG */
+        if (u) {
+            tcg_gen_neg_i64(tcg_rd, tcg_rn);
+        } else {
+            TCGv_i64 tcg_zero = tcg_const_i64(0);
+            tcg_gen_neg_i64(tcg_rd, tcg_rn);
+            tcg_gen_movcond_i64(TCG_COND_GT, tcg_rd, tcg_rn, tcg_zero,
+                                tcg_rn, tcg_rd);
+            tcg_temp_free_i64(tcg_zero);
+        }
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
 /* C3.6.12 AdvSIMD scalar two reg misc
  *  31 30  29 28       24 23  22 21       17 16    12 11 10 9    5 4    0
  * +-----+---+-----------+------+-----------+--------+-----+------+------+
@@ -6248,7 +6290,50 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
  */
 static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn)
 {
-    unsupported_encoding(s, insn);
+    int rd = extract32(insn, 0, 5);
+    int rn = extract32(insn, 5, 5);
+    int opcode = extract32(insn, 12, 5);
+    int size = extract32(insn, 22, 2);
+    bool u = extract32(insn, 29, 1);
+
+    switch (opcode) {
+    case 0xa: /* CMLT */
+        if (u) {
+            unallocated_encoding(s);
+            return;
+        }
+        /* fall through */
+    case 0x8: /* CMGT, CMGE */
+    case 0x9: /* CMEQ, CMLE */
+    case 0xb: /* ABS, NEG */
+        if (size != 3) {
+            unallocated_encoding(s);
+            return;
+        }
+        break;
+    default:
+        /* Other categories of encoding in this class:
+         *  + floating point (single and double)
+         *  + SUQADD/USQADD/SQABS/SQNEG : size 8, 16, 32 or 64
+         *  + SQXTN/SQXTN2/SQXTUN/SQXTUN2/UQXTN/UQXTN2:
+         *    narrowing saturate ops: size 64/32/16 -> 32/16/8
+         */
+        unsupported_encoding(s, insn);
+        return;
+    }
+
+    if (size == 3) {
+        TCGv_i64 tcg_rn = read_fp_dreg(s, rn);
+        TCGv_i64 tcg_rd = tcg_temp_new_i64();
+
+        handle_2misc_64(s, opcode, u, tcg_rd, tcg_rn);
+        write_fp_dreg(s, rd, tcg_rd);
+        tcg_temp_free_i64(tcg_rd);
+        tcg_temp_free_i64(tcg_rn);
+    } else {
+        /* the 'size might not be 64' ops aren't implemented yet */
+        g_assert_not_reached();
+    }
 }
 
 /* C3.6.13 AdvSIMD scalar x indexed element
-- 
1.8.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 16/21] target-arm: A64: Add skeleton decode for SIMD 2-reg misc group
  2014-01-26 19:24 [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets Peter Maydell
                   ` (14 preceding siblings ...)
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 15/21] target-arm: A64: Add SIMD simple 64 bit insns from scalar 2-reg misc Peter Maydell
@ 2014-01-26 19:25 ` Peter Maydell
  2014-01-28 17:40   ` Richard Henderson
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 17/21] target-arm: A64: Implement 2-register misc compares, ABS, NEG Peter Maydell
                   ` (5 subsequent siblings)
  21 siblings, 1 reply; 39+ messages in thread
From: Peter Maydell @ 2014-01-26 19:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson

Add a skeleton decode for the SIMD 2-reg misc group.
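
For the floating point half of the group the skeleton folds U and size[1]
into the opcode before switching on it, which is why base opcode 0xf shows
up as 0x2f (FABS) and 0x6f (FNEG). A tiny standalone check of that key
composition (illustration only, not QEMU code):

#include <stdio.h>

int main(void)
{
    int opcode = 0xf;
    int u;

    for (u = 0; u <= 1; u++) {
        int key = (u << 6) | (1 << 5) | opcode;  /* size[1] == 1 */
        printf("u=%d -> key 0x%x\n", u, key);    /* 0x2f then 0x6f */
    }
    return 0;
}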

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/translate-a64.c | 110 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 109 insertions(+), 1 deletion(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index f784602..f03d645 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -7401,7 +7401,115 @@ static void disas_simd_three_reg_same(DisasContext *s, uint32_t insn)
  */
 static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
 {
-    unsupported_encoding(s, insn);
+    int size = extract32(insn, 22, 2);
+    int opcode = extract32(insn, 12, 5);
+    bool u = extract32(insn, 29, 1);
+    bool is_q = extract32(insn, 30, 1);
+
+    switch (opcode) {
+    case 0x0: /* REV64, REV32 */
+    case 0x1: /* REV16 */
+        unsupported_encoding(s, insn);
+        return;
+    case 0x5: /* CNT, NOT, RBIT  */
+        if ((u == 0 && size > 0) ||
+            (u == 1 && size > 1)) {
+            unallocated_encoding(s);
+            return;
+        }
+        unsupported_encoding(s, insn);
+        return;
+    case 0x2: /* SADDLP, UADDLP */
+    case 0x4: /* CLS, CLZ */
+    case 0x6: /* SADALP, UADALP */
+    case 0x12: /* XTN, XTN2, SQXTUN, SQXTUN2 */
+    case 0x14: /* SQXTN, SQXTN2, UQXTN, UQXTN2 */
+        if (size == 3) {
+            unallocated_encoding(s);
+            return;
+        }
+        unsupported_encoding(s, insn);
+        return;
+    case 0x13: /* SHLL, SHLL2 */
+        if (u == 0 || size == 3) {
+            unallocated_encoding(s);
+            return;
+        }
+        unsupported_encoding(s, insn);
+        return;
+    case 0xa: /* CMLT */
+        if (u == 1) {
+            unallocated_encoding(s);
+            return;
+        }
+        /* fall through */
+    case 0x3: /* SUQADD, USQADD */
+    case 0x7: /* SQABS, SQNEG */
+    case 0x8: /* CMGT, CMGE */
+    case 0x9: /* CMEQ, CMLE */
+    case 0xb: /* ABS, NEG */
+        if (size == 3 && !is_q) {
+            unallocated_encoding(s);
+            return;
+        }
+        unsupported_encoding(s, insn);
+        return;
+    case 0xc ... 0xf:
+    case 0x16 ... 0x1d:
+    case 0x1f:
+    {
+        /* Floating point: U, size[1] and opcode indicate operation;
+         * size[0] indicates single or double precision.
+         */
+        opcode |= (extract32(size, 1, 1) << 5) | (u << 6);
+        size = extract32(size, 0, 1) ? 3 : 2;
+        switch (opcode) {
+        case 0x16: /* FCVTN, FCVTN2 */
+        case 0x17: /* FCVTL, FCVTL2 */
+        case 0x18: /* FRINTN */
+        case 0x19: /* FRINTM */
+        case 0x1a: /* FCVTNS */
+        case 0x1b: /* FCVTMS */
+        case 0x1c: /* FCVTAS */
+        case 0x1d: /* SCVTF */
+        case 0x2c: /* FCMGT (zero) */
+        case 0x2d: /* FCMEQ (zero) */
+        case 0x2e: /* FCMLT (zero) */
+        case 0x2f: /* FABS */
+        case 0x38: /* FRINTP */
+        case 0x39: /* FRINTZ */
+        case 0x3a: /* FCVTPS */
+        case 0x3b: /* FCVTZS */
+        case 0x3c: /* URECPE */
+        case 0x3d: /* FRECPE */
+        case 0x56: /* FCVTXN, FCVTXN2 */
+        case 0x58: /* FRINTA */
+        case 0x59: /* FRINTX */
+        case 0x5a: /* FCVTNU */
+        case 0x5b: /* FCVTMU */
+        case 0x5c: /* FCVTAU */
+        case 0x5d: /* UCVTF */
+        case 0x6c: /* FCMGE (zero) */
+        case 0x6d: /* FCMLE (zero) */
+        case 0x6f: /* FNEG */
+        case 0x79: /* FRINTI */
+        case 0x7a: /* FCVTPU */
+        case 0x7b: /* FCVTZU */
+        case 0x7c: /* URSQRTE */
+        case 0x7d: /* FRSQRTE */
+        case 0x7f: /* FSQRT */
+            unsupported_encoding(s, insn);
+            return;
+        default:
+            unallocated_encoding(s);
+            return;
+        }
+        break;
+    }
+    default:
+        unallocated_encoding(s);
+        return;
+    }
 }
 
 /* C3.6.18 AdvSIMD vector x indexed element
-- 
1.8.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 17/21] target-arm: A64: Implement 2-register misc compares, ABS, NEG
  2014-01-26 19:24 [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets Peter Maydell
                   ` (15 preceding siblings ...)
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 16/21] target-arm: A64: Add skeleton decode for SIMD 2-reg misc group Peter Maydell
@ 2014-01-26 19:25 ` Peter Maydell
  2014-01-28 17:44   ` Richard Henderson
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 18/21] target-arm: A64: Implement 2-reg-misc CNT, NOT and RBIT Peter Maydell
                   ` (4 subsequent siblings)
  21 siblings, 1 reply; 39+ messages in thread
From: Peter Maydell @ 2014-01-26 19:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson

Implement the simple 2-register-misc operations we can share
with the scalar-two-register-misc code. (SUQADD, USQADD, SQABS,
SQNEG also fall into this category, but aren't implemented in
the scalar-2-register case yet either.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/translate-a64.c | 136 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 134 insertions(+), 2 deletions(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index f03d645..d484967 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -7405,6 +7405,8 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
     int opcode = extract32(insn, 12, 5);
     bool u = extract32(insn, 29, 1);
     bool is_q = extract32(insn, 30, 1);
+    int rn = extract32(insn, 5, 5);
+    int rd = extract32(insn, 0, 5);
 
     switch (opcode) {
     case 0x0: /* REV64, REV32 */
@@ -7443,8 +7445,6 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
             return;
         }
         /* fall through */
-    case 0x3: /* SUQADD, USQADD */
-    case 0x7: /* SQABS, SQNEG */
     case 0x8: /* CMGT, CMGE */
     case 0x9: /* CMEQ, CMLE */
     case 0xb: /* ABS, NEG */
@@ -7452,6 +7452,13 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
             unallocated_encoding(s);
             return;
         }
+        break;
+    case 0x3: /* SUQADD, USQADD */
+    case 0x7: /* SQABS, SQNEG */
+        if (size == 3 && !is_q) {
+            unallocated_encoding(s);
+            return;
+        }
         unsupported_encoding(s, insn);
         return;
     case 0xc ... 0xf:
@@ -7510,6 +7517,131 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
         unallocated_encoding(s);
         return;
     }
+
+    if (size == 3) {
+        /* All 64-bit element operations can be shared with scalar 2misc */
+        int pass;
+
+        for (pass = 0; pass < (is_q ? 2 : 1); pass++) {
+            TCGv_i64 tcg_op = tcg_temp_new_i64();
+            TCGv_i64 tcg_res = tcg_temp_new_i64();
+
+            read_vec_element(s, tcg_op, rn, pass, MO_64);
+
+            handle_2misc_64(s, opcode, u, tcg_res, tcg_op);
+
+            write_vec_element(s, tcg_res, rd, pass, MO_64);
+
+            tcg_temp_free_i64(tcg_res);
+            tcg_temp_free_i64(tcg_op);
+        }
+    } else {
+        int pass;
+
+        for (pass = 0; pass < (is_q ? 4 : 2); pass++) {
+            TCGv_i32 tcg_op = tcg_temp_new_i32();
+            TCGv_i32 tcg_res = tcg_temp_new_i32();
+            TCGCond cond;
+
+            read_vec_element_i32(s, tcg_op, rn, pass, MO_32);
+
+            if (size == 2) {
+                /* Special cases for 32 bit elements */
+                switch (opcode) {
+                case 0xa: /* CMLT */
+                    /* 32 bit integer comparison against zero, result is
+                     * test ? (2^32 - 1) : 0. We implement via setcond(test)
+                     * and then negating the result.
+                     */
+                    cond = TCG_COND_LT;
+                do_cmop:
+                    tcg_gen_setcondi_i32(cond, tcg_res, tcg_op, 0);
+                    tcg_gen_neg_i32(tcg_res, tcg_res);
+                    break;
+                case 0x8: /* CMGT, CMGE */
+                    cond = u ? TCG_COND_GE : TCG_COND_GT;
+                    goto do_cmop;
+                case 0x9: /* CMEQ, CMLE */
+                    cond = u ? TCG_COND_LE : TCG_COND_EQ;
+                    goto do_cmop;
+                case 0xb: /* ABS, NEG */
+                    if (u) {
+                        tcg_gen_neg_i32(tcg_res, tcg_op);
+                    } else {
+                        TCGv_i32 tcg_zero = tcg_const_i32(0);
+                        tcg_gen_neg_i32(tcg_res, tcg_op);
+                        tcg_gen_movcond_i32(TCG_COND_GT, tcg_res, tcg_op,
+                                            tcg_zero, tcg_op, tcg_res);
+                        tcg_temp_free_i32(tcg_zero);
+                    }
+                    break;
+                default:
+                    g_assert_not_reached();
+                }
+            } else {
+                /* Use helpers for 8 and 16 bit elements */
+                switch (opcode) {
+                case 0x8: /* CMGT, CMGE */
+                case 0x9: /* CMEQ, CMLE */
+                case 0xa: /* CMLT */
+                {
+                    static NeonGenTwoOpFn * const fns[3][2] = {
+                        { gen_helper_neon_cgt_s8, gen_helper_neon_cgt_s16 },
+                        { gen_helper_neon_cge_s8, gen_helper_neon_cge_s16 },
+                        { gen_helper_neon_ceq_u8, gen_helper_neon_ceq_u16 },
+                    };
+                    NeonGenTwoOpFn *genfn;
+                    int comp;
+                    bool reverse;
+                    TCGv_i32 tcg_zero = tcg_const_i32(0);
+
+                    /* comp = index into [CMGT, CMGE, CMEQ, CMLE, CMLT] */
+                    comp = (opcode - 0x8) * 2 + u;
+                    /* ...but LE, LT are implemented as reverse GE, GT */
+                    reverse = (comp > 2);
+                    if (reverse) {
+                        comp = 4 - comp;
+                    }
+                    genfn = fns[comp][size];
+                    if (reverse) {
+                        genfn(tcg_res, tcg_zero, tcg_op);
+                    } else {
+                        genfn(tcg_res, tcg_op, tcg_zero);
+                    }
+                    tcg_temp_free_i32(tcg_zero);
+                    break;
+                }
+                case 0xb: /* ABS, NEG */
+                    if (u) {
+                        TCGv_i32 tcg_zero = tcg_const_i32(0);
+                        if (size) {
+                            gen_helper_neon_sub_u16(tcg_res, tcg_zero, tcg_op);
+                        } else {
+                            gen_helper_neon_sub_u8(tcg_res, tcg_zero, tcg_op);
+                        }
+                        tcg_temp_free_i32(tcg_zero);
+                    } else {
+                        if (size) {
+                            gen_helper_neon_abs_s16(tcg_res, tcg_op);
+                        } else {
+                            gen_helper_neon_abs_s8(tcg_res, tcg_op);
+                        }
+                    }
+                    break;
+                default:
+                    g_assert_not_reached();
+                }
+            }
+
+            write_vec_element_i32(s, tcg_res, rd, pass, MO_32);
+
+            tcg_temp_free_i32(tcg_res);
+            tcg_temp_free_i32(tcg_op);
+        }
+    }
+    if (!is_q) {
+        clear_vec_high(s, rd);
+    }
 }
 
 /* C3.6.18 AdvSIMD vector x indexed element
-- 
1.8.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 18/21] target-arm: A64: Implement 2-reg-misc CNT, NOT and RBIT
  2014-01-26 19:24 [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets Peter Maydell
                   ` (16 preceding siblings ...)
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 17/21] target-arm: A64: Implement 2-register misc compares, ABS, NEG Peter Maydell
@ 2014-01-26 19:25 ` Peter Maydell
  2014-01-28 17:47   ` Richard Henderson
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 19/21] target-arm: A64: Add narrowing 2-reg-misc instructions Peter Maydell
                   ` (3 subsequent siblings)
  21 siblings, 1 reply; 39+ messages in thread
From: Peter Maydell @ 2014-01-26 19:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson

Implement the 2-reg-misc CNT, NOT and RBIT instructions.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/helper.h        |  1 +
 target-arm/neon_helper.c   | 12 ++++++++++++
 target-arm/translate-a64.c | 34 ++++++++++++++++++++++++++++------
 3 files changed, 41 insertions(+), 6 deletions(-)

diff --git a/target-arm/helper.h b/target-arm/helper.h
index 70872df..a3c507e 100644
--- a/target-arm/helper.h
+++ b/target-arm/helper.h
@@ -319,6 +319,7 @@ DEF_HELPER_1(neon_cls_s8, i32, i32)
 DEF_HELPER_1(neon_cls_s16, i32, i32)
 DEF_HELPER_1(neon_cls_s32, i32, i32)
 DEF_HELPER_1(neon_cnt_u8, i32, i32)
+DEF_HELPER_FLAGS_1(neon_rbit_u8, TCG_CALL_NO_RWG_SE, i32, i32)
 
 DEF_HELPER_3(neon_qdmulh_s16, i32, env, i32, i32)
 DEF_HELPER_3(neon_qrdmulh_s16, i32, env, i32, i32)
diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c
index be6fbd9..b4c8690 100644
--- a/target-arm/neon_helper.c
+++ b/target-arm/neon_helper.c
@@ -1133,6 +1133,18 @@ uint32_t HELPER(neon_cnt_u8)(uint32_t x)
     return x;
 }
 
+/* Reverse bits in each 8 bit word */
+uint32_t HELPER(neon_rbit_u8)(uint32_t x)
+{
+    x =  ((x & 0xf0f0f0f0) >> 4)
+       | ((x & 0x0f0f0f0f) << 4);
+    x =  ((x & 0x88888888) >> 3)
+       | ((x & 0x44444444) >> 1)
+       | ((x & 0x22222222) << 1)
+       | ((x & 0x11111111) << 3);
+    return x;
+}
+
 #define NEON_QDMULH16(dest, src1, src2, round) do { \
     uint32_t tmp = (int32_t)(int16_t) src1 * (int16_t) src2; \
     if ((tmp ^ (tmp << 1)) & SIGNBIT) { \
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index d484967..3a3489b 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -6250,6 +6250,12 @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u,
     TCGCond cond;
 
     switch (opcode) {
+    case 0x5: /* NOT */
+        /* This opcode is shared with CNT and RBIT but we have earlier
+         * enforced that size == 3 if and only if this is the NOT insn.
+         */
+        tcg_gen_not_i64(tcg_rd, tcg_rn);
+        break;
     case 0xa: /* CMLT */
         /* 64 bit integer comparison against zero, result is
          * test ? (2^64 - 1) : 0. We implement via setcond(!test) and
@@ -7413,13 +7419,19 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
     case 0x1: /* REV16 */
         unsupported_encoding(s, insn);
         return;
-    case 0x5: /* CNT, NOT, RBIT  */
-        if ((u == 0 && size > 0) ||
-            (u == 1 && size > 1)) {
-            unallocated_encoding(s);
-            return;
+    case 0x5: /* CNT, NOT, RBIT */
+        if (u && size == 0) {
+            /* NOT: adjust size so we can use the 64-bits-at-a-time loop. */
+            size = 3;
+            break;
+        } else if (u && size == 1) {
+            /* RBIT */
+            break;
+        } else if (!u && size == 0) {
+            /* CNT */
+            break;
         }
-        unsupported_encoding(s, insn);
+        unallocated_encoding(s);
         return;
     case 0x2: /* SADDLP, UADDLP */
     case 0x4: /* CLS, CLZ */
@@ -7581,6 +7593,16 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
             } else {
                 /* Use helpers for 8 and 16 bit elements */
                 switch (opcode) {
+                case 0x5: /* CNT, RBIT */
+                    /* For these two insns size is part of the opcode specifier
+                     * (handled earlier); they always operate on byte elements.
+                     */
+                    if (u) {
+                        gen_helper_neon_rbit_u8(tcg_res, tcg_op);
+                    } else {
+                        gen_helper_neon_cnt_u8(tcg_res, tcg_op);
+                    }
+                    break;
                 case 0x8: /* CMGT, CMGE */
                 case 0x9: /* CMEQ, CMLE */
                 case 0xa: /* CMLT */
-- 
1.8.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 19/21] target-arm: A64: Add narrowing 2-reg-misc instructions
  2014-01-26 19:24 [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets Peter Maydell
                   ` (17 preceding siblings ...)
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 18/21] target-arm: A64: Implement 2-reg-misc CNT, NOT and RBIT Peter Maydell
@ 2014-01-26 19:25 ` Peter Maydell
  2014-01-28 20:52   ` Richard Henderson
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 20/21] target-arm: A64: Add 2-reg-misc REV* instructions Peter Maydell
                   ` (2 subsequent siblings)
  21 siblings, 1 reply; 39+ messages in thread
From: Peter Maydell @ 2014-01-26 19:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson

Add the narrowing integer instructions in the 2-reg-misc class.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/translate-a64.c | 85 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 83 insertions(+), 2 deletions(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 3a3489b..d6979d9 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -75,6 +75,8 @@ typedef struct AArch64DecodeTable {
 /* Function prototype for gen_ functions for calling Neon helpers */
 typedef void NeonGenTwoOpFn(TCGv_i32, TCGv_i32, TCGv_i32);
 typedef void NeonGenTwoOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32);
+typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
+typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
 
 /* initialize TCG globals.  */
 void a64_translate_init(void)
@@ -7399,6 +7401,79 @@ static void disas_simd_three_reg_same(DisasContext *s, uint32_t insn)
     }
 }
 
+static void handle_2misc_narrow(DisasContext *s, int opcode, bool u, bool is_q,
+                                int size, int rn, int rd)
+{
+    /* Handle 2-reg-misc ops which are narrowing (so each 2*size element
+     * in the source becomes a size element in the destination).
+     */
+    int pass;
+    TCGv_i32 tcg_res[2];
+    int destelt = is_q ? 2 : 0;
+
+    for (pass = 0; pass < 2; pass++) {
+        TCGv_i64 tcg_op = tcg_temp_new_i64();
+        NeonGenNarrowFn *genfn = NULL;
+        NeonGenNarrowEnvFn *genenvfn = NULL;
+
+        read_vec_element(s, tcg_op, rn, pass, MO_64);
+        tcg_res[pass] = tcg_temp_new_i32();
+
+        switch (opcode) {
+        case 0x12: /* XTN, SQXTUN */
+        {
+            static NeonGenNarrowFn * const xtnfns[3] = {
+                gen_helper_neon_narrow_u8,
+                gen_helper_neon_narrow_u16,
+                tcg_gen_trunc_i64_i32,
+            };
+            static NeonGenNarrowEnvFn * const sqxtunfns[3] = {
+                gen_helper_neon_unarrow_sat8,
+                gen_helper_neon_unarrow_sat16,
+                gen_helper_neon_unarrow_sat32,
+            };
+            if (u) {
+                genenvfn = sqxtunfns[size];
+            } else {
+                genfn = xtnfns[size];
+            }
+            break;
+        }
+        case 0x14: /* SQXTN, UQXTN */
+        {
+            static NeonGenNarrowEnvFn * const fns[3][2] = {
+                { gen_helper_neon_narrow_sat_s8,
+                  gen_helper_neon_narrow_sat_u8 },
+                { gen_helper_neon_narrow_sat_s16,
+                  gen_helper_neon_narrow_sat_u16 },
+                { gen_helper_neon_narrow_sat_s32,
+                  gen_helper_neon_narrow_sat_u32 },
+            };
+            genenvfn = fns[size][u];
+            break;
+        }
+        default:
+            g_assert_not_reached();
+        }
+
+        if (genfn) {
+            genfn(tcg_res[pass], tcg_op);
+        } else {
+            genenvfn(tcg_res[pass], cpu_env, tcg_op);
+        }
+
+        tcg_temp_free_i64(tcg_op);
+    }
+
+    for (pass = 0; pass < 2; pass++) {
+        write_vec_element_i32(s, tcg_res[pass], rd, destelt + pass, MO_32);
+        tcg_temp_free_i32(tcg_res[pass]);
+    }
+    if (!is_q) {
+        clear_vec_high(s, rd);
+    }
+}
+
 /* C3.6.17 AdvSIMD two reg misc
  *   31  30  29 28       24 23  22 21       17 16    12 11 10 9    5 4    0
  * +---+---+---+-----------+------+-----------+--------+-----+------+------+
@@ -7433,11 +7508,17 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
         }
         unallocated_encoding(s);
         return;
+    case 0x12: /* XTN, XTN2, SQXTUN, SQXTUN2 */
+    case 0x14: /* SQXTN, SQXTN2, UQXTN, UQXTN2 */
+        if (size == 3) {
+            unallocated_encoding(s);
+            return;
+        }
+        handle_2misc_narrow(s, opcode, u, is_q, size, rn, rd);
+        return;
     case 0x2: /* SADDLP, UADDLP */
     case 0x4: /* CLS, CLZ */
     case 0x6: /* SADALP, UADALP */
-    case 0x12: /* XTN, XTN2, SQXTUN, SQXTUN2 */
-    case 0x14: /* SQXTN, SQXTN2, UQXTN, UQXTN2 */
         if (size == 3) {
             unallocated_encoding(s);
             return;
-- 
1.8.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 20/21] target-arm: A64: Add 2-reg-misc REV* instructions
  2014-01-26 19:24 [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets Peter Maydell
                   ` (18 preceding siblings ...)
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 19/21] target-arm: A64: Add narrowing 2-reg-misc instructions Peter Maydell
@ 2014-01-26 19:25 ` Peter Maydell
  2014-01-28 20:57   ` Richard Henderson
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 21/21] target-arm: A64: Add FNEG and FABS to the SIMD 2-reg-misc group Peter Maydell
  2014-01-26 19:33 ` [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets Peter Maydell
  21 siblings, 1 reply; 39+ messages in thread
From: Peter Maydell @ 2014-01-26 19:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson

From: Alex Bennée <alex.bennee@linaro.org>

Add the byte-reverse operations REV64, REV32 and REV16 from the
two-reg-misc group.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/translate-a64.c | 42 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 41 insertions(+), 1 deletion(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index d6979d9..db89327 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -7474,6 +7474,46 @@ static void handle_2misc_narrow(DisasContext *s, int opcode, bool u, bool is_q,
     }
 }
 
+static void handle_rev(DisasContext *s, int opcode, bool u,
+                       bool is_q, int size, int rn, int rd)
+{
+    int op = (opcode & 1) << 1 | u;
+    int opsz = op + size;
+    int ibits = 3 - opsz;
+    int revmask = (1 << ibits) - 1;
+    int dsize = is_q ? 128 : 64;
+    int esize = 8 << size;
+    int elements = dsize / esize;
+    int i;
+    TCGv_i64 tcg_rd_hi, tcg_rd, tcg_rn;
+
+    if (opsz >= 3) {
+        unallocated_encoding(s);
+        return;
+    }
+
+    tcg_rn = tcg_temp_new_i64();
+    tcg_rd = tcg_const_i64(0);
+    tcg_rd_hi = tcg_const_i64(0);
+
+    for (i = 0; i < elements; i++) {
+        int e_rev = (i & 0xf) ^ revmask;
+        int off = e_rev * esize;
+        read_vec_element(s, tcg_rn, rn, i, size);
+        if (off >= 64) {
+            tcg_gen_deposit_i64(tcg_rd_hi, tcg_rd_hi, tcg_rn, off - 64, esize);
+        } else {
+            tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_rn, off, esize);
+        }
+    }
+    write_vec_element(s, tcg_rd, rd, 0, MO_64);
+    write_vec_element(s, tcg_rd_hi, rd, 1, MO_64);
+
+    tcg_temp_free_i64(tcg_rd_hi);
+    tcg_temp_free_i64(tcg_rd);
+    tcg_temp_free_i64(tcg_rn);
+}
+
 /* C3.6.17 AdvSIMD two reg misc
  *   31  30  29 28       24 23  22 21       17 16    12 11 10 9    5 4    0
  * +---+---+---+-----------+------+-----------+--------+-----+------+------+
@@ -7492,7 +7532,7 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
     switch (opcode) {
     case 0x0: /* REV64, REV32 */
     case 0x1: /* REV16 */
-        unsupported_encoding(s, insn);
+        handle_rev(s, opcode, u, is_q, size, rn, rd);
         return;
     case 0x5: /* CNT, NOT, RBIT */
         if (u && size == 0) {
-- 
1.8.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 21/21] target-arm: A64: Add FNEG and FABS to the SIMD 2-reg-misc group
  2014-01-26 19:24 [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets Peter Maydell
                   ` (19 preceding siblings ...)
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 20/21] target-arm: A64: Add 2-reg-misc REV* instructions Peter Maydell
@ 2014-01-26 19:25 ` Peter Maydell
  2014-01-28 20:59   ` Richard Henderson
  2014-01-26 19:33 ` [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets Peter Maydell
  21 siblings, 1 reply; 39+ messages in thread
From: Peter Maydell @ 2014-01-26 19:25 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson

Add the SIMD FNEG and FABS instructions in the SIMD 2-reg-misc group.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
 target-arm/translate-a64.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index db89327..797e364 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -6247,7 +6247,7 @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u,
 {
     /* Handle 64->64 opcodes which are shared between the scalar and
      * vector 2-reg-misc groups. We cover every integer opcode where size == 3
-     * is valid in either group.
+     * is valid in either group and also the double-precision fp ops.
      */
     TCGCond cond;
 
@@ -6285,6 +6285,12 @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u,
             tcg_temp_free_i64(tcg_zero);
         }
         break;
+    case 0x2f: /* FABS */
+        gen_helper_vfp_absd(tcg_rd, tcg_rn);
+        break;
+    case 0x6f: /* FNEG */
+        gen_helper_vfp_negd(tcg_rd, tcg_rn);
+        break;
     default:
         g_assert_not_reached();
     }
@@ -7604,6 +7610,13 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
         opcode |= (extract32(size, 1, 1) << 5) | (u << 6);
         size = extract32(size, 0, 1) ? 3 : 2;
         switch (opcode) {
+        case 0x2f: /* FABS */
+        case 0x6f: /* FNEG */
+            if (size == 3 && !is_q) {
+                unallocated_encoding(s);
+                return;
+            }
+            break;
         case 0x16: /* FCVTN, FCVTN2 */
         case 0x17: /* FCVTL, FCVTL2 */
         case 0x18: /* FRINTN */
@@ -7615,7 +7628,6 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
         case 0x2c: /* FCMGT (zero) */
         case 0x2d: /* FCMEQ (zero) */
         case 0x2e: /* FCMLT (zero) */
-        case 0x2f: /* FABS */
         case 0x38: /* FRINTP */
         case 0x39: /* FRINTZ */
         case 0x3a: /* FCVTPS */
@@ -7631,7 +7643,6 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
         case 0x5d: /* UCVTF */
         case 0x6c: /* FCMGE (zero) */
         case 0x6d: /* FCMLE (zero) */
-        case 0x6f: /* FNEG */
         case 0x79: /* FRINTI */
         case 0x7a: /* FCVTPU */
         case 0x7b: /* FCVTZU */
@@ -7708,6 +7719,12 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
                         tcg_temp_free_i32(tcg_zero);
                     }
                     break;
+                case 0x2f: /* FABS */
+                    gen_helper_vfp_abss(tcg_res, tcg_op);
+                    break;
+                case 0x6f: /* FNEG */
+                    gen_helper_vfp_negs(tcg_res, tcg_op);
+                    break;
                 default:
                     g_assert_not_reached();
                 }
-- 
1.8.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets
  2014-01-26 19:24 [Qemu-devel] [PATCH 00/21] A64: Add Neon instructions, second and third sets Peter Maydell
                   ` (20 preceding siblings ...)
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 21/21] target-arm: A64: Add FNEG and FABS to the SIMD 2-reg-misc group Peter Maydell
@ 2014-01-26 19:33 ` Peter Maydell
  21 siblings, 0 replies; 39+ messages in thread
From: Peter Maydell @ 2014-01-26 19:33 UTC (permalink / raw)
  To: QEMU Developers
  Cc: Peter Crosthwaite, Laurent Desnogues, Patch Tracking,
	Michael Matz, Claudio Fontana, Dirk Mueller, Will Newton,
	kvmarm@lists.cs.columbia.edu, Richard Henderson

On 26 January 2014 19:24, Peter Maydell <peter.maydell@linaro.org> wrote:
> This patch series is kind of in two parts. The first 8 patches
> are the "Neon second set" that's already been pretty much reviewed;
> I'm resending them just because there were a few minor nits that
> came up in the last round which have been fixed:
>  * patch 6 added the missing SQDMULH/SQRDMULH unallocated/unimplemented
>    case in the SIMD 3-same initial decode
>  * patch 7 now uses the _i32 vector element accessors
>  * patch 8 fixed a few codestyle nits
> (RTH: these all seemed trivial enough that I've left your
> reviewed-by tags in place).
>
> Patches 9..21 are new, and fill in a number of gaps that bring
> us up to (and past) parity with the SuSE tree for coverage:
>  * more SIMD 3-same ops, including basically all the integer ones
>  * the "scalar pairwise" instruction group
>  * more SIMD scalar-3-same ops, including all the integer ones
>  * some (but not all) of the 2-reg misc and scalar 2-reg misc groups
>    (by the same rationale as with 3-same, we aim for "anything
>    implemented in the SuSE tree" plus enough to make it reasonably
>    likely we've got the general function structure correct)

I forgot to mention, but you can find a git tree with these in at
git://git.linaro.org/people/peter.maydell/qemu-arm.git a64-neon

thanks
-- PMM

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 09/21] target-arm: A64: Implement SIMD 3-reg-same shift and saturate insns
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 09/21] target-arm: A64: Implement SIMD 3-reg-same shift and saturate insns Peter Maydell
@ 2014-01-28 17:05   ` Richard Henderson
  0 siblings, 0 replies; 39+ messages in thread
From: Richard Henderson @ 2014-01-28 17:05 UTC (permalink / raw)
  To: Peter Maydell, qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall

On 01/26/2014 11:25 AM, Peter Maydell wrote:
> Implement the SIMD 3-reg-same instructions SQADD, UQADD,
> SQSUB, UQSUB, SSHL, USHL, SQSHl, UQSHL, SRSHL, URSHL,
> SQRSHL, UQRSHL; these are all simple calls to existing
> Neon helpers. We also enable SSHL, USHL, SRSHL and URSHL
> for the 3-reg-same-scalar category (but not the others
> because they can have non-size-64 operands and the
> scalar_3reg_same function doesn't support that yet.)
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target-arm/translate-a64.c | 134 +++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 112 insertions(+), 22 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 10/21] target-arm: A64: Implement remaining non-pairwise int SIMD 3-reg-same insns
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 10/21] target-arm: A64: Implement remaining non-pairwise int SIMD 3-reg-same insns Peter Maydell
@ 2014-01-28 17:13   ` Richard Henderson
  2014-01-28 17:19     ` Peter Maydell
  0 siblings, 1 reply; 39+ messages in thread
From: Richard Henderson @ 2014-01-28 17:13 UTC (permalink / raw)
  To: Peter Maydell, qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall

On 01/26/2014 11:25 AM, Peter Maydell wrote:
> @@ -6773,9 +6801,9 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
>              case 0x8: /* SSHL, USHL */
>              {
>                  static NeonGenTwoOpFn * const fns[3][2] = {
> -                    { gen_helper_neon_shl_u8, gen_helper_neon_shl_s8 },
> -                    { gen_helper_neon_shl_u16, gen_helper_neon_shl_s16 },
> -                    { gen_helper_neon_shl_u32, gen_helper_neon_shl_s32 },
> +                    { gen_helper_neon_shl_s8, gen_helper_neon_shl_u8 },
> +                    { gen_helper_neon_shl_s16, gen_helper_neon_shl_u16 },
> +                    { gen_helper_neon_shl_s32, gen_helper_neon_shl_u32 },
>                  };
>                  genfn = fns[size][u];
>                  break;

Oops, I clearly missed this in the review of the previous patch.
But these bug fixes ought to be folded into the previous.

Otherwise,

Reviewed-by: Richard Henderson <rth@twiddle.net>

> +            case 0xd: /* SMIN, UMIN */
> +            {
> +                static NeonGenTwoOpFn * const fns[3][2] = {
> +                    { gen_helper_neon_min_s8, gen_helper_neon_min_u8 },
> +                    { gen_helper_neon_min_s16, gen_helper_neon_min_u16 },
> +                    { gen_helper_neon_min_s32, gen_helper_neon_min_u32 },
> +                };
> +                genfn = fns[size][u];
> +                break;

Out of curiosity, what logic are you applying to using the neon helper
functions and implementing stuff inline?

It appears that you're implementing all of the 64-bit stuff inline, but using
the existing 32-bit helpers, even if the helpers are remarkably small, like
some of these.


r~

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 10/21] target-arm: A64: Implement remaining non-pairwise int SIMD 3-reg-same insns
  2014-01-28 17:13   ` Richard Henderson
@ 2014-01-28 17:19     ` Peter Maydell
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Maydell @ 2014-01-28 17:19 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Peter Crosthwaite, Patch Tracking, Michael Matz, Alexander Graf,
	QEMU Developers, Claudio Fontana, Dirk Mueller, Will Newton,
	Laurent Desnogues, Alex Bennée, kvmarm@lists.cs.columbia.edu,
	Christoffer Dall

On 28 January 2014 17:13, Richard Henderson <rth@twiddle.net> wrote:
> On 01/26/2014 11:25 AM, Peter Maydell wrote:
>> @@ -6773,9 +6801,9 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
>>              case 0x8: /* SSHL, USHL */
>>              {
>>                  static NeonGenTwoOpFn * const fns[3][2] = {
>> -                    { gen_helper_neon_shl_u8, gen_helper_neon_shl_s8 },
>> -                    { gen_helper_neon_shl_u16, gen_helper_neon_shl_s16 },
>> -                    { gen_helper_neon_shl_u32, gen_helper_neon_shl_s32 },
>> +                    { gen_helper_neon_shl_s8, gen_helper_neon_shl_u8 },
>> +                    { gen_helper_neon_shl_s16, gen_helper_neon_shl_u16 },
>> +                    { gen_helper_neon_shl_s32, gen_helper_neon_shl_u32 },
>>                  };
>>                  genfn = fns[size][u];
>>                  break;
>
> Oops, I clearly missed this in the review of the previous patch.
> But these bug fixes ought to be folded into the previous.

Heh, yes. I missed this too; I think I must have just done a
'fix the same bug in all these cases' edit and not noticed that
some of them were already present in the previous patch.

> Otherwise,
>
> Reviewed-by: Richard Henderson <rth@twiddle.net>
>
>> +            case 0xd: /* SMIN, UMIN */
>> +            {
>> +                static NeonGenTwoOpFn * const fns[3][2] = {
>> +                    { gen_helper_neon_min_s8, gen_helper_neon_min_u8 },
>> +                    { gen_helper_neon_min_s16, gen_helper_neon_min_u16 },
>> +                    { gen_helper_neon_min_s32, gen_helper_neon_min_u32 },
>> +                };
>> +                genfn = fns[size][u];
>> +                break;
>
> Out of curiosity, what logic are you applying to using the neon helper
> functions and implementing stuff inline?
>
> It appears that you're implementing all of the 64-bit stuff inline, but using
> the existing 32-bit helpers, even if the helpers are remarkably small, like
> some of these.

My logic runs roughly "if there's a helper already, use it, since we
know it has the correct semantics; otherwise implement it in whatever
way seems most straightforward". Since the A32 Neon code was done
almost entirely with helpers, this tends to produce the effect you
note.
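
(For comparison, the inline alternative for one of these -- signed
32-bit SMIN -- would look roughly like the sketch below. This is
purely illustrative: the operand names are placeholders and the
series itself keeps the helper call.)

    /* Reuse the existing A32 Neon helper, as the series does: */
    gen_helper_neon_min_s32(tcg_res, tcg_op1, tcg_op2);

    /* ...or open-code it with a single movcond instead: */
    tcg_gen_movcond_i32(TCG_COND_LE, tcg_res, tcg_op1, tcg_op2,
                        tcg_op1, tcg_op2);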

thanks
-- PMM

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 11/21] target-arm: A64: Implement pairwise integer ops from 3-reg-same SIMD
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 11/21] target-arm: A64: Implement pairwise integer ops from 3-reg-same SIMD Peter Maydell
@ 2014-01-28 17:21   ` Richard Henderson
  2014-01-28 18:27     ` Peter Maydell
  0 siblings, 1 reply; 39+ messages in thread
From: Richard Henderson @ 2014-01-28 17:21 UTC (permalink / raw)
  To: Peter Maydell, qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall

On 01/26/2014 11:25 AM, Peter Maydell wrote:
> +/* Helper functions for pairwise 32 bit comparisons */
> +static void gen_pmax_s32(TCGv_i32 res, TCGv_i32 op1, TCGv_i32 op2)
> +{
> +    tcg_gen_movcond_i32(TCG_COND_GE, res, op1, op2, op1, op2);
> +}
> +
> +static void gen_pmax_u32(TCGv_i32 res, TCGv_i32 op1, TCGv_i32 op2)
> +{
> +    tcg_gen_movcond_i32(TCG_COND_GEU, res, op1, op2, op1, op2);
> +}
> +
> +static void gen_pmin_s32(TCGv_i32 res, TCGv_i32 op1, TCGv_i32 op2)
> +{
> +    tcg_gen_movcond_i32(TCG_COND_LE, res, op1, op2, op1, op2);
> +}
> +
> +static void gen_pmin_u32(TCGv_i32 res, TCGv_i32 op1, TCGv_i32 op2)
> +{
> +    tcg_gen_movcond_i32(TCG_COND_LEU, res, op1, op2, op1, op2);
> +}

These are exactly the sort of helpers I expected to see in the previous patch.
Thus my question re neon_{min,max}_{s,u}32.

Otherwise,

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 12/21] tcg: Add TCGV_UNUSED_PTR, TCGV_IS_UNUSED_PTR, TCGV_EQUAL_PTR
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 12/21] tcg: Add TCGV_UNUSED_PTR, TCGV_IS_UNUSED_PTR, TCGV_EQUAL_PTR Peter Maydell
@ 2014-01-28 17:23   ` Richard Henderson
  0 siblings, 0 replies; 39+ messages in thread
From: Richard Henderson @ 2014-01-28 17:23 UTC (permalink / raw)
  To: Peter Maydell, qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall

On 01/26/2014 11:25 AM, Peter Maydell wrote:
> We have macros for marking TCGv values as unused, checking if they
> are unused and comparing them to each other. However these only exist
> for TCGv_i32 and TCGv_i64; add them for TCGv_ptr as well.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  tcg/tcg.h | 3 +++
>  1 file changed, 3 insertions(+)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 13/21] target-arm: A64: Implement scalar pairwise ops
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 13/21] target-arm: A64: Implement scalar pairwise ops Peter Maydell
@ 2014-01-28 17:28   ` Richard Henderson
  0 siblings, 0 replies; 39+ messages in thread
From: Richard Henderson @ 2014-01-28 17:28 UTC (permalink / raw)
  To: Peter Maydell, qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall

On 01/26/2014 11:25 AM, Peter Maydell wrote:
> Implement the instructions in the scalar pairwise group (C3.6.8).
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target-arm/translate-a64.c | 114 ++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 113 insertions(+), 1 deletion(-)


Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 15/21] target-arm: A64: Add SIMD simple 64 bit insns from scalar 2-reg misc
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 15/21] target-arm: A64: Add SIMD simple 64 bit insns from scalar 2-reg misc Peter Maydell
@ 2014-01-28 17:34   ` Richard Henderson
  0 siblings, 0 replies; 39+ messages in thread
From: Richard Henderson @ 2014-01-28 17:34 UTC (permalink / raw)
  To: Peter Maydell, qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall

On 01/26/2014 11:25 AM, Peter Maydell wrote:
> +    do_cmop:
> +        tcg_gen_setcondi_i64(tcg_invert_cond(cond), tcg_rd, tcg_rn, 0);
> +        tcg_gen_subi_i64(tcg_rd, tcg_rd, 1);
> +        break;

Another instance of (!test - 1) instead of -(test).
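
(Spelled out, the two equivalent sequences are roughly the following;
this is just an illustration of the idiom, reusing the names from the
quoted hunk:)

    /* As posted: setcond on the inverted condition, then subtract 1,
     * so true -> (0 - 1) = all ones, false -> (1 - 1) = 0.
     */
    tcg_gen_setcondi_i64(tcg_invert_cond(cond), tcg_rd, tcg_rn, 0);
    tcg_gen_subi_i64(tcg_rd, tcg_rd, 1);

    /* Suggested alternative: setcond on the condition itself, then
     * negate, so true -> -(1) = all ones, false -> -(0) = 0.
     */
    tcg_gen_setcondi_i64(cond, tcg_rd, tcg_rn, 0);
    tcg_gen_neg_i64(tcg_rd, tcg_rd);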

Otherwise,

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 16/21] target-arm: A64: Add skeleton decode for SIMD 2-reg misc group
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 16/21] target-arm: A64: Add skeleton decode for SIMD 2-reg misc group Peter Maydell
@ 2014-01-28 17:40   ` Richard Henderson
  0 siblings, 0 replies; 39+ messages in thread
From: Richard Henderson @ 2014-01-28 17:40 UTC (permalink / raw)
  To: Peter Maydell, qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall

On 01/26/2014 11:25 AM, Peter Maydell wrote:
> Add a skeleton decode for the SIMD 2-reg misc group.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target-arm/translate-a64.c | 110 ++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 109 insertions(+), 1 deletion(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 17/21] target-arm: A64: Implement 2-register misc compares, ABS, NEG
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 17/21] target-arm: A64: Implement 2-register misc compares, ABS, NEG Peter Maydell
@ 2014-01-28 17:44   ` Richard Henderson
  0 siblings, 0 replies; 39+ messages in thread
From: Richard Henderson @ 2014-01-28 17:44 UTC (permalink / raw)
  To: Peter Maydell, qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall

On 01/26/2014 11:25 AM, Peter Maydell wrote:
> Implement the simple 2-register-misc operations we can share
> with the scalar-two-register-misc code. (SUQADD, USQADD, SQABS,
> SQNEG also fall into this category, but aren't implemented in
> the scalar-2-register case yet either.)
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target-arm/translate-a64.c | 136 ++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 134 insertions(+), 2 deletions(-)


Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 18/21] target-arm: A64: Implement 2-reg-misc CNT, NOT and RBIT
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 18/21] target-arm: A64: Implement 2-reg-misc CNT, NOT and RBIT Peter Maydell
@ 2014-01-28 17:47   ` Richard Henderson
  0 siblings, 0 replies; 39+ messages in thread
From: Richard Henderson @ 2014-01-28 17:47 UTC (permalink / raw)
  To: Peter Maydell, qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall

On 01/26/2014 11:25 AM, Peter Maydell wrote:
> Implement the 2-reg-misc CNT, NOT and RBIT instructions.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target-arm/helper.h        |  1 +
>  target-arm/neon_helper.c   | 12 ++++++++++++
>  target-arm/translate-a64.c | 34 ++++++++++++++++++++++++++++------
>  3 files changed, 41 insertions(+), 6 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 11/21] target-arm: A64: Implement pairwise integer ops from 3-reg-same SIMD
  2014-01-28 17:21   ` Richard Henderson
@ 2014-01-28 18:27     ` Peter Maydell
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Maydell @ 2014-01-28 18:27 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Peter Crosthwaite, Patch Tracking, Michael Matz, Alexander Graf,
	QEMU Developers, Claudio Fontana, Dirk Mueller, Will Newton,
	Laurent Desnogues, Alex Bennée, kvmarm@lists.cs.columbia.edu,
	Christoffer Dall

On 28 January 2014 17:21, Richard Henderson <rth@twiddle.net> wrote:
> On 01/26/2014 11:25 AM, Peter Maydell wrote:
>> +/* Helper functions for pairwise 32 bit comparisons */
>> +static void gen_pmax_s32(TCGv_i32 res, TCGv_i32 op1, TCGv_i32 op2)
>> +{
>> +    tcg_gen_movcond_i32(TCG_COND_GE, res, op1, op2, op1, op2);
>> +}
>> +
>> +static void gen_pmax_u32(TCGv_i32 res, TCGv_i32 op1, TCGv_i32 op2)
>> +{
>> +    tcg_gen_movcond_i32(TCG_COND_GEU, res, op1, op2, op1, op2);
>> +}
>> +
>> +static void gen_pmin_s32(TCGv_i32 res, TCGv_i32 op1, TCGv_i32 op2)
>> +{
>> +    tcg_gen_movcond_i32(TCG_COND_LE, res, op1, op2, op1, op2);
>> +}
>> +
>> +static void gen_pmin_u32(TCGv_i32 res, TCGv_i32 op1, TCGv_i32 op2)
>> +{
>> +    tcg_gen_movcond_i32(TCG_COND_LEU, res, op1, op2, op1, op2);
>> +}
>
> These are exactly the sort of helpers I expected to see in the previous patch.
> Thus my question re neon_{min,max}_{s,u}32.

OK, I'll move these to the previous patch and use them there too.
I suspect the reason the helpers exist is that they predate
the movcond op.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 02/21] target-arm: A64: Add SIMD three-different ABDL instructions
  2014-01-26 19:24 ` [Qemu-devel] [PATCH 02/21] target-arm: A64: Add SIMD three-different ABDL instructions Peter Maydell
@ 2014-01-28 18:59   ` Peter Maydell
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Maydell @ 2014-01-28 18:59 UTC (permalink / raw)
  To: QEMU Developers
  Cc: Peter Crosthwaite, Laurent Desnogues, Patch Tracking,
	Michael Matz, Claudio Fontana, Dirk Mueller, Will Newton,
	kvmarm@lists.cs.columbia.edu, Richard Henderson

On 26 January 2014 19:24, Peter Maydell <peter.maydell@linaro.org> wrote:
> Implement the absolute-difference instructions in the SIMD
> three-different group: SABAL, SABAL2, UABAL, UABAL2, SABDL,
> SABDL2, UABDL, UABDL2.
>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> Reviewed-by: Richard Henderson <rth@twiddle.net>

Just a note that if anybody's doing anything with this
patch via 'git am' or similar then they should be careful,
because git's patch-applying algorithm seems to like to swap
the order of the two hunks when it applies this. (Easy to
tell if it has because it won't compile, fortunately.)

thanks
-- PMM

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 19/21] target-arm: A64: Add narrowing 2-reg-misc instructions
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 19/21] target-arm: A64: Add narrowing 2-reg-misc instructions Peter Maydell
@ 2014-01-28 20:52   ` Richard Henderson
  0 siblings, 0 replies; 39+ messages in thread
From: Richard Henderson @ 2014-01-28 20:52 UTC (permalink / raw)
  To: Peter Maydell, qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall

On 01/26/2014 11:25 AM, Peter Maydell wrote:
> Add the narrowing integer instructions in the 2-reg-misc class.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target-arm/translate-a64.c | 85 ++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 83 insertions(+), 2 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 20/21] target-arm: A64: Add 2-reg-misc REV* instructions
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 20/21] target-arm: A64: Add 2-reg-misc REV* instructions Peter Maydell
@ 2014-01-28 20:57   ` Richard Henderson
  0 siblings, 0 replies; 39+ messages in thread
From: Richard Henderson @ 2014-01-28 20:57 UTC (permalink / raw)
  To: Peter Maydell, qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall

On 01/26/2014 11:25 AM, Peter Maydell wrote:
> +    int elements = dsize / esize;
> +    int i;
> +    TCGv_i64 tcg_rd_hi, tcg_rd, tcg_rn;
> +
> +    if (opsz >= 3) {
> +        unallocated_encoding(s);
> +        return;
> +    }
> +
> +    tcg_rn = tcg_temp_new_i64();
> +    tcg_rd = tcg_const_i64(0);
> +    tcg_rd_hi = tcg_const_i64(0);
> +
> +    for (i = 0; i < elements; i++) {
> +        int e_rev = (i & 0xf) ^ revmask;
> +        int off = e_rev * esize;
> +        read_vec_element(s, tcg_rn, rn, i, size);
> +        if (off >= 64) {
> +            tcg_gen_deposit_i64(tcg_rd_hi, tcg_rd_hi, tcg_rn, off - 64, esize);
> +        } else {
> +            tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_rn, off, esize);
> +        }
> +    }
> +    write_vec_element(s, tcg_rd, rd, 0, MO_64);
> +    write_vec_element(s, tcg_rd_hi, rd, 1, MO_64);

Surely a special case for esize == 8 is in order, since that's bswap.
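
(A rough sketch of what such a special case might look like, purely
for illustration -- it assumes the read_vec_element/write_vec_element
helpers already used by handle_rev and only covers REV64 on byte
elements, where each 64-bit group is a straight byte swap:)

    if (esize == 8 && revmask == 7) {
        /* REV64 on byte elements: byte-swap each 64-bit chunk. */
        TCGv_i64 tcg_tmp = tcg_temp_new_i64();
        TCGv_i64 tcg_zero = tcg_const_i64(0);

        for (i = 0; i < (is_q ? 2 : 1); i++) {
            read_vec_element(s, tcg_tmp, rn, i, MO_64);
            tcg_gen_bswap64_i64(tcg_tmp, tcg_tmp);
            write_vec_element(s, tcg_tmp, rd, i, MO_64);
        }
        if (!is_q) {
            /* zero the high 64 bits, as the generic loop does */
            write_vec_element(s, tcg_zero, rd, 1, MO_64);
        }
        tcg_temp_free_i64(tcg_zero);
        tcg_temp_free_i64(tcg_tmp);
        return;
    }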

Otherwise,

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 21/21] target-arm: A64: Add FNEG and FABS to the SIMD 2-reg-misc group
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 21/21] target-arm: A64: Add FNEG and FABS to the SIMD 2-reg-misc group Peter Maydell
@ 2014-01-28 20:59   ` Richard Henderson
  0 siblings, 0 replies; 39+ messages in thread
From: Richard Henderson @ 2014-01-28 20:59 UTC (permalink / raw)
  To: Peter Maydell, qemu-devel
  Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
	Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
	Alex Bennée, kvmarm, Christoffer Dall

On 01/26/2014 11:25 AM, Peter Maydell wrote:
> Add the SIMD FNEG and FABS instructions in the SIMD 2-reg-misc group.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target-arm/translate-a64.c | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 14/21] target-arm: A64: Implement remaining integer scalar-3-same insns
  2014-01-26 19:25 ` [Qemu-devel] [PATCH 14/21] target-arm: A64: Implement remaining integer scalar-3-same insns Peter Maydell
@ 2014-01-28 21:01   ` Richard Henderson
  0 siblings, 0 replies; 39+ messages in thread
From: Richard Henderson @ 2014-01-28 21:01 UTC (permalink / raw)
  To: Peter Maydell, qemu-devel
  Cc: Peter Crosthwaite, Laurent Desnogues, patches, Michael Matz,
	Alexander Graf, Claudio Fontana, Dirk Mueller, Will Newton,
	Alex Bennée, kvmarm, Christoffer Dall

On 01/26/2014 11:25 AM, Peter Maydell wrote:
> Implement the remaining integer instructions in the scalar-three-reg-same
> group: SQADD, UQADD, SQSUB, UQSUB, SQSHL, UQSHL, SQRSHL, UQRSHL,
> SQDMULH, SQRDMULH.
> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  target-arm/translate-a64.c | 106 +++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 87 insertions(+), 19 deletions(-)

Reviewed-by: Richard Henderson <rth@twiddle.net>


r~

^ permalink raw reply	[flat|nested] 39+ messages in thread
