* [Qemu-devel] [PATCH 0/8] A64: Neon support, fourth set
@ 2014-02-07 21:49 Peter Maydell
2014-02-07 21:49 ` [Qemu-devel] [PATCH 1/8] target-arm: A64: Implement plain vector SIMD indexed element insns Peter Maydell
` (8 more replies)
0 siblings, 9 replies; 13+ messages in thread
From: Peter Maydell @ 2014-02-07 21:49 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson
This is the fourth set of A64 Neon support patches. This set
provides complete implementations of the categories:
* scalar x indexed
* vector x indexed
* scalar three-different
and fills in all the previously missing instructions in:
* vector 3-reg-same
* scalar 3-reg-same
It includes adding support in softfloat for a fused "multiply,
add and then halve" operation, because the A64 FRSQRTS instruction
requires it.
By my reckoning the parts that are still missing for Neon are:
* parts of scalar shift-immediate and vector shift-immediate
* parts of scalar-2-misc
* most of vector-3-diff
* parts of vector-2-misc
Alex Bennée (2):
target-arm: A64: Implement SIMD FP compare and set insns
target-arm: A64: Implement floating point pairwise insns
Peter Maydell (6):
target-arm: A64: Implement plain vector SIMD indexed element insns
target-arm: A64: Implement long vector x indexed insns
target-arm: A64: Implement SIMD scalar indexed instructions
target-arm: A64: Implement scalar three different instructions
softfloat: Support halving the result of muladd operation
target-arm: A64: Implement remaining 3-same instructions
fpu/softfloat.c | 38 ++
include/fpu/softfloat.h | 3 +
target-arm/helper-a64.c | 105 +++++
target-arm/helper-a64.h | 9 +
target-arm/helper.h | 2 +
target-arm/neon_helper.c | 16 +
target-arm/translate-a64.c | 936 +++++++++++++++++++++++++++++++++++++++++----
7 files changed, 1033 insertions(+), 76 deletions(-)
--
1.8.5
* [Qemu-devel] [PATCH 1/8] target-arm: A64: Implement plain vector SIMD indexed element insns
2014-02-07 21:49 [Qemu-devel] [PATCH 0/8] A64: Neon support, fourth set Peter Maydell
@ 2014-02-07 21:49 ` Peter Maydell
2014-02-11 14:52 ` Peter Maydell
2014-02-07 21:49 ` [Qemu-devel] [PATCH 2/8] target-arm: A64: Implement long vector x indexed insns Peter Maydell
` (7 subsequent siblings)
8 siblings, 1 reply; 13+ messages in thread
From: Peter Maydell @ 2014-02-07 21:49 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson
Implement all the SIMD vector x indexed element instructions
in the subcategory which are not 'long' ops.
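As a reminder of the semantics: each element of Vn is combined with one
selected element of Vm, the same element for every lane. A minimal
standalone C sketch of the non-long integer multiply case (illustrative
only, not the QEMU decode path):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Something like MUL Vd.4S, Vn.4S, Vm.S[idx]: every lane of vn is
     * multiplied by the single lane vm[idx].  The "long" variants instead
     * widen each product to twice the element size.
     */
    void mul_by_element_4s(uint32_t vd[4], const uint32_t vn[4],
                           const uint32_t vm[4], int idx)
    {
        for (int i = 0; i < 4; i++) {
            vd[i] = vn[i] * vm[idx];
        }
    }

    int main(void)
    {
        uint32_t n[4] = {1, 2, 3, 4}, m[4] = {10, 20, 30, 40}, d[4];
        mul_by_element_4s(d, n, m, 2);   /* every lane times m[2] == 30 */
        printf("%" PRIu32 " %" PRIu32 " %" PRIu32 " %" PRIu32 "\n",
               d[0], d[1], d[2], d[3]);
        return 0;
    }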
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target-arm/helper-a64.c | 26 +++++
target-arm/helper-a64.h | 2 +
target-arm/translate-a64.c | 245 ++++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 272 insertions(+), 1 deletion(-)
diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
index 6ca958a..fe90a5c 100644
--- a/target-arm/helper-a64.c
+++ b/target-arm/helper-a64.c
@@ -123,6 +123,32 @@ uint64_t HELPER(vfp_cmped_a64)(float64 x, float64 y, void *fp_status)
return float_rel_to_flags(float64_compare(x, y, fp_status));
}
+float32 HELPER(vfp_mulxs)(float32 a, float32 b, void *fpstp)
+{
+ float_status *fpst = fpstp;
+
+ if ((float32_is_zero(a) && float32_is_infinity(b)) ||
+ (float32_is_infinity(a) && float32_is_zero(b))) {
+ /* 2.0 with the sign bit set to sign(A) XOR sign(B) */
+ return make_float32((1U << 30) |
+ ((float32_val(a) ^ float32_val(b)) & (1U << 31)));
+ }
+ return float32_mul(a, b, fpst);
+}
+
+float64 HELPER(vfp_mulxd)(float64 a, float64 b, void *fpstp)
+{
+ float_status *fpst = fpstp;
+
+ if ((float64_is_zero(a) && float64_is_infinity(b)) ||
+ (float64_is_infinity(a) && float64_is_zero(b))) {
+ /* 2.0 with the sign bit set to sign(A) XOR sign(B) */
+ return make_float64((1ULL << 62) |
+ ((float64_val(a) ^ float64_val(b)) & (1ULL << 63)));
+ }
+ return float64_mul(a, b, fpst);
+}
+
uint64_t HELPER(simd_tbl)(CPUARMState *env, uint64_t result, uint64_t indices,
uint32_t rn, uint32_t numregs)
{
diff --git a/target-arm/helper-a64.h b/target-arm/helper-a64.h
index 99832ee..84310e8 100644
--- a/target-arm/helper-a64.h
+++ b/target-arm/helper-a64.h
@@ -27,3 +27,5 @@ DEF_HELPER_3(vfp_cmpes_a64, i64, f32, f32, ptr)
DEF_HELPER_3(vfp_cmpd_a64, i64, f64, f64, ptr)
DEF_HELPER_3(vfp_cmped_a64, i64, f64, f64, ptr)
DEF_HELPER_FLAGS_5(simd_tbl, TCG_CALL_NO_RWG_SE, i64, env, i64, i64, i32, i32)
+DEF_HELPER_FLAGS_3(vfp_mulxs, TCG_CALL_NO_RWG, f32, f32, f32, ptr)
+DEF_HELPER_FLAGS_3(vfp_mulxd, TCG_CALL_NO_RWG, f64, f64, f64, ptr)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index d60223a..b7f1ecf 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -7813,7 +7813,250 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
*/
static void disas_simd_indexed_vector(DisasContext *s, uint32_t insn)
{
- unsupported_encoding(s, insn);
+ /* This encoding has two kinds of instruction:
+ * normal, where we perform elt x idxelt => elt for each
+ * element in the vector
+ * long, where we perform elt x idxelt and generate a result of
+ * double the width of the input element
+ * The long ops have a 'part' specifier (ie come in INSN, INSN2 pairs).
+ */
+ bool is_q = extract32(insn, 30, 1);
+ bool u = extract32(insn, 29, 1);
+ int size = extract32(insn, 22, 2);
+ int l = extract32(insn, 21, 1);
+ int m = extract32(insn, 20, 1);
+ /* Note that the Rm field here is only 4 bits, not 5 as it usually is */
+ int rm = extract32(insn, 16, 4);
+ int opcode = extract32(insn, 12, 4);
+ int h = extract32(insn, 11, 1);
+ int rn = extract32(insn, 5, 5);
+ int rd = extract32(insn, 0, 5);
+ bool is_long = false;
+ bool is_fp = false;
+ int index;
+ TCGv_ptr fpst;
+
+ switch (opcode) {
+ case 0x0: /* MLA */
+ case 0x4: /* MLS */
+ if (!u) {
+ unallocated_encoding(s);
+ return;
+ }
+ break;
+ case 0x2: /* SMLAL, SMLAL2, UMLAL, UMLAL2 */
+ case 0x6: /* SMLSL, SMLSL2, UMLSL, UMLSL2 */
+ case 0xa: /* SMULL, SMULL2, UMULL, UMULL2 */
+ is_long = true;
+ break;
+ case 0x3: /* SQDMLAL, SQDMLAL2 */
+ case 0x7: /* SQDMLSL, SQDMLSL2 */
+ case 0xb: /* SQDMULL, SQDMULL2 */
+ is_long = true;
+ /* fall through */
+ case 0xc: /* SQDMULH */
+ case 0xd: /* SQRDMULH */
+ case 0x8: /* MUL */
+ if (u) {
+ unallocated_encoding(s);
+ return;
+ }
+ break;
+ case 0x1: /* FMLA */
+ case 0x5: /* FMLS */
+ if (u) {
+ unallocated_encoding(s);
+ return;
+ }
+ /* fall through */
+ case 0x9: /* FMUL, FMULX */
+ if (!extract32(size, 1, 1)) {
+ unallocated_encoding(s);
+ return;
+ }
+ is_fp = true;
+ break;
+ }
+
+ if (is_fp) {
+ /* low bit of size indicates single/double */
+ size = extract32(size, 0, 1) ? 3 : 2;
+ if (size == 2) {
+ index = h << 1 | l;
+ } else {
+ if (l || !is_q) {
+ unallocated_encoding(s);
+ return;
+ }
+ index = h;
+ }
+ rm |= (m << 4);
+ } else {
+ switch (size) {
+ case 1:
+ index = h << 2 | l << 1 | m;
+ break;
+ case 2:
+ index = h << 1 | l;
+ rm |= (m << 4);
+ break;
+ default:
+ unallocated_encoding(s);
+ return;
+ }
+ }
+
+ if (is_long) {
+ unsupported_encoding(s, insn);
+ return;
+ }
+
+ if (is_fp) {
+ fpst = get_fpstatus_ptr();
+ } else {
+ TCGV_UNUSED_PTR(fpst);
+ }
+
+ if (size == 3) {
+ TCGv_i64 tcg_idx = tcg_temp_new_i64();
+ int pass;
+
+ assert(is_fp && is_q && !is_long);
+
+ read_vec_element(s, tcg_idx, rm, index, MO_64);
+
+ for (pass = 0; pass < 2; pass++) {
+ TCGv_i64 tcg_op = tcg_temp_new_i64();
+ TCGv_i64 tcg_res = tcg_temp_new_i64();
+
+ read_vec_element(s, tcg_op, rn, pass, MO_64);
+
+ switch (opcode) {
+ case 0x5: /* FMLS */
+ /* As usual for ARM, separate negation for fused multiply-add */
+ gen_helper_vfp_negd(tcg_op, tcg_op);
+ /* fall through */
+ case 0x1: /* FMLA */
+ read_vec_element(s, tcg_res, rd, pass, MO_64);
+ gen_helper_vfp_muladdd(tcg_res, tcg_op, tcg_idx, tcg_res, fpst);
+ break;
+ case 0x9: /* FMUL, FMULX */
+ if (u) {
+ gen_helper_vfp_mulxd(tcg_res, tcg_op, tcg_idx, fpst);
+ } else {
+ gen_helper_vfp_muld(tcg_res, tcg_op, tcg_idx, fpst);
+ }
+ break;
+ default:
+ g_assert_not_reached();
+ }
+
+ write_vec_element(s, tcg_res, rd, pass, MO_64);
+ tcg_temp_free_i64(tcg_op);
+ tcg_temp_free_i64(tcg_res);
+ }
+
+ tcg_temp_free_i64(tcg_idx);
+ } else if (!is_long) {
+ /* 32 bit floating point, or 16 or 32 bit integer */
+ TCGv_i32 tcg_idx = tcg_temp_new_i32();
+ int pass;
+
+ read_vec_element_i32(s, tcg_idx, rm, index, size);
+
+ if (size == 1) {
+ /* The simplest way to handle the 16x16 indexed ops is to duplicate
+ * the index into both halves of the 32 bit tcg_idx and then use
+ * the usual Neon helpers.
+ */
+ tcg_gen_deposit_i32(tcg_idx, tcg_idx, tcg_idx, 16, 16);
+ }
+
+ for (pass = 0; pass < (is_q ? 4 : 2); pass++) {
+ TCGv_i32 tcg_op = tcg_temp_new_i32();
+ TCGv_i32 tcg_res = tcg_temp_new_i32();
+
+ read_vec_element_i32(s, tcg_op, rn, pass, MO_32);
+
+ switch (opcode) {
+ case 0x0: /* MLA */
+ case 0x4: /* MLS */
+ case 0x8: /* MUL */
+ {
+ static NeonGenTwoOpFn * const fns[2][2] = {
+ { gen_helper_neon_add_u16, gen_helper_neon_sub_u16 },
+ { tcg_gen_add_i32, tcg_gen_sub_i32 },
+ };
+ NeonGenTwoOpFn *genfn;
+ bool is_sub = opcode == 0x4;
+
+ if (size == 1) {
+ gen_helper_neon_mul_u16(tcg_res, tcg_op, tcg_idx);
+ } else {
+ tcg_gen_mul_i32(tcg_res, tcg_op, tcg_idx);
+ }
+ if (opcode == 0x8) {
+ break;
+ }
+ read_vec_element_i32(s, tcg_op, rd, pass, MO_32);
+ genfn = fns[size - 1][is_sub];
+ genfn(tcg_res, tcg_op, tcg_res);
+ break;
+ }
+ case 0x5: /* FMLS */
+ /* As usual for ARM, separate negation for fused multiply-add */
+ gen_helper_vfp_negs(tcg_op, tcg_op);
+ /* fall through */
+ case 0x1: /* FMLA */
+ read_vec_element_i32(s, tcg_res, rd, pass, MO_32);
+ gen_helper_vfp_muladds(tcg_res, tcg_op, tcg_idx, tcg_res, fpst);
+ break;
+ case 0x9: /* FMUL, FMULX */
+ if (u) {
+ gen_helper_vfp_mulxs(tcg_res, tcg_op, tcg_idx, fpst);
+ } else {
+ gen_helper_vfp_muls(tcg_res, tcg_op, tcg_idx, fpst);
+ }
+ break;
+ case 0xc: /* SQDMULH */
+ if (size == 1) {
+ gen_helper_neon_qdmulh_s16(tcg_res, cpu_env,
+ tcg_op, tcg_idx);
+ } else {
+ gen_helper_neon_qdmulh_s32(tcg_res, cpu_env,
+ tcg_op, tcg_idx);
+ }
+ break;
+ case 0xd: /* SQRDMULH */
+ if (size == 1) {
+ gen_helper_neon_qrdmulh_s16(tcg_res, cpu_env,
+ tcg_op, tcg_idx);
+ } else {
+ gen_helper_neon_qrdmulh_s32(tcg_res, cpu_env,
+ tcg_op, tcg_idx);
+ }
+ break;
+ default:
+ g_assert_not_reached();
+ }
+
+ write_vec_element_i32(s, tcg_res, rd, pass, MO_32);
+ tcg_temp_free_i32(tcg_op);
+ tcg_temp_free_i32(tcg_res);
+ }
+
+ tcg_temp_free_i32(tcg_idx);
+
+ if (!is_q) {
+ clear_vec_high(s, rd);
+ }
+ } else {
+ /* long ops: 16x16->32 or 32x32->64 */
+ }
+
+ if (!TCGV_IS_UNUSED_PTR(fpst)) {
+ tcg_temp_free_ptr(fpst);
+ }
}
/* C3.6.19 Crypto AES
--
1.8.5
* [Qemu-devel] [PATCH 2/8] target-arm: A64: Implement long vector x indexed insns
2014-02-07 21:49 [Qemu-devel] [PATCH 0/8] A64: Neon support, fourth set Peter Maydell
2014-02-07 21:49 ` [Qemu-devel] [PATCH 1/8] target-arm: A64: Implement plain vector SIMD indexed element insns Peter Maydell
@ 2014-02-07 21:49 ` Peter Maydell
2014-02-07 21:49 ` [Qemu-devel] [PATCH 3/8] target-arm: A64: Implement SIMD scalar indexed instructions Peter Maydell
` (6 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Peter Maydell @ 2014-02-07 21:49 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson
Implement the 'long' operations in the vector x indexed
element category.
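For reference, the 'long' forms widen each product to twice the input
element size, and the INSN2 variants take their inputs from the upper
half of the first source register. A rough C sketch of the 16x16->32
multiply-by-element case (illustrative only, not the QEMU code):

    #include <stdint.h>

    /* SMULL/SMULL2-by-element style operation: four 16-bit lanes are each
     * multiplied by one selected 16-bit lane, producing four 32-bit
     * results.  'part' selects the lower (0) or upper (1) half of vn,
     * i.e. the INSN vs INSN2 behaviour.
     */
    void smull_by_element_4h(int32_t vd[4], const int16_t vn[8],
                             const int16_t vm[8], int idx, int part)
    {
        for (int i = 0; i < 4; i++) {
            vd[i] = (int32_t)vn[i + part * 4] * (int32_t)vm[idx];
        }
    }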
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target-arm/translate-a64.c | 144 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 139 insertions(+), 5 deletions(-)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index b7f1ecf..64b8c86 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -7906,11 +7906,6 @@ static void disas_simd_indexed_vector(DisasContext *s, uint32_t insn)
}
}
- if (is_long) {
- unsupported_encoding(s, insn);
- return;
- }
-
if (is_fp) {
fpst = get_fpstatus_ptr();
} else {
@@ -8052,6 +8047,145 @@ static void disas_simd_indexed_vector(DisasContext *s, uint32_t insn)
}
} else {
/* long ops: 16x16->32 or 32x32->64 */
+ TCGv_i64 tcg_res[2];
+ int pass;
+ bool satop = extract32(opcode, 0, 1);
+ TCGMemOp memop = MO_32;
+
+ if (satop || !u) {
+ memop |= MO_SIGN;
+ }
+
+ if (size == 2) {
+ TCGv_i64 tcg_idx = tcg_temp_new_i64();
+
+ read_vec_element(s, tcg_idx, rm, index, memop);
+
+ for (pass = 0; pass < 2; pass++) {
+ TCGv_i64 tcg_op = tcg_temp_new_i64();
+ TCGv_i64 tcg_passres;
+
+ read_vec_element(s, tcg_op, rn, pass + (is_q * 2), memop);
+
+ tcg_res[pass] = tcg_temp_new_i64();
+
+ if (opcode == 0xa || opcode == 0xb) {
+ /* Non-accumulating ops */
+ tcg_passres = tcg_res[pass];
+ } else {
+ tcg_passres = tcg_temp_new_i64();
+ }
+
+ tcg_gen_mul_i64(tcg_passres, tcg_op, tcg_idx);
+ tcg_temp_free_i64(tcg_op);
+
+ if (satop) {
+ /* saturating, doubling */
+ gen_helper_neon_addl_saturate_s64(tcg_passres, cpu_env,
+ tcg_passres, tcg_passres);
+ }
+
+ if (opcode == 0xa || opcode == 0xb) {
+ continue;
+ }
+
+ /* Accumulating op: handle accumulate step */
+ read_vec_element(s, tcg_res[pass], rd, pass, MO_64);
+
+ switch (opcode) {
+ case 0x2: /* SMLAL, SMLAL2, UMLAL, UMLAL2 */
+ tcg_gen_add_i64(tcg_res[pass], tcg_res[pass], tcg_passres);
+ break;
+ case 0x6: /* SMLSL, SMLSL2, UMLSL, UMLSL2 */
+ tcg_gen_sub_i64(tcg_res[pass], tcg_res[pass], tcg_passres);
+ break;
+ case 0x7: /* SQDMLSL, SQDMLSL2 */
+ tcg_gen_neg_i64(tcg_passres, tcg_passres);
+ /* fall through */
+ case 0x3: /* SQDMLAL, SQDMLAL2 */
+ gen_helper_neon_addl_saturate_s64(tcg_res[pass], cpu_env,
+ tcg_res[pass],
+ tcg_passres);
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ tcg_temp_free_i64(tcg_passres);
+ }
+ tcg_temp_free_i64(tcg_idx);
+ } else {
+ TCGv_i32 tcg_idx = tcg_temp_new_i32();
+
+ assert(size == 1);
+ read_vec_element_i32(s, tcg_idx, rm, index, size);
+
+ /* The simplest way to handle the 16x16 indexed ops is to duplicate
+ * the index into both halves of the 32 bit tcg_idx and then use
+ * the usual Neon helpers.
+ */
+ tcg_gen_deposit_i32(tcg_idx, tcg_idx, tcg_idx, 16, 16);
+
+ for (pass = 0; pass < 2; pass++) {
+ TCGv_i32 tcg_op = tcg_temp_new_i32();
+ TCGv_i64 tcg_passres;
+
+ read_vec_element_i32(s, tcg_op, rn, pass + (is_q * 2), MO_32);
+ tcg_res[pass] = tcg_temp_new_i64();
+
+ if (opcode == 0xa || opcode == 0xb) {
+ /* Non-accumulating ops */
+ tcg_passres = tcg_res[pass];
+ } else {
+ tcg_passres = tcg_temp_new_i64();
+ }
+
+ if (memop & MO_SIGN) {
+ gen_helper_neon_mull_s16(tcg_passres, tcg_op, tcg_idx);
+ } else {
+ gen_helper_neon_mull_u16(tcg_passres, tcg_op, tcg_idx);
+ }
+ if (satop) {
+ gen_helper_neon_addl_saturate_s32(tcg_passres, cpu_env,
+ tcg_passres, tcg_passres);
+ }
+ tcg_temp_free_i32(tcg_op);
+
+ if (opcode == 0xa || opcode == 0xb) {
+ continue;
+ }
+
+ /* Accumulating op: handle accumulate step */
+ read_vec_element(s, tcg_res[pass], rd, pass, MO_64);
+
+ switch (opcode) {
+ case 0x2: /* SMLAL, SMLAL2, UMLAL, UMLAL2 */
+ gen_helper_neon_addl_u32(tcg_res[pass], tcg_res[pass],
+ tcg_passres);
+ break;
+ case 0x6: /* SMLSL, SMLSL2, UMLSL, UMLSL2 */
+ gen_helper_neon_subl_u32(tcg_res[pass], tcg_res[pass],
+ tcg_passres);
+ break;
+ case 0x7: /* SQDMLSL, SQDMLSL2 */
+ gen_helper_neon_negl_u32(tcg_passres, tcg_passres);
+ /* fall through */
+ case 0x3: /* SQDMLAL, SQDMLAL2 */
+ gen_helper_neon_addl_saturate_s32(tcg_res[pass], cpu_env,
+ tcg_res[pass],
+ tcg_passres);
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ tcg_temp_free_i64(tcg_passres);
+ }
+ tcg_temp_free_i32(tcg_idx);
+ }
+
+ for (pass = 0; pass < 2; pass++) {
+ write_vec_element(s, tcg_res[pass], rd, pass, MO_64);
+ tcg_temp_free_i64(tcg_res[pass]);
+ }
}
if (!TCGV_IS_UNUSED_PTR(fpst)) {
--
1.8.5
* [Qemu-devel] [PATCH 3/8] target-arm: A64: Implement SIMD scalar indexed instructions
2014-02-07 21:49 [Qemu-devel] [PATCH 0/8] A64: Neon support, fourth set Peter Maydell
2014-02-07 21:49 ` [Qemu-devel] [PATCH 1/8] target-arm: A64: Implement plain vector SIMD indexed element insns Peter Maydell
2014-02-07 21:49 ` [Qemu-devel] [PATCH 2/8] target-arm: A64: Implement long vector x indexed insns Peter Maydell
@ 2014-02-07 21:49 ` Peter Maydell
2014-02-07 21:49 ` [Qemu-devel] [PATCH 4/8] target-arm: A64: Implement scalar three different instructions Peter Maydell
` (5 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Peter Maydell @ 2014-02-07 21:49 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson
Implement the SIMD scalar indexed instructions. The encoding
here is nearly identical to the vector indexed grouping, so
we combine the two.
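The only difference at decode level is in the top byte of the insn
word (the scalar group has bit 28 set where the vector group has the Q
bit at 30), so the merged decoder can simply key off bit 28. A small
sketch of that check, outside of QEMU's own helpers:

    #include <stdbool.h>
    #include <stdint.h>

    /* Return the 'len' bits of 'insn' starting at bit 'start'. */
    uint32_t extract_bits(uint32_t insn, int start, int len)
    {
        return (insn >> start) & ((1u << len) - 1);
    }

    /* Scalar-indexed encodings are 0x5f...; vector-indexed are
     * 0x0f.../0x4f..., so bit 28 distinguishes the two groups.
     */
    bool insn_is_scalar_indexed(uint32_t insn)
    {
        return extract_bits(insn, 28, 1) != 0;
    }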
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target-arm/translate-a64.c | 115 ++++++++++++++++++++++++++++++++-------------
1 file changed, 82 insertions(+), 33 deletions(-)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 64b8c86..0ea8eb2 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -6322,17 +6322,6 @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn)
}
}
-/* C3.6.13 AdvSIMD scalar x indexed element
- * 31 30 29 28 24 23 22 21 20 19 16 15 12 11 10 9 5 4 0
- * +-----+---+-----------+------+---+---+------+-----+---+---+------+------+
- * | 0 1 | U | 1 1 1 1 1 | size | L | M | Rm | opc | H | 0 | Rn | Rd |
- * +-----+---+-----------+------+---+---+------+-----+---+---+------+------+
- */
-static void disas_simd_scalar_indexed(DisasContext *s, uint32_t insn)
-{
- unsupported_encoding(s, insn);
-}
-
/* SSHR[RA]/USHR[RA] - Vector shift right (optional rounding/accumulate) */
static void handle_vec_simd_shri(DisasContext *s, bool is_q, bool is_u,
int immh, int immb, int opcode, int rn, int rd)
@@ -7805,13 +7794,18 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
}
}
-/* C3.6.18 AdvSIMD vector x indexed element
+/* C3.6.13 AdvSIMD scalar x indexed element
+ * 31 30 29 28 24 23 22 21 20 19 16 15 12 11 10 9 5 4 0
+ * +-----+---+-----------+------+---+---+------+-----+---+---+------+------+
+ * | 0 1 | U | 1 1 1 1 1 | size | L | M | Rm | opc | H | 0 | Rn | Rd |
+ * +-----+---+-----------+------+---+---+------+-----+---+---+------+------+
+ * C3.6.18 AdvSIMD vector x indexed element
* 31 30 29 28 24 23 22 21 20 19 16 15 12 11 10 9 5 4 0
* +---+---+---+-----------+------+---+---+------+-----+---+---+------+------+
* | 0 | Q | U | 0 1 1 1 1 | size | L | M | Rm | opc | H | 0 | Rn | Rd |
* +---+---+---+-----------+------+---+---+------+-----+---+---+------+------+
*/
-static void disas_simd_indexed_vector(DisasContext *s, uint32_t insn)
+static void disas_simd_indexed(DisasContext *s, uint32_t insn)
{
/* This encoding has two kinds of instruction:
* normal, where we perform elt x idxelt => elt for each
@@ -7820,6 +7814,7 @@ static void disas_simd_indexed_vector(DisasContext *s, uint32_t insn)
* double the width of the input element
* The long ops have a 'part' specifier (ie come in INSN, INSN2 pairs).
*/
+ bool is_scalar = extract32(insn, 28, 1);
bool is_q = extract32(insn, 30, 1);
bool u = extract32(insn, 29, 1);
int size = extract32(insn, 22, 2);
@@ -7839,7 +7834,7 @@ static void disas_simd_indexed_vector(DisasContext *s, uint32_t insn)
switch (opcode) {
case 0x0: /* MLA */
case 0x4: /* MLS */
- if (!u) {
+ if (!u || is_scalar) {
unallocated_encoding(s);
return;
}
@@ -7847,6 +7842,10 @@ static void disas_simd_indexed_vector(DisasContext *s, uint32_t insn)
case 0x2: /* SMLAL, SMLAL2, UMLAL, UMLAL2 */
case 0x6: /* SMLSL, SMLSL2, UMLSL, UMLSL2 */
case 0xa: /* SMULL, SMULL2, UMULL, UMULL2 */
+ if (is_scalar) {
+ unallocated_encoding(s);
+ return;
+ }
is_long = true;
break;
case 0x3: /* SQDMLAL, SQDMLAL2 */
@@ -7856,12 +7855,17 @@ static void disas_simd_indexed_vector(DisasContext *s, uint32_t insn)
/* fall through */
case 0xc: /* SQDMULH */
case 0xd: /* SQRDMULH */
- case 0x8: /* MUL */
if (u) {
unallocated_encoding(s);
return;
}
break;
+ case 0x8: /* MUL */
+ if (u || is_scalar) {
+ unallocated_encoding(s);
+ return;
+ }
+ break;
case 0x1: /* FMLA */
case 0x5: /* FMLS */
if (u) {
@@ -7920,7 +7924,7 @@ static void disas_simd_indexed_vector(DisasContext *s, uint32_t insn)
read_vec_element(s, tcg_idx, rm, index, MO_64);
- for (pass = 0; pass < 2; pass++) {
+ for (pass = 0; pass < (is_scalar ? 1 : 2); pass++) {
TCGv_i64 tcg_op = tcg_temp_new_i64();
TCGv_i64 tcg_res = tcg_temp_new_i64();
@@ -7951,15 +7955,28 @@ static void disas_simd_indexed_vector(DisasContext *s, uint32_t insn)
tcg_temp_free_i64(tcg_res);
}
+ if (is_scalar) {
+ clear_vec_high(s, rd);
+ }
+
tcg_temp_free_i64(tcg_idx);
} else if (!is_long) {
- /* 32 bit floating point, or 16 or 32 bit integer */
+ /* 32 bit floating point, or 16 or 32 bit integer.
+ * For the 16 bit scalar case we use the usual Neon helpers and
+ * rely on the fact that 0 op 0 == 0 with no side effects.
+ */
TCGv_i32 tcg_idx = tcg_temp_new_i32();
- int pass;
+ int pass, maxpasses;
+
+ if (is_scalar) {
+ maxpasses = 1;
+ } else {
+ maxpasses = is_q ? 4 : 2;
+ }
read_vec_element_i32(s, tcg_idx, rm, index, size);
- if (size == 1) {
+ if (size == 1 && !is_scalar) {
/* The simplest way to handle the 16x16 indexed ops is to duplicate
* the index into both halves of the 32 bit tcg_idx and then use
* the usual Neon helpers.
@@ -7967,11 +7984,11 @@ static void disas_simd_indexed_vector(DisasContext *s, uint32_t insn)
tcg_gen_deposit_i32(tcg_idx, tcg_idx, tcg_idx, 16, 16);
}
- for (pass = 0; pass < (is_q ? 4 : 2); pass++) {
+ for (pass = 0; pass < maxpasses; pass++) {
TCGv_i32 tcg_op = tcg_temp_new_i32();
TCGv_i32 tcg_res = tcg_temp_new_i32();
- read_vec_element_i32(s, tcg_op, rn, pass, MO_32);
+ read_vec_element_i32(s, tcg_op, rn, pass, is_scalar ? size : MO_32);
switch (opcode) {
case 0x0: /* MLA */
@@ -8035,7 +8052,12 @@ static void disas_simd_indexed_vector(DisasContext *s, uint32_t insn)
g_assert_not_reached();
}
- write_vec_element_i32(s, tcg_res, rd, pass, MO_32);
+ if (is_scalar) {
+ write_fp_sreg(s, rd, tcg_res);
+ } else {
+ write_vec_element_i32(s, tcg_res, rd, pass, MO_32);
+ }
+
tcg_temp_free_i32(tcg_op);
tcg_temp_free_i32(tcg_res);
}
@@ -8061,11 +8083,18 @@ static void disas_simd_indexed_vector(DisasContext *s, uint32_t insn)
read_vec_element(s, tcg_idx, rm, index, memop);
- for (pass = 0; pass < 2; pass++) {
+ for (pass = 0; pass < (is_scalar ? 1 : 2); pass++) {
TCGv_i64 tcg_op = tcg_temp_new_i64();
TCGv_i64 tcg_passres;
+ int passelt;
- read_vec_element(s, tcg_op, rn, pass + (is_q * 2), memop);
+ if (is_scalar) {
+ passelt = 0;
+ } else {
+ passelt = pass + (is_q * 2);
+ }
+
+ read_vec_element(s, tcg_op, rn, passelt, memop);
tcg_res[pass] = tcg_temp_new_i64();
@@ -8113,23 +8142,35 @@ static void disas_simd_indexed_vector(DisasContext *s, uint32_t insn)
tcg_temp_free_i64(tcg_passres);
}
tcg_temp_free_i64(tcg_idx);
+
+ if (is_scalar) {
+ clear_vec_high(s, rd);
+ }
} else {
TCGv_i32 tcg_idx = tcg_temp_new_i32();
assert(size == 1);
read_vec_element_i32(s, tcg_idx, rm, index, size);
- /* The simplest way to handle the 16x16 indexed ops is to duplicate
- * the index into both halves of the 32 bit tcg_idx and then use
- * the usual Neon helpers.
- */
- tcg_gen_deposit_i32(tcg_idx, tcg_idx, tcg_idx, 16, 16);
+ if (!is_scalar) {
+ /* The simplest way to handle the 16x16 indexed ops is to
+ * duplicate the index into both halves of the 32 bit tcg_idx
+ * and then use the usual Neon helpers.
+ */
+ tcg_gen_deposit_i32(tcg_idx, tcg_idx, tcg_idx, 16, 16);
+ }
- for (pass = 0; pass < 2; pass++) {
+ for (pass = 0; pass < (is_scalar ? 1 : 2); pass++) {
TCGv_i32 tcg_op = tcg_temp_new_i32();
TCGv_i64 tcg_passres;
- read_vec_element_i32(s, tcg_op, rn, pass + (is_q * 2), MO_32);
+ if (is_scalar) {
+ read_vec_element_i32(s, tcg_op, rn, pass, size);
+ } else {
+ read_vec_element_i32(s, tcg_op, rn,
+ pass + (is_q * 2), MO_32);
+ }
+
tcg_res[pass] = tcg_temp_new_i64();
if (opcode == 0xa || opcode == 0xb) {
@@ -8180,6 +8221,14 @@ static void disas_simd_indexed_vector(DisasContext *s, uint32_t insn)
tcg_temp_free_i64(tcg_passres);
}
tcg_temp_free_i32(tcg_idx);
+
+ if (is_scalar) {
+ tcg_gen_ext32u_i64(tcg_res[0], tcg_res[0]);
+ }
+ }
+
+ if (is_scalar) {
+ tcg_res[1] = tcg_const_i64(0);
}
for (pass = 0; pass < 2; pass++) {
@@ -8238,7 +8287,7 @@ static const AArch64DecodeTable data_proc_simd[] = {
{ 0x0e200800, 0x9f3e0c00, disas_simd_two_reg_misc },
{ 0x0e300800, 0x9f3e0c00, disas_simd_across_lanes },
{ 0x0e000400, 0x9fe08400, disas_simd_copy },
- { 0x0f000000, 0x9f000400, disas_simd_indexed_vector },
+ { 0x0f000000, 0x9f000400, disas_simd_indexed }, /* vector indexed */
/* simd_mod_imm decode is a subset of simd_shift_imm, so must precede it */
{ 0x0f000400, 0x9ff80400, disas_simd_mod_imm },
{ 0x0f000400, 0x9f800400, disas_simd_shift_imm },
@@ -8250,7 +8299,7 @@ static const AArch64DecodeTable data_proc_simd[] = {
{ 0x5e200800, 0xdf3e0c00, disas_simd_scalar_two_reg_misc },
{ 0x5e300800, 0xdf3e0c00, disas_simd_scalar_pairwise },
{ 0x5e000400, 0xdfe08400, disas_simd_scalar_copy },
- { 0x5f000000, 0xdf000400, disas_simd_scalar_indexed },
+ { 0x5f000000, 0xdf000400, disas_simd_indexed }, /* scalar indexed */
{ 0x5f000400, 0xdf800400, disas_simd_scalar_shift_imm },
{ 0x4e280800, 0xff3e0c00, disas_crypto_aes },
{ 0x5e000000, 0xff208c00, disas_crypto_three_reg_sha },
--
1.8.5
* [Qemu-devel] [PATCH 4/8] target-arm: A64: Implement scalar three different instructions
2014-02-07 21:49 [Qemu-devel] [PATCH 0/8] A64: Neon support, fourth set Peter Maydell
` (2 preceding siblings ...)
2014-02-07 21:49 ` [Qemu-devel] [PATCH 3/8] target-arm: A64: Implement SIMD scalar indexed instructions Peter Maydell
@ 2014-02-07 21:49 ` Peter Maydell
2014-02-07 21:49 ` [Qemu-devel] [PATCH 5/8] target-arm: A64: Implement SIMD FP compare and set insns Peter Maydell
` (4 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Peter Maydell @ 2014-02-07 21:49 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson
Implement the scalar three-different instruction group, which contains
only three instructions (SQDMLAL, SQDMLSL and SQDMULL).
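All three are signed saturating doubling multiply-long operations,
optionally accumulating into (or subtracting from) the destination with
saturation. A rough standalone model of the 32x32->64 multiply step
(illustrative only; QEMU performs the doubling via the existing
addl_saturate helpers):

    #include <stdbool.h>
    #include <stdint.h>

    /* SQDMULL-style step for 32-bit inputs: widen, double, saturate.  The
     * only input pair that can actually saturate here is
     * INT32_MIN * INT32_MIN.
     */
    int64_t sqdmull_s32(int32_t a, int32_t b, bool *sat)
    {
        int64_t product = (int64_t)a * b;
        if (product > INT64_MAX / 2) {
            *sat = true;
            return INT64_MAX;
        }
        return product * 2;
    }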
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target-arm/translate-a64.c | 95 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 94 insertions(+), 1 deletion(-)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 0ea8eb2..e38f7f5 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -5838,7 +5838,100 @@ static void disas_simd_scalar_shift_imm(DisasContext *s, uint32_t insn)
*/
static void disas_simd_scalar_three_reg_diff(DisasContext *s, uint32_t insn)
{
- unsupported_encoding(s, insn);
+ bool is_u = extract32(insn, 29, 1);
+ int size = extract32(insn, 22, 2);
+ int opcode = extract32(insn, 12, 4);
+ int rm = extract32(insn, 16, 5);
+ int rn = extract32(insn, 5, 5);
+ int rd = extract32(insn, 0, 5);
+
+ if (is_u) {
+ unallocated_encoding(s);
+ return;
+ }
+
+ switch (opcode) {
+ case 0x9: /* SQDMLAL, SQDMLAL2 */
+ case 0xb: /* SQDMLSL, SQDMLSL2 */
+ case 0xd: /* SQDMULL, SQDMULL2 */
+ if (size == 0 || size == 3) {
+ unallocated_encoding(s);
+ return;
+ }
+ break;
+ default:
+ unallocated_encoding(s);
+ return;
+ }
+
+ if (size == 2) {
+ TCGv_i64 tcg_op1 = tcg_temp_new_i64();
+ TCGv_i64 tcg_op2 = tcg_temp_new_i64();
+ TCGv_i64 tcg_res = tcg_temp_new_i64();
+
+ read_vec_element(s, tcg_op1, rn, 0, MO_32 | MO_SIGN);
+ read_vec_element(s, tcg_op2, rm, 0, MO_32 | MO_SIGN);
+
+ tcg_gen_mul_i64(tcg_res, tcg_op1, tcg_op2);
+ gen_helper_neon_addl_saturate_s64(tcg_res, cpu_env, tcg_res, tcg_res);
+
+ switch (opcode) {
+ case 0xd: /* SQDMULL, SQDMULL2 */
+ break;
+ case 0xb: /* SQDMLSL, SQDMLSL2 */
+ tcg_gen_neg_i64(tcg_res, tcg_res);
+ /* fall through */
+ case 0x9: /* SQDMLAL, SQDMLAL2 */
+ read_vec_element(s, tcg_op1, rd, 0, MO_64);
+ gen_helper_neon_addl_saturate_s64(tcg_res, cpu_env,
+ tcg_res, tcg_op1);
+ break;
+ default:
+ g_assert_not_reached();
+ }
+
+ write_fp_dreg(s, rd, tcg_res);
+
+ tcg_temp_free_i64(tcg_op1);
+ tcg_temp_free_i64(tcg_op2);
+ tcg_temp_free_i64(tcg_res);
+ } else {
+ TCGv_i32 tcg_op1 = tcg_temp_new_i32();
+ TCGv_i32 tcg_op2 = tcg_temp_new_i32();
+ TCGv_i64 tcg_res = tcg_temp_new_i64();
+
+ read_vec_element_i32(s, tcg_op1, rn, 0, MO_16);
+ read_vec_element_i32(s, tcg_op2, rm, 0, MO_16);
+
+ gen_helper_neon_mull_s16(tcg_res, tcg_op1, tcg_op2);
+ gen_helper_neon_addl_saturate_s32(tcg_res, cpu_env, tcg_res, tcg_res);
+
+ switch (opcode) {
+ case 0xd: /* SQDMULL, SQDMULL2 */
+ break;
+ case 0xb: /* SQDMLSL, SQDMLSL2 */
+ gen_helper_neon_negl_u32(tcg_res, tcg_res);
+ /* fall through */
+ case 0x9: /* SQDMLAL, SQDMLAL2 */
+ {
+ TCGv_i64 tcg_op3 = tcg_temp_new_i64();
+ read_vec_element(s, tcg_op3, rd, 0, MO_32);
+ gen_helper_neon_addl_saturate_s32(tcg_res, cpu_env,
+ tcg_res, tcg_op3);
+ tcg_temp_free_i64(tcg_op3);
+ break;
+ }
+ default:
+ g_assert_not_reached();
+ }
+
+ tcg_gen_ext32u_i64(tcg_res, tcg_res);
+ write_fp_dreg(s, rd, tcg_res);
+
+ tcg_temp_free_i32(tcg_op1);
+ tcg_temp_free_i32(tcg_op2);
+ tcg_temp_free_i64(tcg_res);
+ }
}
static void handle_3same_64(DisasContext *s, int opcode, bool u,
--
1.8.5
* [Qemu-devel] [PATCH 5/8] target-arm: A64: Implement SIMD FP compare and set insns
2014-02-07 21:49 [Qemu-devel] [PATCH 0/8] A64: Neon support, fourth set Peter Maydell
` (3 preceding siblings ...)
2014-02-07 21:49 ` [Qemu-devel] [PATCH 4/8] target-arm: A64: Implement scalar three different instructions Peter Maydell
@ 2014-02-07 21:49 ` Peter Maydell
2014-02-07 21:49 ` [Qemu-devel] [PATCH 6/8] target-arm: A64: Implement floating point pairwise insns Peter Maydell
` (3 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Peter Maydell @ 2014-02-07 21:49 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson
From: Alex Bennée <alex.bennee@linaro.org>
This adds all forms of the SIMD floating point compare and set instructions:
FCM(GT|GE|EQ|LE|LT)
Most of the heavy lifting is done either by the existing Neon helpers or
by new helpers for the 64-bit double cases. The code paths are largely
shared, although the 2misc versions are a little special as they compare
against zero.
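Each comparison produces a per-element mask of all ones when the
condition holds and all zeros otherwise, which is why the new helpers
return the negation of softfloat's 0/1 result. A standalone sketch of
the idea using plain C doubles (ignoring the NaN and exception-flag
handling that the real helpers get from softfloat):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    /* FCMGT-style result: all-ones mask if a > b, else zero.  Negating a
     * 0/1 value in two's complement yields exactly that mask.
     */
    uint64_t fcmgt_mask(double a, double b)
    {
        return -(uint64_t)(a > b);
    }

    int main(void)
    {
        printf("%016" PRIx64 "\n", fcmgt_mask(2.0, 1.0)); /* ffffffffffffffff */
        printf("%016" PRIx64 "\n", fcmgt_mask(1.0, 2.0)); /* 0000000000000000 */
        return 0;
    }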
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
[PMM: fixed some minor bugs, added the 2-misc-scalar encoding]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target-arm/helper-a64.c | 19 +++++
target-arm/helper-a64.h | 3 +
target-arm/translate-a64.c | 197 ++++++++++++++++++++++++++++++++++++++++++---
3 files changed, 207 insertions(+), 12 deletions(-)
diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
index fe90a5c..b4cab51 100644
--- a/target-arm/helper-a64.c
+++ b/target-arm/helper-a64.c
@@ -179,3 +179,22 @@ uint64_t HELPER(simd_tbl)(CPUARMState *env, uint64_t result, uint64_t indices,
}
return result;
}
+
+/* 64bit/double versions of the neon float compare functions */
+uint64_t HELPER(neon_ceq_f64)(float64 a, float64 b, void *fpstp)
+{
+ float_status *fpst = fpstp;
+ return -float64_eq_quiet(a, b, fpst);
+}
+
+uint64_t HELPER(neon_cge_f64)(float64 a, float64 b, void *fpstp)
+{
+ float_status *fpst = fpstp;
+ return -float64_le(b, a, fpst);
+}
+
+uint64_t HELPER(neon_cgt_f64)(float64 a, float64 b, void *fpstp)
+{
+ float_status *fpst = fpstp;
+ return -float64_lt(b, a, fpst);
+}
diff --git a/target-arm/helper-a64.h b/target-arm/helper-a64.h
index 84310e8..bf20466 100644
--- a/target-arm/helper-a64.h
+++ b/target-arm/helper-a64.h
@@ -29,3 +29,6 @@ DEF_HELPER_3(vfp_cmped_a64, i64, f64, f64, ptr)
DEF_HELPER_FLAGS_5(simd_tbl, TCG_CALL_NO_RWG_SE, i64, env, i64, i64, i32, i32)
DEF_HELPER_FLAGS_3(vfp_mulxs, TCG_CALL_NO_RWG, f32, f32, f32, ptr)
DEF_HELPER_FLAGS_3(vfp_mulxd, TCG_CALL_NO_RWG, f64, f64, f64, ptr)
+DEF_HELPER_FLAGS_3(neon_ceq_f64, TCG_CALL_NO_RWG, i64, i64, i64, ptr)
+DEF_HELPER_FLAGS_3(neon_cge_f64, TCG_CALL_NO_RWG, i64, i64, i64, ptr)
+DEF_HELPER_FLAGS_3(neon_cgt_f64, TCG_CALL_NO_RWG, i64, i64, i64, ptr)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index e38f7f5..9b8c324 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -77,6 +77,8 @@ typedef void NeonGenTwoOpFn(TCGv_i32, TCGv_i32, TCGv_i32);
typedef void NeonGenTwoOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32);
typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
+typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
+typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
/* initialize TCG globals. */
void a64_translate_init(void)
@@ -6049,6 +6051,9 @@ static void handle_3same_float(DisasContext *s, int size, int elements,
case 0x1a: /* FADD */
gen_helper_vfp_addd(tcg_res, tcg_op1, tcg_op2, fpst);
break;
+ case 0x1c: /* FCMEQ */
+ gen_helper_neon_ceq_f64(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
case 0x1e: /* FMAX */
gen_helper_vfp_maxd(tcg_res, tcg_op1, tcg_op2, fpst);
break;
@@ -6064,6 +6069,9 @@ static void handle_3same_float(DisasContext *s, int size, int elements,
case 0x5b: /* FMUL */
gen_helper_vfp_muld(tcg_res, tcg_op1, tcg_op2, fpst);
break;
+ case 0x5c: /* FCMGE */
+ gen_helper_neon_cge_f64(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
case 0x5f: /* FDIV */
gen_helper_vfp_divd(tcg_res, tcg_op1, tcg_op2, fpst);
break;
@@ -6071,6 +6079,9 @@ static void handle_3same_float(DisasContext *s, int size, int elements,
gen_helper_vfp_subd(tcg_res, tcg_op1, tcg_op2, fpst);
gen_helper_vfp_absd(tcg_res, tcg_res);
break;
+ case 0x7c: /* FCMGT */
+ gen_helper_neon_cgt_f64(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
default:
g_assert_not_reached();
}
@@ -6093,6 +6104,9 @@ static void handle_3same_float(DisasContext *s, int size, int elements,
case 0x1a: /* FADD */
gen_helper_vfp_adds(tcg_res, tcg_op1, tcg_op2, fpst);
break;
+ case 0x1c: /* FCMEQ */
+ gen_helper_neon_ceq_f32(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
case 0x1e: /* FMAX */
gen_helper_vfp_maxs(tcg_res, tcg_op1, tcg_op2, fpst);
break;
@@ -6111,6 +6125,9 @@ static void handle_3same_float(DisasContext *s, int size, int elements,
case 0x5b: /* FMUL */
gen_helper_vfp_muls(tcg_res, tcg_op1, tcg_op2, fpst);
break;
+ case 0x5c: /* FCMGE */
+ gen_helper_neon_cge_f32(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
case 0x5f: /* FDIV */
gen_helper_vfp_divs(tcg_res, tcg_op1, tcg_op2, fpst);
break;
@@ -6118,6 +6135,9 @@ static void handle_3same_float(DisasContext *s, int size, int elements,
gen_helper_vfp_subs(tcg_res, tcg_op1, tcg_op2, fpst);
gen_helper_vfp_abss(tcg_res, tcg_res);
break;
+ case 0x7c: /* FCMGT */
+ gen_helper_neon_cgt_f32(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
default:
g_assert_not_reached();
}
@@ -6168,15 +6188,15 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
int fpopcode = opcode | (extract32(size, 1, 1) << 5) | (u << 6);
switch (fpopcode) {
case 0x1b: /* FMULX */
- case 0x1c: /* FCMEQ */
case 0x1f: /* FRECPS */
case 0x3f: /* FRSQRTS */
- case 0x5c: /* FCMGE */
case 0x5d: /* FACGE */
- case 0x7c: /* FCMGT */
case 0x7d: /* FACGT */
unsupported_encoding(s, insn);
return;
+ case 0x1c: /* FCMEQ */
+ case 0x5c: /* FCMGE */
+ case 0x7c: /* FCMGT */
case 0x7a: /* FABD */
break;
default:
@@ -6361,6 +6381,115 @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u,
}
}
+static void handle_2misc_fcmp_zero(DisasContext *s, int opcode,
+ bool is_scalar, bool is_u, bool is_q,
+ int size, int rn, int rd)
+{
+ bool is_double = (size == 3);
+ TCGv_ptr fpst = get_fpstatus_ptr();
+
+ if (is_double) {
+ TCGv_i64 tcg_op = tcg_temp_new_i64();
+ TCGv_i64 tcg_zero = tcg_const_i64(0);
+ TCGv_i64 tcg_res = tcg_temp_new_i64();
+ NeonGenTwoDoubleOPFn *genfn;
+ bool swap = false;
+ int pass;
+
+ switch (opcode) {
+ case 0x2e: /* FCMLT (zero) */
+ swap = true;
+ /* fallthrough */
+ case 0x2c: /* FCMGT (zero) */
+ genfn = gen_helper_neon_cgt_f64;
+ break;
+ case 0x2d: /* FCMEQ (zero) */
+ genfn = gen_helper_neon_ceq_f64;
+ break;
+ case 0x6d: /* FCMLE (zero) */
+ swap = true;
+ /* fall through */
+ case 0x6c: /* FCMGE (zero) */
+ genfn = gen_helper_neon_cge_f64;
+ break;
+ default:
+ g_assert_not_reached();
+ }
+
+ for (pass = 0; pass < (is_scalar ? 1 : 2); pass++) {
+ read_vec_element(s, tcg_op, rn, pass, MO_64);
+ if (swap) {
+ genfn(tcg_res, tcg_zero, tcg_op, fpst);
+ } else {
+ genfn(tcg_res, tcg_op, tcg_zero, fpst);
+ }
+ write_vec_element(s, tcg_res, rd, pass, MO_64);
+ }
+ if (is_scalar) {
+ clear_vec_high(s, rd);
+ }
+
+ tcg_temp_free_i64(tcg_res);
+ tcg_temp_free_i64(tcg_zero);
+ tcg_temp_free_i64(tcg_op);
+ } else {
+ TCGv_i32 tcg_op = tcg_temp_new_i32();
+ TCGv_i32 tcg_zero = tcg_const_i32(0);
+ TCGv_i32 tcg_res = tcg_temp_new_i32();
+ NeonGenTwoSingleOPFn *genfn;
+ bool swap = false;
+ int pass, maxpasses;
+
+ switch (opcode) {
+ case 0x2e: /* FCMLT (zero) */
+ swap = true;
+ /* fall through */
+ case 0x2c: /* FCMGT (zero) */
+ genfn = gen_helper_neon_cgt_f32;
+ break;
+ case 0x2d: /* FCMEQ (zero) */
+ genfn = gen_helper_neon_ceq_f32;
+ break;
+ case 0x6d: /* FCMLE (zero) */
+ swap = true;
+ /* fall through */
+ case 0x6c: /* FCMGE (zero) */
+ genfn = gen_helper_neon_cge_f32;
+ break;
+ default:
+ g_assert_not_reached();
+ }
+
+ if (is_scalar) {
+ maxpasses = 1;
+ } else {
+ maxpasses = is_q ? 4 : 2;
+ }
+
+ for (pass = 0; pass < maxpasses; pass++) {
+ read_vec_element_i32(s, tcg_op, rn, pass, MO_32);
+ if (swap) {
+ genfn(tcg_res, tcg_zero, tcg_op, fpst);
+ } else {
+ genfn(tcg_res, tcg_op, tcg_zero, fpst);
+ }
+ if (is_scalar) {
+ write_fp_sreg(s, rd, tcg_res);
+ } else {
+ write_vec_element_i32(s, tcg_res, rd, pass, MO_32);
+ }
+ }
+ tcg_temp_free_i32(tcg_res);
+ tcg_temp_free_i32(tcg_zero);
+ tcg_temp_free_i32(tcg_op);
+ if (!is_q && !is_scalar) {
+ clear_vec_high(s, rd);
+ }
+ }
+
+ tcg_temp_free_ptr(fpst);
+}
+
/* C3.6.12 AdvSIMD scalar two reg misc
* 31 30 29 28 24 23 22 21 17 16 12 11 10 9 5 4 0
* +-----+---+-----------+------+-----------+--------+-----+------+------+
@@ -6390,9 +6519,47 @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn)
return;
}
break;
+ case 0xc ... 0xf:
+ case 0x16 ... 0x1d:
+ case 0x1f:
+ /* Floating point: U, size[1] and opcode indicate operation;
+ * size[0] indicates single or double precision.
+ */
+ opcode |= (extract32(size, 1, 1) << 5) | (u << 6);
+ size = extract32(size, 0, 1) ? 3 : 2;
+ switch (opcode) {
+ case 0x2c: /* FCMGT (zero) */
+ case 0x2d: /* FCMEQ (zero) */
+ case 0x2e: /* FCMLT (zero) */
+ case 0x6c: /* FCMGE (zero) */
+ case 0x6d: /* FCMLE (zero) */
+ handle_2misc_fcmp_zero(s, opcode, true, u, true, size, rn, rd);
+ return;
+ case 0x1a: /* FCVTNS */
+ case 0x1b: /* FCVTMS */
+ case 0x1c: /* FCVTAS */
+ case 0x1d: /* SCVTF */
+ case 0x3a: /* FCVTPS */
+ case 0x3b: /* FCVTZS */
+ case 0x3d: /* FRECPE */
+ case 0x3f: /* FRECPX */
+ case 0x56: /* FCVTXN, FCVTXN2 */
+ case 0x5a: /* FCVTNU */
+ case 0x5b: /* FCVTMU */
+ case 0x5c: /* FCVTAU */
+ case 0x5d: /* UCVTF */
+ case 0x7a: /* FCVTPU */
+ case 0x7b: /* FCVTZU */
+ case 0x7d: /* FRSQRTE */
+ unsupported_encoding(s, insn);
+ return;
+ default:
+ unallocated_encoding(s);
+ return;
+ }
+ break;
default:
/* Other categories of encoding in this class:
- * + floating point (single and double)
* + SUQADD/USQADD/SQABS/SQNEG : size 8, 16, 32 or 64
* + SQXTN/SQXTN2/SQXTUN/SQXTUN2/UQXTN/UQXTN2:
* narrowing saturate ops: size 64/32/16 -> 32/16/8
@@ -7101,12 +7268,9 @@ static void disas_simd_3same_float(DisasContext *s, uint32_t insn)
unsupported_encoding(s, insn);
return;
case 0x1b: /* FMULX */
- case 0x1c: /* FCMEQ */
case 0x1f: /* FRECPS */
case 0x3f: /* FRSQRTS */
- case 0x5c: /* FCMGE */
case 0x5d: /* FACGE */
- case 0x7c: /* FCMGT */
case 0x7d: /* FACGT */
case 0x19: /* FMLA */
case 0x39: /* FMLS */
@@ -7114,13 +7278,16 @@ static void disas_simd_3same_float(DisasContext *s, uint32_t insn)
return;
case 0x18: /* FMAXNM */
case 0x1a: /* FADD */
+ case 0x1c: /* FCMEQ */
case 0x1e: /* FMAX */
case 0x38: /* FMINNM */
case 0x3a: /* FSUB */
case 0x3e: /* FMIN */
case 0x5b: /* FMUL */
+ case 0x5c: /* FCMGE */
case 0x5f: /* FDIV */
case 0x7a: /* FABD */
+ case 0x7c: /* FCMGT */
handle_3same_float(s, size, elements, fpopcode, rd, rn, rm);
return;
default:
@@ -7700,6 +7867,17 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
return;
}
break;
+ case 0x2c: /* FCMGT (zero) */
+ case 0x2d: /* FCMEQ (zero) */
+ case 0x2e: /* FCMLT (zero) */
+ case 0x6c: /* FCMGE (zero) */
+ case 0x6d: /* FCMLE (zero) */
+ if (size == 3 && !is_q) {
+ unallocated_encoding(s);
+ return;
+ }
+ handle_2misc_fcmp_zero(s, opcode, false, u, is_q, size, rn, rd);
+ return;
case 0x16: /* FCVTN, FCVTN2 */
case 0x17: /* FCVTL, FCVTL2 */
case 0x18: /* FRINTN */
@@ -7708,9 +7886,6 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
case 0x1b: /* FCVTMS */
case 0x1c: /* FCVTAS */
case 0x1d: /* SCVTF */
- case 0x2c: /* FCMGT (zero) */
- case 0x2d: /* FCMEQ (zero) */
- case 0x2e: /* FCMLT (zero) */
case 0x38: /* FRINTP */
case 0x39: /* FRINTZ */
case 0x3a: /* FCVTPS */
@@ -7724,8 +7899,6 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
case 0x5b: /* FCVTMU */
case 0x5c: /* FCVTAU */
case 0x5d: /* UCVTF */
- case 0x6c: /* FCMGE (zero) */
- case 0x6d: /* FCMLE (zero) */
case 0x79: /* FRINTI */
case 0x7a: /* FCVTPU */
case 0x7b: /* FCVTZU */
--
1.8.5
* [Qemu-devel] [PATCH 6/8] target-arm: A64: Implement floating point pairwise insns
2014-02-07 21:49 [Qemu-devel] [PATCH 0/8] A64: Neon support, fourth set Peter Maydell
` (4 preceding siblings ...)
2014-02-07 21:49 ` [Qemu-devel] [PATCH 5/8] target-arm: A64: Implement SIMD FP compare and set insns Peter Maydell
@ 2014-02-07 21:49 ` Peter Maydell
2014-02-07 21:49 ` [Qemu-devel] [PATCH 7/8] softfloat: Support halving the result of muladd operation Peter Maydell
` (2 subsequent siblings)
8 siblings, 0 replies; 13+ messages in thread
From: Peter Maydell @ 2014-02-07 21:49 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson
From: Alex Bennée <alex.bennee@linaro.org>
Add support for the floating-point pairwise operations
FADDP, FMAXP, FMAXNMP, FMINP and FMINNMP. To do this we use the
code which was previously handling only integer pairwise operations,
and push the integer-specific decode and handling of unallocated
cases up one level in the call tree, so we can also call it from
the floating-point section of the decoder.
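Pairwise operations treat the concatenation of the two source registers
as a single vector and combine each pair of adjacent elements, so for
four single-precision lanes FADDP gives d[0]=n[0]+n[1], d[1]=n[2]+n[3],
d[2]=m[0]+m[1], d[3]=m[2]+m[3]. A plain-C sketch of that indexing
(illustrative only; flag and NaN behaviour comes from softfloat in the
real helpers):

    #include <stddef.h>

    /* FADDP-style pairwise add over the concatenation m:n.  'elts' is the
     * number of elements in each source register (assumed even).
     */
    void faddp(float *d, const float *n, const float *m, size_t elts)
    {
        for (size_t i = 0; i < elts / 2; i++) {
            d[i] = n[2 * i] + n[2 * i + 1];
            d[i + elts / 2] = m[2 * i] + m[2 * i + 1];
        }
    }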
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target-arm/translate-a64.c | 124 +++++++++++++++++++++++++++++++--------------
1 file changed, 86 insertions(+), 38 deletions(-)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 9b8c324..e5de1ec 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -7105,39 +7105,22 @@ static void gen_min_u32(TCGv_i32 res, TCGv_i32 op1, TCGv_i32 op2)
tcg_gen_movcond_i32(TCG_COND_LEU, res, op1, op2, op1, op2);
}
-/* Pairwise op subgroup of C3.6.16. */
-static void disas_simd_3same_pair(DisasContext *s, uint32_t insn)
+/* Pairwise op subgroup of C3.6.16.
+ *
+ * This is called directly or via the handle_3same_float for float pairwise
+ * operations where the opcode and size are calculated differently.
+ */
+static void handle_simd_3same_pair(DisasContext *s, int is_q, int u, int opcode,
+ int size, int rn, int rm, int rd)
{
- int is_q = extract32(insn, 30, 1);
- int u = extract32(insn, 29, 1);
- int size = extract32(insn, 22, 2);
- int opcode = extract32(insn, 11, 5);
- int rm = extract32(insn, 16, 5);
- int rn = extract32(insn, 5, 5);
- int rd = extract32(insn, 0, 5);
+ TCGv_ptr fpst;
int pass;
- if (size == 3 && !is_q) {
- unallocated_encoding(s);
- return;
- }
-
- switch (opcode) {
- case 0x14: /* SMAXP, UMAXP */
- case 0x15: /* SMINP, UMINP */
- if (size == 3) {
- unallocated_encoding(s);
- return;
- }
- break;
- case 0x17:
- if (u) {
- unallocated_encoding(s);
- return;
- }
- break;
- default:
- g_assert_not_reached();
+ /* Floating point operations need fpst */
+ if (opcode >= 0x58) {
+ fpst = get_fpstatus_ptr();
+ } else {
+ TCGV_UNUSED_PTR(fpst);
}
/* These operations work on the concatenated rm:rn, with each pair of
@@ -7155,9 +7138,28 @@ static void disas_simd_3same_pair(DisasContext *s, uint32_t insn)
read_vec_element(s, tcg_op2, passreg, 1, MO_64);
tcg_res[pass] = tcg_temp_new_i64();
- /* The only 64 bit pairwise integer op is ADDP */
- assert(opcode == 0x17);
- tcg_gen_add_i64(tcg_res[pass], tcg_op1, tcg_op2);
+ switch (opcode) {
+ case 0x17: /* ADDP */
+ tcg_gen_add_i64(tcg_res[pass], tcg_op1, tcg_op2);
+ break;
+ case 0x58: /* FMAXNMP */
+ gen_helper_vfp_maxnumd(tcg_res[pass], tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x5a: /* FADDP */
+ gen_helper_vfp_addd(tcg_res[pass], tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x5e: /* FMAXP */
+ gen_helper_vfp_maxd(tcg_res[pass], tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x78: /* FMINNMP */
+ gen_helper_vfp_minnumd(tcg_res[pass], tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x7e: /* FMINP */
+ gen_helper_vfp_mind(tcg_res[pass], tcg_op1, tcg_op2, fpst);
+ break;
+ default:
+ g_assert_not_reached();
+ }
tcg_temp_free_i64(tcg_op1);
tcg_temp_free_i64(tcg_op2);
@@ -7174,7 +7176,7 @@ static void disas_simd_3same_pair(DisasContext *s, uint32_t insn)
for (pass = 0; pass < maxpass; pass++) {
TCGv_i32 tcg_op1 = tcg_temp_new_i32();
TCGv_i32 tcg_op2 = tcg_temp_new_i32();
- NeonGenTwoOpFn *genfn;
+ NeonGenTwoOpFn *genfn = NULL;
int passreg = pass < (maxpass / 2) ? rn : rm;
int passelt = (is_q && (pass & 1)) ? 2 : 0;
@@ -7213,11 +7215,30 @@ static void disas_simd_3same_pair(DisasContext *s, uint32_t insn)
genfn = fns[size][u];
break;
}
+ /* The FP operations are all on single floats (32 bit) */
+ case 0x58: /* FMAXNMP */
+ gen_helper_vfp_maxnums(tcg_res[pass], tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x5a: /* FADDP */
+ gen_helper_vfp_adds(tcg_res[pass], tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x5e: /* FMAXP */
+ gen_helper_vfp_maxs(tcg_res[pass], tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x78: /* FMINNMP */
+ gen_helper_vfp_minnums(tcg_res[pass], tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x7e: /* FMINP */
+ gen_helper_vfp_mins(tcg_res[pass], tcg_op1, tcg_op2, fpst);
+ break;
default:
g_assert_not_reached();
}
- genfn(tcg_res[pass], tcg_op1, tcg_op2);
+ /* FP ops called directly, otherwise call now */
+ if (genfn) {
+ genfn(tcg_res[pass], tcg_op1, tcg_op2);
+ }
tcg_temp_free_i32(tcg_op1);
tcg_temp_free_i32(tcg_op2);
@@ -7231,6 +7252,10 @@ static void disas_simd_3same_pair(DisasContext *s, uint32_t insn)
clear_vec_high(s, rd);
}
}
+
+ if (!TCGV_IS_UNUSED_PTR(fpst)) {
+ tcg_temp_free_ptr(fpst);
+ }
}
/* Floating point op subgroup of C3.6.16. */
@@ -7264,8 +7289,12 @@ static void disas_simd_3same_float(DisasContext *s, uint32_t insn)
case 0x5e: /* FMAXP */
case 0x78: /* FMINNMP */
case 0x7e: /* FMINP */
- /* pairwise ops */
- unsupported_encoding(s, insn);
+ if (size && !is_q) {
+ unallocated_encoding(s);
+ return;
+ }
+ handle_simd_3same_pair(s, is_q, 0, fpopcode, size ? MO_64 : MO_32,
+ rn, rm, rd);
return;
case 0x1b: /* FMULX */
case 0x1f: /* FRECPS */
@@ -7615,9 +7644,28 @@ static void disas_simd_three_reg_same(DisasContext *s, uint32_t insn)
case 0x17: /* ADDP */
case 0x14: /* SMAXP, UMAXP */
case 0x15: /* SMINP, UMINP */
+ {
/* Pairwise operations */
- disas_simd_3same_pair(s, insn);
+ int is_q = extract32(insn, 30, 1);
+ int u = extract32(insn, 29, 1);
+ int size = extract32(insn, 22, 2);
+ int rm = extract32(insn, 16, 5);
+ int rn = extract32(insn, 5, 5);
+ int rd = extract32(insn, 0, 5);
+ if (opcode == 0x17) {
+ if (u || (size == 3 && !is_q)) {
+ unallocated_encoding(s);
+ return;
+ }
+ } else {
+ if (size == 3) {
+ unallocated_encoding(s);
+ return;
+ }
+ }
+ handle_simd_3same_pair(s, is_q, u, opcode, size, rn, rm, rd);
break;
+ }
case 0x18 ... 0x31:
/* floating point ops, sz[1] and U are part of opcode */
disas_simd_3same_float(s, insn);
--
1.8.5
* [Qemu-devel] [PATCH 7/8] softfloat: Support halving the result of muladd operation
2014-02-07 21:49 [Qemu-devel] [PATCH 0/8] A64: Neon support, fourth set Peter Maydell
` (5 preceding siblings ...)
2014-02-07 21:49 ` [Qemu-devel] [PATCH 6/8] target-arm: A64: Implement floating point pairwise insns Peter Maydell
@ 2014-02-07 21:49 ` Peter Maydell
2014-02-10 20:15 ` Richard Henderson
2014-02-07 21:49 ` [Qemu-devel] [PATCH 8/8] target-arm: A64: Implement remaining 3-same instructions Peter Maydell
2014-02-10 20:15 ` [Qemu-devel] [PATCH 0/8] A64: Neon support, fourth set Richard Henderson
8 siblings, 1 reply; 13+ messages in thread
From: Peter Maydell @ 2014-02-07 21:49 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson
The ARMv8 instruction set includes a fused floating point
reciprocal square root step instruction which demands an
"(x * y + z) / 2" fused operation. Support this by adding
a flag to the softfloat muladd operations which requests
that the result be halved before rounding.
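Callers request the halving via the flags argument of the muladd
functions, in the same way as the existing negate flags. A hypothetical
wrapper, just to illustrate the intended use (the real user is the
FRSQRTS helper added later in this series):

    #include "fpu/softfloat.h"

    /* Illustration only: fused (a * b + c) / 2 with a single rounding at
     * the end, which is what the A64 reciprocal square root step needs.
     */
    static inline float64 muladd_then_halve(float64 a, float64 b, float64 c,
                                            float_status *fpst)
    {
        return float64_muladd(a, b, c, float_muladd_halve_result, fpst);
    }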
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
fpu/softfloat.c | 38 ++++++++++++++++++++++++++++++++++++++
include/fpu/softfloat.h | 3 +++
2 files changed, 41 insertions(+)
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index e0ea599..c8f0370 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -2372,6 +2372,17 @@ float32 float32_muladd(float32 a, float32 b, float32 c, int flags STATUS_PARAM)
}
}
/* Zero plus something non-zero : just return the something */
+ if (flags & float_muladd_halve_result) {
+ if (cExp == 0) {
+ shift32RightJamming(cSig, 1, &cSig);
+ } else if (cExp == 1) {
+ shift32RightJamming(cSig, 1, &cSig);
+ cSig |= (1 << 22);
+ cExp = 0;
+ } else {
+ cExp--;
+ }
+ }
return packFloat32(cSign ^ signflip, cExp, cSig);
}
@@ -2408,6 +2419,9 @@ float32 float32_muladd(float32 a, float32 b, float32 c, int flags STATUS_PARAM)
/* Throw out the special case of c being an exact zero now */
shift64RightJamming(pSig64, 32, &pSig64);
pSig = pSig64;
+ if (flags & float_muladd_halve_result) {
+ pExp--;
+ }
return roundAndPackFloat32(zSign, pExp - 1,
pSig STATUS_VAR);
}
@@ -2472,6 +2486,10 @@ float32 float32_muladd(float32 a, float32 b, float32 c, int flags STATUS_PARAM)
zSig64 <<= shiftcount;
zExp -= shiftcount;
}
+ if (flags & float_muladd_halve_result) {
+ zExp--;
+ }
+
shift64RightJamming(zSig64, 32, &zSig64);
return roundAndPackFloat32(zSign, zExp, zSig64 STATUS_VAR);
}
@@ -4088,6 +4106,17 @@ float64 float64_muladd(float64 a, float64 b, float64 c, int flags STATUS_PARAM)
}
}
/* Zero plus something non-zero : just return the something */
+ if (flags & float_muladd_halve_result) {
+ if (cExp == 0) {
+ shift64RightJamming(cSig, 1, &cSig);
+ } else if (cExp == 1) {
+ shift64RightJamming(cSig, 1, &cSig);
+ cSig |= (1ULL << 51);
+ cExp = 0;
+ } else {
+ cExp--;
+ }
+ }
return packFloat64(cSign ^ signflip, cExp, cSig);
}
@@ -4123,6 +4152,9 @@ float64 float64_muladd(float64 a, float64 b, float64 c, int flags STATUS_PARAM)
if (!cSig) {
/* Throw out the special case of c being an exact zero now */
shift128RightJamming(pSig0, pSig1, 64, &pSig0, &pSig1);
+ if (flags & float_muladd_halve_result) {
+ pExp--;
+ }
return roundAndPackFloat64(zSign, pExp - 1,
pSig1 STATUS_VAR);
}
@@ -4159,6 +4191,9 @@ float64 float64_muladd(float64 a, float64 b, float64 c, int flags STATUS_PARAM)
zExp--;
}
shift128RightJamming(zSig0, zSig1, 64, &zSig0, &zSig1);
+ if (flags & float_muladd_halve_result) {
+ zExp--;
+ }
return roundAndPackFloat64(zSign, zExp, zSig1 STATUS_VAR);
} else {
/* Subtraction */
@@ -4209,6 +4244,9 @@ float64 float64_muladd(float64 a, float64 b, float64 c, int flags STATUS_PARAM)
zExp -= (shiftcount + 64);
}
}
+ if (flags & float_muladd_halve_result) {
+ zExp--;
+ }
return roundAndPackFloat64(zSign, zExp, zSig0 STATUS_VAR);
}
}
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 806ae13..4b4df88 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -249,11 +249,14 @@ void float_raise( int8 flags STATUS_PARAM);
| Using these differs from negating an input or output before calling
| the muladd function in that this means that a NaN doesn't have its
| sign bit inverted before it is propagated.
+| We also support halving the result before rounding, as a special
+| case to support the ARM fused-sqrt-step instruction FRSQRTS.
*----------------------------------------------------------------------------*/
enum {
float_muladd_negate_c = 1,
float_muladd_negate_product = 2,
float_muladd_negate_result = 4,
+ float_muladd_halve_result = 8,
};
/*----------------------------------------------------------------------------
--
1.8.5
* [Qemu-devel] [PATCH 8/8] target-arm: A64: Implement remaining 3-same instructions
2014-02-07 21:49 [Qemu-devel] [PATCH 0/8] A64: Neon support, fourth set Peter Maydell
` (6 preceding siblings ...)
2014-02-07 21:49 ` [Qemu-devel] [PATCH 7/8] softfloat: Support halving the result of muladd operation Peter Maydell
@ 2014-02-07 21:49 ` Peter Maydell
2014-02-10 20:15 ` [Qemu-devel] [PATCH 0/8] A64: Neon support, fourth set Richard Henderson
8 siblings, 0 replies; 13+ messages in thread
From: Peter Maydell @ 2014-02-07 21:49 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson
Implement the remaining instructions in the SIMD 3-reg-same
and scalar-3-reg-same groups: FMULX, FRECPS, FRSQRTS, FACGE,
FACGT, FMLA and FMLS.
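For reference, the step instructions compute FRECPS(a, b) = 2 - a * b
and FRSQRTS(a, b) = (3 - a * b) / 2 as single fused operations (the
latter is what the new float_muladd_halve_result flag is for), with the
0 * infinity special cases returning 2.0 and 1.5 respectively. Ignoring
softfloat details, the arithmetic is simply:

    /* Newton-Raphson step values, sketched with plain doubles; the real
     * helpers below compute these as fused softfloat muladds.
     */
    double frecps_step(double a, double b)  { return 2.0 - a * b; }
    double frsqrts_step(double a, double b) { return (3.0 - a * b) / 2.0; }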
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target-arm/helper-a64.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++
target-arm/helper-a64.h | 4 ++++
target-arm/helper.h | 2 ++
target-arm/neon_helper.c | 16 +++++++++++++
target-arm/translate-a64.c | 52 ++++++++++++++++++++++++++++++++++++----
5 files changed, 130 insertions(+), 4 deletions(-)
diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
index b4cab51..c2ce33e 100644
--- a/target-arm/helper-a64.c
+++ b/target-arm/helper-a64.c
@@ -198,3 +198,63 @@ uint64_t HELPER(neon_cgt_f64)(float64 a, float64 b, void *fpstp)
float_status *fpst = fpstp;
return -float64_lt(b, a, fpst);
}
+
+/* Reciprocal step and sqrt step. Note that unlike the A32/T32
+ * versions, these do a fully fused multiply-add or
+ * multiply-add-and-halve.
+ */
+#define float32_two make_float32(0x40000000)
+#define float32_three make_float32(0x40400000)
+#define float32_one_point_five make_float32(0x3fc00000)
+
+#define float64_two make_float64(0x4000000000000000ULL)
+#define float64_three make_float64(0x4008000000000000ULL)
+#define float64_one_point_five make_float64(0x3FF8000000000000ULL)
+
+float32 HELPER(recpsf_f32)(float32 a, float32 b, void *fpstp)
+{
+ float_status *fpst = fpstp;
+
+ a = float32_chs(a);
+ if ((float32_is_infinity(a) && float32_is_zero(b)) ||
+ (float32_is_infinity(b) && float32_is_zero(a))) {
+ return float32_two;
+ }
+ return float32_muladd(a, b, float32_two, 0, fpst);
+}
+
+float64 HELPER(recpsf_f64)(float64 a, float64 b, void *fpstp)
+{
+ float_status *fpst = fpstp;
+
+ a = float64_chs(a);
+ if ((float64_is_infinity(a) && float64_is_zero(b)) ||
+ (float64_is_infinity(b) && float64_is_zero(a))) {
+ return float64_two;
+ }
+ return float64_muladd(a, b, float64_two, 0, fpst);
+}
+
+float32 HELPER(rsqrtsf_f32)(float32 a, float32 b, void *fpstp)
+{
+ float_status *fpst = fpstp;
+
+ a = float32_chs(a);
+ if ((float32_is_infinity(a) && float32_is_zero(b)) ||
+ (float32_is_infinity(b) && float32_is_zero(a))) {
+ return float32_one_point_five;
+ }
+ return float32_muladd(a, b, float32_three, float_muladd_halve_result, fpst);
+}
+
+float64 HELPER(rsqrtsf_f64)(float64 a, float64 b, void *fpstp)
+{
+ float_status *fpst = fpstp;
+
+ a = float64_chs(a);
+ if ((float64_is_infinity(a) && float64_is_zero(b)) ||
+ (float64_is_infinity(b) && float64_is_zero(a))) {
+ return float64_one_point_five;
+ }
+ return float64_muladd(a, b, float64_three, float_muladd_halve_result, fpst);
+}
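For reference, each of these helpers is one step of the usual Newton-Raphson
refinement: FRECPS evaluates 2 - a*b and FRSQRTS evaluates (3 - a*b)/2, the
latter being why the new float_muladd_halve_result flag exists. A rough
non-fused sketch of the values computed (illustrative only; it ignores the
infinity-times-zero special cases above and the single-rounding property of
the real helpers):
    /* reciprocal step: x' = x * recps_ref(d, x) refines x towards 1/d */
    static double recps_ref(double a, double b)
    {
        return 2.0 - a * b;
    }
    /* rsqrt step: x' = x * rsqrts_ref(d * x, x) refines x towards 1/sqrt(d) */
    static double rsqrts_ref(double a, double b)
    {
        return (3.0 - a * b) / 2.0;
    }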
diff --git a/target-arm/helper-a64.h b/target-arm/helper-a64.h
index bf20466..ab9933c 100644
--- a/target-arm/helper-a64.h
+++ b/target-arm/helper-a64.h
@@ -32,3 +32,7 @@ DEF_HELPER_FLAGS_3(vfp_mulxd, TCG_CALL_NO_RWG, f64, f64, f64, ptr)
DEF_HELPER_FLAGS_3(neon_ceq_f64, TCG_CALL_NO_RWG, i64, i64, i64, ptr)
DEF_HELPER_FLAGS_3(neon_cge_f64, TCG_CALL_NO_RWG, i64, i64, i64, ptr)
DEF_HELPER_FLAGS_3(neon_cgt_f64, TCG_CALL_NO_RWG, i64, i64, i64, ptr)
+DEF_HELPER_FLAGS_3(recpsf_f32, TCG_CALL_NO_RWG, f32, f32, f32, ptr)
+DEF_HELPER_FLAGS_3(recpsf_f64, TCG_CALL_NO_RWG, f64, f64, f64, ptr)
+DEF_HELPER_FLAGS_3(rsqrtsf_f32, TCG_CALL_NO_RWG, f32, f32, f32, ptr)
+DEF_HELPER_FLAGS_3(rsqrtsf_f64, TCG_CALL_NO_RWG, f64, f64, f64, ptr)
diff --git a/target-arm/helper.h b/target-arm/helper.h
index 951e6ad..7c60121 100644
--- a/target-arm/helper.h
+++ b/target-arm/helper.h
@@ -382,6 +382,8 @@ DEF_HELPER_3(neon_cge_f32, i32, i32, i32, ptr)
DEF_HELPER_3(neon_cgt_f32, i32, i32, i32, ptr)
DEF_HELPER_3(neon_acge_f32, i32, i32, i32, ptr)
DEF_HELPER_3(neon_acgt_f32, i32, i32, i32, ptr)
+DEF_HELPER_3(neon_acge_f64, i64, i64, i64, ptr)
+DEF_HELPER_3(neon_acgt_f64, i64, i64, i64, ptr)
/* iwmmxt_helper.c */
DEF_HELPER_2(iwmmxt_maddsq, i64, i64, i64)
diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c
index b4c8690..13752ba 100644
--- a/target-arm/neon_helper.c
+++ b/target-arm/neon_helper.c
@@ -1823,6 +1823,22 @@ uint32_t HELPER(neon_acgt_f32)(uint32_t a, uint32_t b, void *fpstp)
return -float32_lt(f1, f0, fpst);
}
+uint64_t HELPER(neon_acge_f64)(uint64_t a, uint64_t b, void *fpstp)
+{
+ float_status *fpst = fpstp;
+ float64 f0 = float64_abs(make_float64(a));
+ float64 f1 = float64_abs(make_float64(b));
+ return -float64_le(f1, f0, fpst);
+}
+
+uint64_t HELPER(neon_acgt_f64)(uint64_t a, uint64_t b, void *fpstp)
+{
+ float_status *fpst = fpstp;
+ float64 f0 = float64_abs(make_float64(a));
+ float64 f1 = float64_abs(make_float64(b));
+ return -float64_lt(f1, f0, fpst);
+}
+
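These return an all-ones mask (-1 as a uint64_t) when the absolute compare
holds, in line with the 32-bit helpers above. Worked examples (values
illustrative):
    FACGE(-3.0, 2.5):  |-3.0| >= | 2.5|  ->  0xffffffffffffffff
    FACGT( 1.0, -1.0): | 1.0| >  |-1.0|  ->  0x0000000000000000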
#define ELEM(V, N, SIZE) (((V) >> ((N) * (SIZE))) & ((1ull << (SIZE)) - 1))
void HELPER(neon_qunzip8)(CPUARMState *env, uint32_t rd, uint32_t rm)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index e5de1ec..2861b13 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -6045,18 +6045,33 @@ static void handle_3same_float(DisasContext *s, int size, int elements,
read_vec_element(s, tcg_op2, rm, pass, MO_64);
switch (fpopcode) {
+ case 0x39: /* FMLS */
+ /* As usual for ARM, separate negation for fused multiply-add */
+ gen_helper_vfp_negd(tcg_op1, tcg_op1);
+ /* fall through */
+ case 0x19: /* FMLA */
+ read_vec_element(s, tcg_res, rd, pass, MO_64);
+ gen_helper_vfp_muladdd(tcg_res, tcg_op1, tcg_op2,
+ tcg_res, fpst);
+ break;
case 0x18: /* FMAXNM */
gen_helper_vfp_maxnumd(tcg_res, tcg_op1, tcg_op2, fpst);
break;
case 0x1a: /* FADD */
gen_helper_vfp_addd(tcg_res, tcg_op1, tcg_op2, fpst);
break;
+ case 0x1b: /* FMULX */
+ gen_helper_vfp_mulxd(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
case 0x1c: /* FCMEQ */
gen_helper_neon_ceq_f64(tcg_res, tcg_op1, tcg_op2, fpst);
break;
case 0x1e: /* FMAX */
gen_helper_vfp_maxd(tcg_res, tcg_op1, tcg_op2, fpst);
break;
+ case 0x1f: /* FRECPS */
+ gen_helper_recpsf_f64(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
case 0x38: /* FMINNM */
gen_helper_vfp_minnumd(tcg_res, tcg_op1, tcg_op2, fpst);
break;
@@ -6066,12 +6081,18 @@ static void handle_3same_float(DisasContext *s, int size, int elements,
case 0x3e: /* FMIN */
gen_helper_vfp_mind(tcg_res, tcg_op1, tcg_op2, fpst);
break;
+ case 0x3f: /* FRSQRTS */
+ gen_helper_rsqrtsf_f64(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
case 0x5b: /* FMUL */
gen_helper_vfp_muld(tcg_res, tcg_op1, tcg_op2, fpst);
break;
case 0x5c: /* FCMGE */
gen_helper_neon_cge_f64(tcg_res, tcg_op1, tcg_op2, fpst);
break;
+ case 0x5d: /* FACGE */
+ gen_helper_neon_acge_f64(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
case 0x5f: /* FDIV */
gen_helper_vfp_divd(tcg_res, tcg_op1, tcg_op2, fpst);
break;
@@ -6082,6 +6103,9 @@ static void handle_3same_float(DisasContext *s, int size, int elements,
case 0x7c: /* FCMGT */
gen_helper_neon_cgt_f64(tcg_res, tcg_op1, tcg_op2, fpst);
break;
+ case 0x7d: /* FACGT */
+ gen_helper_neon_acgt_f64(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
default:
g_assert_not_reached();
}
@@ -6101,15 +6125,30 @@ static void handle_3same_float(DisasContext *s, int size, int elements,
read_vec_element_i32(s, tcg_op2, rm, pass, MO_32);
switch (fpopcode) {
+ case 0x39: /* FMLS */
+ /* As usual for ARM, separate negation for fused multiply-add */
+ gen_helper_vfp_negs(tcg_op1, tcg_op1);
+ /* fall through */
+ case 0x19: /* FMLA */
+ read_vec_element_i32(s, tcg_res, rd, pass, MO_32);
+ gen_helper_vfp_muladds(tcg_res, tcg_op1, tcg_op2,
+ tcg_res, fpst);
+ break;
case 0x1a: /* FADD */
gen_helper_vfp_adds(tcg_res, tcg_op1, tcg_op2, fpst);
break;
+ case 0x1b: /* FMULX */
+ gen_helper_vfp_mulxs(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
case 0x1c: /* FCMEQ */
gen_helper_neon_ceq_f32(tcg_res, tcg_op1, tcg_op2, fpst);
break;
case 0x1e: /* FMAX */
gen_helper_vfp_maxs(tcg_res, tcg_op1, tcg_op2, fpst);
break;
+ case 0x1f: /* FRECPS */
+ gen_helper_recpsf_f32(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
case 0x18: /* FMAXNM */
gen_helper_vfp_maxnums(tcg_res, tcg_op1, tcg_op2, fpst);
break;
@@ -6122,12 +6161,18 @@ static void handle_3same_float(DisasContext *s, int size, int elements,
case 0x3e: /* FMIN */
gen_helper_vfp_mins(tcg_res, tcg_op1, tcg_op2, fpst);
break;
+ case 0x3f: /* FRSQRTS */
+ gen_helper_rsqrtsf_f32(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
case 0x5b: /* FMUL */
gen_helper_vfp_muls(tcg_res, tcg_op1, tcg_op2, fpst);
break;
case 0x5c: /* FCMGE */
gen_helper_neon_cge_f32(tcg_res, tcg_op1, tcg_op2, fpst);
break;
+ case 0x5d: /* FACGE */
+ gen_helper_neon_acge_f32(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
case 0x5f: /* FDIV */
gen_helper_vfp_divs(tcg_res, tcg_op1, tcg_op2, fpst);
break;
@@ -6138,6 +6183,9 @@ static void handle_3same_float(DisasContext *s, int size, int elements,
case 0x7c: /* FCMGT */
gen_helper_neon_cgt_f32(tcg_res, tcg_op1, tcg_op2, fpst);
break;
+ case 0x7d: /* FACGT */
+ gen_helper_neon_acgt_f32(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
default:
g_assert_not_reached();
}
@@ -6192,8 +6240,6 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
case 0x3f: /* FRSQRTS */
case 0x5d: /* FACGE */
case 0x7d: /* FACGT */
- unsupported_encoding(s, insn);
- return;
case 0x1c: /* FCMEQ */
case 0x5c: /* FCMGE */
case 0x7c: /* FCMGT */
@@ -7303,8 +7349,6 @@ static void disas_simd_3same_float(DisasContext *s, uint32_t insn)
case 0x7d: /* FACGT */
case 0x19: /* FMLA */
case 0x39: /* FMLS */
- unsupported_encoding(s, insn);
- return;
case 0x18: /* FMAXNM */
case 0x1a: /* FADD */
case 0x1c: /* FCMEQ */
--
1.8.5
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [PATCH 7/8] softfloat: Support halving the result of muladd operation
2014-02-07 21:49 ` [Qemu-devel] [PATCH 7/8] softfloat: Support halving the result of muladd operation Peter Maydell
@ 2014-02-10 20:15 ` Richard Henderson
2014-02-10 21:31 ` Peter Maydell
0 siblings, 1 reply; 13+ messages in thread
From: Richard Henderson @ 2014-02-10 20:15 UTC (permalink / raw)
To: Peter Maydell, qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall
On 02/07/2014 01:49 PM, Peter Maydell wrote:
> /* Zero plus something non-zero : just return the something */
> + if (flags & float_muladd_halve_result) {
> + if (cExp == 0) {
> + shift32RightJamming(cSig, 1, &cSig);
> + } else if (cExp == 1) {
> + shift32RightJamming(cSig, 1, &cSig);
> + cSig |= (1 << 22);
> + cExp = 0;
> + } else {
> + cExp--;
> + }
Surely better to just use roundAndPackFloat for this case. Looks to me that
you're missing inexact and underflow exceptions at least.
> + }
> return packFloat32(cSign ^ signflip, cExp, cSig);
> }
r~
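As a concrete illustration of the missing flags, take halving the smallest
subnormal (assuming round-to-nearest-even, flush-to-zero off):
    cExp == 0, cSig == 1
    shift32RightJamming(1, 1, &cSig)  ->  cSig == 1  (shifted-out bit jammed into the LSB)
    packFloat32(...)                  ->  the original subnormal again, no flags raised
A rounding path such as the one roundAndPackFloat32 provides would instead
round the exact 0.5-ulp result to zero and set both underflow and inexact.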
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [PATCH 0/8] A64: Neon support, fourth set
2014-02-07 21:49 [Qemu-devel] [PATCH 0/8] A64: Neon support, fourth set Peter Maydell
` (7 preceding siblings ...)
2014-02-07 21:49 ` [Qemu-devel] [PATCH 8/8] target-arm: A64: Implement remaining 3-same instructions Peter Maydell
@ 2014-02-10 20:15 ` Richard Henderson
8 siblings, 0 replies; 13+ messages in thread
From: Richard Henderson @ 2014-02-10 20:15 UTC (permalink / raw)
To: Peter Maydell, qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall
On 02/07/2014 01:49 PM, Peter Maydell wrote:
> This is the fourth set of A64 Neon support patches. This set
> provides complete implementations of the categories:
> * scalar x indexed
> * vector x indexed
> * scalar three-different
> and fills in all the previously missing instructions in:
> * vector 3-reg-same
> * scalar 3-reg-same
>
> It includes adding support in softfloat for "fused multiply,
> add and then divide by two", because the A64 VRECPS instruction
> requires it.
>
> By my reckoning the parts that are still missing for Neon are:
> * parts of scalar shift-immediate and vector shift-immediate
> * parts of scalar-2-misc
> * most of vector-3-diff
> * parts of vector-2-misc
Except for the nit in 7/8,
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [PATCH 7/8] softfloat: Support halving the result of muladd operation
2014-02-10 20:15 ` Richard Henderson
@ 2014-02-10 21:31 ` Peter Maydell
0 siblings, 0 replies; 13+ messages in thread
From: Peter Maydell @ 2014-02-10 21:31 UTC (permalink / raw)
To: Richard Henderson
Cc: Peter Crosthwaite, Patch Tracking, Michael Matz, Alexander Graf,
QEMU Developers, Claudio Fontana, Dirk Mueller, Will Newton,
Laurent Desnogues, Alex Bennée, kvmarm@lists.cs.columbia.edu,
Christoffer Dall
On 10 February 2014 20:15, Richard Henderson <rth@twiddle.net> wrote:
> On 02/07/2014 01:49 PM, Peter Maydell wrote:
>> /* Zero plus something non-zero : just return the something */
>> + if (flags & float_muladd_halve_result) {
>> + if (cExp == 0) {
>> + shift32RightJamming(cSig, 1, &cSig);
>> + } else if (cExp == 1) {
>> + shift32RightJamming(cSig, 1, &cSig);
>> + cSig |= (1 << 22);
>> + cExp = 0;
>> + } else {
>> + cExp--;
>> + }
>
> Surely better to just use roundAndPackFloat for this case. Looks to me that
> you're missing inexact and underflow exceptions at least.
Good point.
thanks
-- PMM
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Qemu-devel] [PATCH 1/8] target-arm: A64: Implement plain vector SIMD indexed element insns
2014-02-07 21:49 ` [Qemu-devel] [PATCH 1/8] target-arm: A64: Implement plain vector SIMD indexed element insns Peter Maydell
@ 2014-02-11 14:52 ` Peter Maydell
0 siblings, 0 replies; 13+ messages in thread
From: Peter Maydell @ 2014-02-11 14:52 UTC (permalink / raw)
To: QEMU Developers
Cc: Peter Crosthwaite, Laurent Desnogues, Patch Tracking,
Michael Matz, Claudio Fontana, Dirk Mueller, Will Newton,
kvmarm@lists.cs.columbia.edu, Richard Henderson
On 7 February 2014 21:49, Peter Maydell <peter.maydell@linaro.org> wrote:
> Implement all the SIMD vector x indexed element instructions
> in the subcategory which are not 'long' ops.
>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> target-arm/helper-a64.c | 26 +++++
> target-arm/helper-a64.h | 2 +
> target-arm/translate-a64.c | 245 ++++++++++++++++++++++++++++++++++++++++++++-
> 3 files changed, 272 insertions(+), 1 deletion(-)
>
> diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
> index 6ca958a..fe90a5c 100644
> --- a/target-arm/helper-a64.c
> +++ b/target-arm/helper-a64.c
> @@ -123,6 +123,32 @@ uint64_t HELPER(vfp_cmped_a64)(float64 x, float64 y, void *fp_status)
> return float_rel_to_flags(float64_compare(x, y, fp_status));
> }
>
> +float32 HELPER(vfp_mulxs)(float32 a, float32 b, void *fpstp)
> +{
> + float_status *fpst = fpstp;
> +
> + if ((float32_is_zero(a) && float32_is_infinity(b)) ||
> + (float32_is_infinity(a) && float32_is_zero(b))) {
> + /* 2.0 with the sign bit set to sign(A) XOR sign(B) */
> + return make_float32((1U << 30) |
> + ((float32_val(a) ^ float32_val(b)) & (1U << 31)));
> + }
> + return float32_mul(a, b, fpst);
> +}
> +
> +float64 HELPER(vfp_mulxd)(float64 a, float64 b, void *fpstp)
> +{
> + float_status *fpst = fpstp;
> +
> + if ((float64_is_zero(a) && float64_is_infinity(b)) ||
> + (float64_is_infinity(a) && float64_is_zero(b))) {
> + /* 2.0 with the sign bit set to sign(A) XOR sign(B) */
> + return make_float64((1ULL << 62) |
> + ((float64_val(a) ^ float64_val(b)) & (1ULL << 63)));
> + }
> + return float64_mul(a, b, fpst);
> +}
> +
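The only place FMULX differs from an ordinary multiply is this
zero-times-infinity case, where it returns 2.0 with the XORed sign instead of
raising Invalid Operation and producing the default NaN. Illustrative values
per the AArch64 definition (not part of the patch):
    mulx(+0.0, +Inf)  ->  +2.0  (0x40000000)
    mulx(-0.0, +Inf)  ->  -2.0  (0xc0000000)
    fmul(+0.0, +Inf)  ->  default NaN, Invalid Operation raised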
> uint64_t HELPER(simd_tbl)(CPUARMState *env, uint64_t result, uint64_t indices,
> uint32_t rn, uint32_t numregs)
> {
> diff --git a/target-arm/helper-a64.h b/target-arm/helper-a64.h
> index 99832ee..84310e8 100644
> --- a/target-arm/helper-a64.h
> +++ b/target-arm/helper-a64.h
> @@ -27,3 +27,5 @@ DEF_HELPER_3(vfp_cmpes_a64, i64, f32, f32, ptr)
> DEF_HELPER_3(vfp_cmpd_a64, i64, f64, f64, ptr)
> DEF_HELPER_3(vfp_cmped_a64, i64, f64, f64, ptr)
> DEF_HELPER_FLAGS_5(simd_tbl, TCG_CALL_NO_RWG_SE, i64, env, i64, i64, i32, i32)
> +DEF_HELPER_FLAGS_3(vfp_mulxs, TCG_CALL_NO_RWG, f32, f32, f32, ptr)
> +DEF_HELPER_FLAGS_3(vfp_mulxd, TCG_CALL_NO_RWG, f64, f64, f64, ptr)
> diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
> index d60223a..b7f1ecf 100644
> --- a/target-arm/translate-a64.c
> +++ b/target-arm/translate-a64.c
> @@ -7813,7 +7813,250 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
> */
> static void disas_simd_indexed_vector(DisasContext *s, uint32_t insn)
> {
> - unsupported_encoding(s, insn);
> + /* This encoding has two kinds of instruction:
> + * normal, where we perform elt x idxelt => elt for each
> + * element in the vector
> + * long, where we perform elt x idxelt and generate a result of
> + * double the width of the input element
> + * The long ops have a 'part' specifier (ie come in INSN, INSN2 pairs).
> + */
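Concretely, a 'normal' op in this category is e.g. MUL V0.4S, V1.4S, V2.S[1]
(every element of V1 multiplied by element 1 of V2, same-width result), while
a 'long' op such as SMULL V0.4S, V1.4H, V2.H[1] produces double-width
products, with the INSN2 form (SMULL2) consuming the upper half of the source
vector.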
> + bool is_q = extract32(insn, 30, 1);
> + bool u = extract32(insn, 29, 1);
> + int size = extract32(insn, 22, 2);
> + int l = extract32(insn, 21, 1);
> + int m = extract32(insn, 20, 1);
> + /* Note that the Rm field here is only 4 bits, not 5 as it usually is */
> + int rm = extract32(insn, 16, 4);
> + int opcode = extract32(insn, 12, 4);
> + int h = extract32(insn, 11, 1);
> + int rn = extract32(insn, 5, 5);
> + int rd = extract32(insn, 0, 5);
> + bool is_long = false;
> + bool is_fp = false;
> + int index;
> + TCGv_ptr fpst;
> +
> + switch (opcode) {
> + case 0x0: /* MLA */
> + case 0x4: /* MLS */
> + if (!u) {
> + unallocated_encoding(s);
> + return;
> + }
> + break;
> + case 0x2: /* SMLAL, SMLAL2, UMLAL, UMLAL2 */
> + case 0x6: /* SMLSL, SMLSL2, UMLSL, UMLSL2 */
> + case 0xa: /* SMULL, SMULL2, UMULL, UMULL2 */
> + is_long = true;
> + break;
> + case 0x3: /* SQDMLAL, SQDMLAL2 */
> + case 0x7: /* SQDMLSL, SQDMLSL2 */
> + case 0xb: /* SQDMULL, SQDMULL2 */
> + is_long = true;
> + /* fall through */
> + case 0xc: /* SQDMULH */
> + case 0xd: /* SQRDMULH */
> + case 0x8: /* MUL */
> + if (u) {
> + unallocated_encoding(s);
> + return;
> + }
> + break;
> + case 0x1: /* FMLA */
> + case 0x5: /* FMLS */
> + if (u) {
> + unallocated_encoding(s);
> + return;
> + }
> + /* fall through */
> + case 0x9: /* FMUL, FMULX */
> + if (!extract32(size, 1, 1)) {
> + unallocated_encoding(s);
> + return;
> + }
> + is_fp = true;
> + break;
> + }
This is missing the

    default:
        unallocated_encoding(s);
        return;

so the unallocated opcodes will fall through and
assert later. I think this is a trivial enough fixup
that I'm going to just fix it in target-arm.next
since I already put this patch in there before I
noticed, unless anybody disagrees.
thanks
-- PMM
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2014-02-11 14:53 UTC | newest]
Thread overview: 13+ messages
2014-02-07 21:49 [Qemu-devel] [PATCH 0/8] A64: Neon support, fourth set Peter Maydell
2014-02-07 21:49 ` [Qemu-devel] [PATCH 1/8] target-arm: A64: Implement plain vector SIMD indexed element insns Peter Maydell
2014-02-11 14:52 ` Peter Maydell
2014-02-07 21:49 ` [Qemu-devel] [PATCH 2/8] target-arm: A64: Implement long vector x indexed insns Peter Maydell
2014-02-07 21:49 ` [Qemu-devel] [PATCH 3/8] target-arm: A64: Implement SIMD scalar indexed instructions Peter Maydell
2014-02-07 21:49 ` [Qemu-devel] [PATCH 4/8] target-arm: A64: Implement scalar three different instructions Peter Maydell
2014-02-07 21:49 ` [Qemu-devel] [PATCH 5/8] target-arm: A64: Implement SIMD FP compare and set insns Peter Maydell
2014-02-07 21:49 ` [Qemu-devel] [PATCH 6/8] target-arm: A64: Implement floating point pairwise insns Peter Maydell
2014-02-07 21:49 ` [Qemu-devel] [PATCH 7/8] softfloat: Support halving the result of muladd operation Peter Maydell
2014-02-10 20:15 ` Richard Henderson
2014-02-10 21:31 ` Peter Maydell
2014-02-07 21:49 ` [Qemu-devel] [PATCH 8/8] target-arm: A64: Implement remaining 3-same instructions Peter Maydell
2014-02-10 20:15 ` [Qemu-devel] [PATCH 0/8] A64: Neon support, fourth set Richard Henderson