* [Qemu-devel] [PATCH 0/8] target-arm: A64 Neon instructions, set 2
@ 2014-01-17 18:44 Peter Maydell
2014-01-17 18:44 ` [Qemu-devel] [PATCH 1/8] target-arm: A64: Add SIMD three-different multiply accumulate insns Peter Maydell
` (8 more replies)
0 siblings, 9 replies; 19+ messages in thread
From: Peter Maydell @ 2014-01-17 18:44 UTC (permalink / raw)
To: qemu-devel
Cc: patches, Michael Matz, Alexander Graf, Claudio Fontana,
Dirk Mueller, Will Newton, Laurent Desnogues, Alex Bennée,
kvmarm, Christoffer Dall, Richard Henderson
This is the second set of patches for A64 Neon. The last patch
set completely covered some of the smaller and simpler
instruction groupings; this patch set attacks a few of the
larger groupings (3-same; scalar 3-same; 3-different; shift-imm;
scalar shift-imm) and doesn't attempt complete coverage of them.
My rule of thumb was to (a) implement enough instructions to
demonstrate that the general decode and set of sub-functions/loops
was adequate, and (b) aim to cover at least as much as the SuSE tree.
Remaining things in the SuSE 1.6 tree not yet implemented:
3-reg-same MLA, MLS, MUL, PMUL, SMAX, UMAX, SMIN, UMIN,
SSHL, USHL, SQSHL, UQSHL, SRSHL, URSHL
2-reg-misc XTN, FABS, FNEG, NOT
thanks
-- PMM
Alex Bennée (2):
target-arm: A64: Add logic ops from SIMD 3 same group
target-arm: A64: Add SIMD shift by immediate
Peter Maydell (6):
target-arm: A64: Add SIMD three-different multiply accumulate insns
target-arm: A64: Add SIMD three-different ABDL instructions
target-arm: A64: Add SIMD scalar 3 same add, sub and compare ops
target-arm: A64: Add top level decode for SIMD 3-same group
target-arm: A64: Add integer ops from SIMD 3-same group
target-arm: A64: Add simple SIMD 3-same floating point ops
target-arm/translate-a64.c | 1190 +++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 1186 insertions(+), 4 deletions(-)
--
1.8.5
* [Qemu-devel] [PATCH 1/8] target-arm: A64: Add SIMD three-different multiply accumulate insns
2014-01-17 18:44 [Qemu-devel] [PATCH 0/8] target-arm: A64 Neon instructions, set 2 Peter Maydell
@ 2014-01-17 18:44 ` Peter Maydell
2014-01-17 18:44 ` [Qemu-devel] [PATCH 2/8] target-arm: A64: Add SIMD three-different ABDL instructions Peter Maydell
` (7 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Peter Maydell @ 2014-01-17 18:44 UTC (permalink / raw)
To: qemu-devel
Cc: patches, Michael Matz, Alexander Graf, Claudio Fontana,
Dirk Mueller, Will Newton, Laurent Desnogues, Alex Bennée,
kvmarm, Christoffer Dall, Richard Henderson
Add support for the multiply-accumulate instructions from the
SIMD three-different instruction group (C3.6.15):
* skeleton decode of unallocated encodings and split of
the group into its three sub-parts
* framework for handling the 64x64->128 widening subpart
* implementation of the multiply-accumulate instructions
SMLAL, SMLAL2, UMLAL, UMLAL2, SMLSL, SMLSL2, UMLSL, UMLSL2,
UMULL, UMULL2, SMULL, SMULL2
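(For illustration, not part of the patch: the per-element semantics of
the size==2 widening multiply-accumulate case in plain C. The function
name and array layout here are invented.)

    #include <stdint.h>

    /* SMLAL Vd.2D, Vn.2S, Vm.2S (upper == 0) and SMLAL2 (upper == 1):
     * widen each 32-bit element, multiply, and accumulate into 64 bits.
     * n[] and m[] view the full 128-bit register as four lanes, so the
     * "2" variants simply start from lane 2, matching the patch's
     * elt = pass + is_q * 2.
     */
    static void smlal_2s(int64_t d[2], const int32_t n[4],
                         const int32_t m[4], int upper)
    {
        for (int i = 0; i < 2; i++) {
            int e = i + (upper ? 2 : 0);
            d[i] += (int64_t)n[e] * (int64_t)m[e];
        }
    }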
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target-arm/translate-a64.c | 208 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 207 insertions(+), 1 deletion(-)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 003bd9b..c5062e1 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -5545,6 +5545,154 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn)
unsupported_encoding(s, insn);
}
+static void handle_3rd_widening(DisasContext *s, int is_q, int is_u, int size,
+ int opcode, int rd, int rn, int rm)
+{
+ /* 3-reg-different widening insns: 64 x 64 -> 128 */
+ TCGv_i64 tcg_res[2];
+ int pass, accop;
+
+ tcg_res[0] = tcg_temp_new_i64();
+ tcg_res[1] = tcg_temp_new_i64();
+
+ /* Does this op do an adding accumulate, a subtracting accumulate,
+ * or no accumulate at all?
+ */
+ switch (opcode) {
+ case 5:
+ case 8:
+ case 9:
+ accop = 1;
+ break;
+ case 10:
+ case 11:
+ accop = -1;
+ break;
+ default:
+ accop = 0;
+ break;
+ }
+
+ if (accop != 0) {
+ read_vec_element(s, tcg_res[0], rd, 0, MO_64);
+ read_vec_element(s, tcg_res[1], rd, 1, MO_64);
+ }
+
+ /* size == 2 means two 32x32->64 operations; this is worth special
+ * casing because we can generally handle it inline.
+ */
+ if (size == 2) {
+ for (pass = 0; pass < 2; pass++) {
+ TCGv_i64 tcg_op1 = tcg_temp_new_i64();
+ TCGv_i64 tcg_op2 = tcg_temp_new_i64();
+ TCGv_i64 tcg_passres;
+ TCGMemOp memop = MO_32 | (is_u ? 0 : MO_SIGN);
+
+ int elt = pass + is_q * 2;
+
+ read_vec_element(s, tcg_op1, rn, elt, memop);
+ read_vec_element(s, tcg_op2, rm, elt, memop);
+
+ if (accop == 0) {
+ tcg_passres = tcg_res[pass];
+ } else {
+ tcg_passres = tcg_temp_new_i64();
+ }
+
+ switch (opcode) {
+ case 8: /* SMLAL, SMLAL2, UMLAL, UMLAL2 */
+ case 10: /* SMLSL, SMLSL2, UMLSL, UMLSL2 */
+ case 12: /* UMULL, UMULL2, SMULL, SMULL2 */
+ tcg_gen_mul_i64(tcg_passres, tcg_op1, tcg_op2);
+ break;
+ default:
+ g_assert_not_reached();
+ }
+
+ if (accop > 0) {
+ tcg_gen_add_i64(tcg_res[pass], tcg_res[pass], tcg_passres);
+ tcg_temp_free_i64(tcg_passres);
+ } else if (accop < 0) {
+ tcg_gen_sub_i64(tcg_res[pass], tcg_res[pass], tcg_passres);
+ tcg_temp_free_i64(tcg_passres);
+ }
+
+ tcg_temp_free_i64(tcg_op1);
+ tcg_temp_free_i64(tcg_op2);
+ }
+ } else {
+ /* size 0 or 1, generally helper functions */
+ for (pass = 0; pass < 2; pass++) {
+ TCGv_i64 tcg_tmp = tcg_temp_new_i64();
+ TCGv_i32 tcg_op1 = tcg_temp_new_i32();
+ TCGv_i32 tcg_op2 = tcg_temp_new_i32();
+ TCGv_i64 tcg_passres;
+ int elt = pass + is_q * 2;
+
+ read_vec_element(s, tcg_tmp, rn, elt, MO_32);
+ tcg_gen_trunc_i64_i32(tcg_op1, tcg_tmp);
+ read_vec_element(s, tcg_tmp, rm, elt, MO_32);
+ tcg_gen_trunc_i64_i32(tcg_op2, tcg_tmp);
+ tcg_temp_free_i64(tcg_tmp);
+
+ if (accop == 0) {
+ tcg_passres = tcg_res[pass];
+ } else {
+ tcg_passres = tcg_temp_new_i64();
+ }
+
+ switch (opcode) {
+ case 8: /* SMLAL, SMLAL2, UMLAL, UMLAL2 */
+ case 10: /* SMLSL, SMLSL2, UMLSL, UMLSL2 */
+ case 12: /* UMULL, UMULL2, SMULL, SMULL2 */
+ if (size == 0) {
+ if (is_u) {
+ gen_helper_neon_mull_u8(tcg_passres, tcg_op1, tcg_op2);
+ } else {
+ gen_helper_neon_mull_s8(tcg_passres, tcg_op1, tcg_op2);
+ }
+ } else {
+ if (is_u) {
+ gen_helper_neon_mull_u16(tcg_passres, tcg_op1, tcg_op2);
+ } else {
+ gen_helper_neon_mull_s16(tcg_passres, tcg_op1, tcg_op2);
+ }
+ }
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ tcg_temp_free_i32(tcg_op1);
+ tcg_temp_free_i32(tcg_op2);
+
+ if (accop > 0) {
+ if (size == 0) {
+ gen_helper_neon_addl_u16(tcg_res[pass], tcg_res[pass],
+ tcg_passres);
+ } else {
+ gen_helper_neon_addl_u32(tcg_res[pass], tcg_res[pass],
+ tcg_passres);
+ }
+ tcg_temp_free_i64(tcg_passres);
+ } else if (accop < 0) {
+ if (size == 0) {
+ gen_helper_neon_subl_u16(tcg_res[pass], tcg_res[pass],
+ tcg_passres);
+ } else {
+ gen_helper_neon_subl_u32(tcg_res[pass], tcg_res[pass],
+ tcg_passres);
+ }
+ tcg_temp_free_i64(tcg_passres);
+ }
+ }
+ }
+
+ write_vec_element(s, tcg_res[0], rd, 0, MO_64);
+ write_vec_element(s, tcg_res[1], rd, 1, MO_64);
+ tcg_temp_free_i64(tcg_res[0]);
+ tcg_temp_free_i64(tcg_res[1]);
+}
+
/* C3.6.15 AdvSIMD three different
* 31 30 29 28 24 23 22 21 20 16 15 12 11 10 9 5 4 0
* +---+---+---+-----------+------+---+------+--------+-----+------+------+
@@ -5553,7 +5701,65 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn)
*/
static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn)
{
- unsupported_encoding(s, insn);
+ /* Instructions in this group fall into three basic classes
+ * (in each case with the operation working on each element in
+ * the input vectors):
+ * (1) widening 64 x 64 -> 128 (with possibly Vd as an extra
+ * 128 bit input)
+ * (2) wide 64 x 128 -> 128
+ * (3) narrowing 128 x 128 -> 64
+ * Here we do initial decode, catch unallocated cases and
+ * dispatch to separate functions for each class.
+ */
+ int is_q = extract32(insn, 30, 1);
+ int is_u = extract32(insn, 29, 1);
+ int size = extract32(insn, 22, 2);
+ int opcode = extract32(insn, 12, 4);
+ int rm = extract32(insn, 16, 5);
+ int rn = extract32(insn, 5, 5);
+ int rd = extract32(insn, 0, 5);
+
+ switch (opcode) {
+ case 1: /* SADDW, SADDW2, UADDW, UADDW2 */
+ case 3: /* SSUBW, SSUBW2, USUBW, USUBW2 */
+ /* 64 x 128 -> 128 */
+ unsupported_encoding(s, insn);
+ break;
+ case 4: /* ADDHN, ADDHN2, RADDHN, RADDHN2 */
+ case 6: /* SUBHN, SUBHN2, RSUBHN, RSUBHN2 */
+ /* 128 x 128 -> 64 */
+ unsupported_encoding(s, insn);
+ break;
+ case 9:
+ case 11:
+ case 13:
+ case 14:
+ if (is_u) {
+ unallocated_encoding(s);
+ return;
+ }
+ /* fall through */
+ case 0:
+ case 2:
+ case 5:
+ case 7:
+ unsupported_encoding(s, insn);
+ break;
+ case 8:
+ case 10:
+ case 12:
+ /* 64 x 64 -> 128 */
+ if (size == 3) {
+ unallocated_encoding(s);
+ return;
+ }
+ handle_3rd_widening(s, is_q, is_u, size, opcode, rd, rn, rm);
+ break;
+ default:
+ /* opcode 15 not allocated */
+ unallocated_encoding(s);
+ break;
+ }
}
/* C3.6.16 AdvSIMD three same
--
1.8.5
* [Qemu-devel] [PATCH 2/8] target-arm: A64: Add SIMD three-different ABDL instructions
2014-01-17 18:44 [Qemu-devel] [PATCH 0/8] target-arm: A64 Neon instructions, set 2 Peter Maydell
2014-01-17 18:44 ` [Qemu-devel] [PATCH 1/8] target-arm: A64: Add SIMD three-different multiply accumulate insns Peter Maydell
@ 2014-01-17 18:44 ` Peter Maydell
2014-01-17 18:44 ` [Qemu-devel] [PATCH 3/8] target-arm: A64: Add SIMD scalar 3 same add, sub and compare ops Peter Maydell
` (6 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Peter Maydell @ 2014-01-17 18:44 UTC (permalink / raw)
To: qemu-devel
Cc: patches, Michael Matz, Alexander Graf, Claudio Fontana,
Dirk Mueller, Will Newton, Laurent Desnogues, Alex Bennée,
kvmarm, Christoffer Dall, Richard Henderson
Implement the absolute-difference instructions in the SIMD
three-different group: SABAL, SABAL2, UABAL, UABAL2, SABDL,
SABDL2, UABDL, UABDL2.
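(Illustrative only: the size==2 per-element semantics in plain C. The
patch's movcond is the compare-and-select below; names are invented.)

    #include <stdint.h>

    /* SABDL: absolute difference of two signed 32-bit elements,
     * widened to 64 bits. Widening before subtracting means the
     * difference cannot wrap, so a simple select suffices.
     */
    static int64_t sabdl_elem(int32_t n, int32_t m)
    {
        int64_t a = (int64_t)n - m;
        int64_t b = (int64_t)m - n;
        return n >= m ? a : b;      /* the TCG_COND_GE movcond */
    }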
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target-arm/translate-a64.c | 35 +++++++++++++++++++++++++++++++++--
1 file changed, 33 insertions(+), 2 deletions(-)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index c5062e1..78f5eb9 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -5600,6 +5600,21 @@ static void handle_3rd_widening(DisasContext *s, int is_q, int is_u, int size,
}
switch (opcode) {
+ case 5: /* SABAL, SABAL2, UABAL, UABAL2 */
+ case 7: /* SABDL, SABDL2, UABDL, UABDL2 */
+ {
+ TCGv_i64 tcg_tmp1 = tcg_temp_new_i64();
+ TCGv_i64 tcg_tmp2 = tcg_temp_new_i64();
+
+ tcg_gen_sub_i64(tcg_tmp1, tcg_op1, tcg_op2);
+ tcg_gen_sub_i64(tcg_tmp2, tcg_op2, tcg_op1);
+ tcg_gen_movcond_i64(is_u ? TCG_COND_GEU : TCG_COND_GE,
+ tcg_passres,
+ tcg_op1, tcg_op2, tcg_tmp1, tcg_tmp2);
+ tcg_temp_free_i64(tcg_tmp1);
+ tcg_temp_free_i64(tcg_tmp2);
+ break;
+ }
case 8: /* SMLAL, SMLAL2, UMLAL, UMLAL2 */
case 10: /* SMLSL, SMLSL2, UMLSL, UMLSL2 */
case 12: /* UMULL, UMULL2, SMULL, SMULL2 */
@@ -5642,6 +5657,22 @@ static void handle_3rd_widening(DisasContext *s, int is_q, int is_u, int size,
}
switch (opcode) {
+ case 5: /* SABAL, SABAL2, UABAL, UABAL2 */
+ case 7: /* SABDL, SABDL2, UABDL, UABDL2 */
+ if (size == 0) {
+ if (is_u) {
+ gen_helper_neon_abdl_u16(tcg_passres, tcg_op1, tcg_op2);
+ } else {
+ gen_helper_neon_abdl_s16(tcg_passres, tcg_op1, tcg_op2);
+ }
+ } else {
+ if (is_u) {
+ gen_helper_neon_abdl_u32(tcg_passres, tcg_op1, tcg_op2);
+ } else {
+ gen_helper_neon_abdl_s32(tcg_passres, tcg_op1, tcg_op2);
+ }
+ }
+ break;
case 8: /* SMLAL, SMLAL2, UMLAL, UMLAL2 */
case 10: /* SMLSL, SMLSL2, UMLSL, UMLSL2 */
case 12: /* UMULL, UMULL2, SMULL, SMULL2 */
@@ -5741,10 +5772,10 @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn)
/* fall through */
case 0:
case 2:
- case 5:
- case 7:
unsupported_encoding(s, insn);
break;
+ case 5:
+ case 7:
case 8:
case 10:
case 12:
--
1.8.5
* [Qemu-devel] [PATCH 3/8] target-arm: A64: Add SIMD scalar 3 same add, sub and compare ops
2014-01-17 18:44 [Qemu-devel] [PATCH 0/8] target-arm: A64 Neon instructions, set 2 Peter Maydell
2014-01-17 18:44 ` [Qemu-devel] [PATCH 1/8] target-arm: A64: Add SIMD three-different multiply accumulate insns Peter Maydell
2014-01-17 18:44 ` [Qemu-devel] [PATCH 2/8] target-arm: A64: Add SIMD three-different ABDL instructions Peter Maydell
@ 2014-01-17 18:44 ` Peter Maydell
2014-01-21 19:26 ` Richard Henderson
2014-01-17 18:44 ` [Qemu-devel] [PATCH 4/8] target-arm: A64: Add top level decode for SIMD 3-same group Peter Maydell
` (5 subsequent siblings)
8 siblings, 1 reply; 19+ messages in thread
From: Peter Maydell @ 2014-01-17 18:44 UTC (permalink / raw)
To: qemu-devel
Cc: patches, Michael Matz, Alexander Graf, Claudio Fontana,
Dirk Mueller, Will Newton, Laurent Desnogues, Alex Bennée,
kvmarm, Christoffer Dall, Richard Henderson
Implement the add, sub and compare ops from the SIMD "scalar three same"
group.
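(A minimal sketch, illustrative only, of the comparison trick the patch
uses: setcond with the inverted condition, then subtract 1, turning a
0/1 test into a 0/all-ones mask.)

    #include <stdint.h>

    /* CMGT: result is all-ones if n > m, else zero. */
    static uint64_t cmgt64(int64_t n, int64_t m)
    {
        uint64_t not_test = (n <= m);  /* inverted condition: 0 or 1 */
        return not_test - 1;           /* 1-1 = 0; 0-1 = all-ones */
    }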
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target-arm/translate-a64.c | 131 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 130 insertions(+), 1 deletion(-)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 78f5eb9..2b3f3a3 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -5501,6 +5501,58 @@ static void disas_simd_scalar_three_reg_diff(DisasContext *s, uint32_t insn)
unsupported_encoding(s, insn);
}
+static void handle_3same_64(DisasContext *s, int opcode, bool u,
+ TCGv_i64 tcg_rd, TCGv_i64 tcg_rn, TCGv_i64 tcg_rm)
+{
+ /* Handle 64x64->64 opcodes which are shared between the scalar
+ * and vector 3-same groups. We cover every opcode where size == 3
+ * is valid in either the three-reg-same (integer, not pairwise)
+ * or scalar-three-reg-same groups. (Some opcodes are not yet
+ * implemented.)
+ */
+ TCGCond cond;
+
+ switch (opcode) {
+ case 0x6: /* CMGT, CMHI */
+ /* 64 bit integer comparison, result = test ? (2^64 - 1) : 0.
+ * We implement this using setcond (!test) and subtracting 1.
+ */
+ cond = u ? TCG_COND_GTU : TCG_COND_GT;
+ do_cmop:
+ tcg_gen_setcond_i64(tcg_invert_cond(cond), tcg_rd, tcg_rn, tcg_rm);
+ tcg_gen_subi_i64(tcg_rd, tcg_rd, 1);
+ break;
+ case 0x7: /* CMGE, CMHS */
+ cond = u ? TCG_COND_GEU : TCG_COND_GE;
+ goto do_cmop;
+ case 0x11: /* CMTST, CMEQ */
+ if (u) {
+ cond = TCG_COND_EQ;
+ goto do_cmop;
+ }
+ /* CMTST : test is "if ((X & Y) != 0)". */
+ tcg_gen_and_i64(tcg_rd, tcg_rn, tcg_rm);
+ tcg_gen_setcondi_i64(TCG_COND_EQ, tcg_rd, tcg_rd, 0);
+ tcg_gen_subi_i64(tcg_rd, tcg_rd, 1);
+ break;
+ case 0x10: /* ADD, SUB */
+ if (u) {
+ tcg_gen_sub_i64(tcg_rd, tcg_rn, tcg_rm);
+ } else {
+ tcg_gen_add_i64(tcg_rd, tcg_rn, tcg_rm);
+ }
+ break;
+ case 0x1: /* SQADD */
+ case 0x5: /* SQSUB */
+ case 0x8: /* SSHL, USHL */
+ case 0x9: /* SQSHL, UQSHL */
+ case 0xa: /* SRSHL, URSHL */
+ case 0xb: /* SQRSHL, UQRSHL */
+ default:
+ g_assert_not_reached();
+ }
+}
+
/* C3.6.11 AdvSIMD scalar three same
* 31 30 29 28 24 23 22 21 20 16 15 11 10 9 5 4 0
* +-----+---+-----------+------+---+------+--------+---+------+------+
@@ -5509,7 +5561,84 @@ static void disas_simd_scalar_three_reg_diff(DisasContext *s, uint32_t insn)
*/
static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
{
- unsupported_encoding(s, insn);
+ int rd = extract32(insn, 0, 5);
+ int rn = extract32(insn, 5, 5);
+ int opcode = extract32(insn, 11, 5);
+ int rm = extract32(insn, 16, 5);
+ int size = extract32(insn, 22, 2);
+ bool u = extract32(insn, 29, 1);
+ TCGv_i64 tcg_rn;
+ TCGv_i64 tcg_rm;
+ TCGv_i64 tcg_rd;
+
+ if (opcode >= 0x18) {
+ /* Floating point: U, size[1] and opcode indicate operation */
+ int fpopcode = opcode | (extract32(size, 1, 1) << 5) | (u << 6);
+ switch (fpopcode) {
+ case 0x1b: /* FMULX */
+ case 0x1c: /* FCMEQ */
+ case 0x1f: /* FRECPS */
+ case 0x3f: /* FRSQRTS */
+ case 0x5c: /* FCMGE */
+ case 0x5d: /* FACGE */
+ case 0x7a: /* FABD */
+ case 0x7c: /* FCMGT */
+ case 0x7d: /* FACGT */
+ unsupported_encoding(s, insn);
+ return;
+ default:
+ unallocated_encoding(s);
+ return;
+ }
+ }
+
+ switch (opcode) {
+ case 0x1: /* SQADD, UQADD */
+ case 0x5: /* SQSUB, UQSUB */
+ case 0x8: /* SSHL, USHL */
+ case 0xa: /* SRSHL, URSHL */
+ unsupported_encoding(s, insn);
+ return;
+ case 0x6: /* CMGT, CMHI */
+ case 0x7: /* CMGE, CMHS */
+ case 0x11: /* CMTST, CMEQ */
+ case 0x10: /* ADD, SUB (vector) */
+ if (size != 3) {
+ unallocated_encoding(s);
+ return;
+ }
+ break;
+ case 0x9: /* SQSHL, UQSHL */
+ case 0xb: /* SQRSHL, UQRSHL */
+ unsupported_encoding(s, insn);
+ return;
+ case 0x16: /* SQDMULH, SQRDMULH (vector) */
+ if (size != 1 && size != 2) {
+ unallocated_encoding(s);
+ return;
+ }
+ unsupported_encoding(s, insn);
+ return;
+ default:
+ unallocated_encoding(s);
+ return;
+ }
+
+ tcg_rn = read_fp_dreg(s, rn); /* op1 */
+ tcg_rm = read_fp_dreg(s, rm); /* op2 */
+ tcg_rd = tcg_temp_new_i64();
+
+ /* For the moment we only support the opcodes that exist
+ * only in 64-bit form. The size != 3 cases will
+ * be handled later when the relevant ops are implemented.
+ */
+ handle_3same_64(s, opcode, u, tcg_rd, tcg_rn, tcg_rm);
+
+ write_fp_dreg(s, rd, tcg_rd);
+
+ tcg_temp_free_i64(tcg_rn);
+ tcg_temp_free_i64(tcg_rm);
+ tcg_temp_free_i64(tcg_rd);
}
/* C3.6.12 AdvSIMD scalar two reg misc
--
1.8.5
* [Qemu-devel] [PATCH 4/8] target-arm: A64: Add top level decode for SIMD 3-same group
2014-01-17 18:44 [Qemu-devel] [PATCH 0/8] target-arm: A64 Neon instructions, set 2 Peter Maydell
` (2 preceding siblings ...)
2014-01-17 18:44 ` [Qemu-devel] [PATCH 3/8] target-arm: A64: Add SIMD scalar 3 same add, sub and compare ops Peter Maydell
@ 2014-01-17 18:44 ` Peter Maydell
2014-01-17 18:44 ` [Qemu-devel] [PATCH 5/8] target-arm: A64: Add logic ops from SIMD 3 same group Peter Maydell
` (4 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Peter Maydell @ 2014-01-17 18:44 UTC (permalink / raw)
To: qemu-devel
Cc: patches, Michael Matz, Alexander Graf, Claudio Fontana,
Dirk Mueller, Will Newton, Laurent Desnogues, Alex Bennée,
kvmarm, Christoffer Dall, Richard Henderson
Add top level decode for the A64 SIMD three regs same group
(C3.6.16), splitting it into the pairwise, logical, float and
integer subgroups.
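(For illustration: the decode keys off bit fields pulled out with
extract32(); the demo around it is invented, but the field positions
match the patch.)

    #include <stdint.h>
    #include <stdio.h>

    /* Same contract as QEMU's extract32(): 'len' bits from 'start'. */
    static uint32_t extract32(uint32_t value, int start, int len)
    {
        return (value >> start) & (~0u >> (32 - len));
    }

    int main(void)
    {
        uint32_t insn = 0x4e228420; /* add v0.16b, v1.16b, v2.16b
                                     * (hand-assembled; illustrative) */
        printf("opcode = 0x%x\n", extract32(insn, 11, 5)); /* 0x10 */
        return 0;
    }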
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target-arm/translate-a64.c | 45 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 44 insertions(+), 1 deletion(-)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 2b3f3a3..a92d69d 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -5922,6 +5922,30 @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn)
}
}
+/* Logic op (opcode == 3) subgroup of C3.6.16. */
+static void disas_simd_3same_logic(DisasContext *s, uint32_t insn)
+{
+ unsupported_encoding(s, insn);
+}
+
+/* Pairwise op subgroup of C3.6.16. */
+static void disas_simd_3same_pair(DisasContext *s, uint32_t insn)
+{
+ unsupported_encoding(s, insn);
+}
+
+/* Floating point op subgroup of C3.6.16. */
+static void disas_simd_3same_float(DisasContext *s, uint32_t insn)
+{
+ unsupported_encoding(s, insn);
+}
+
+/* Integer op subgroup of C3.6.16. */
+static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
+{
+ unsupported_encoding(s, insn);
+}
+
/* C3.6.16 AdvSIMD three same
* 31 30 29 28 24 23 22 21 20 16 15 11 10 9 5 4 0
* +---+---+---+-----------+------+---+------+--------+---+------+------+
@@ -5930,7 +5954,26 @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn)
*/
static void disas_simd_three_reg_same(DisasContext *s, uint32_t insn)
{
- unsupported_encoding(s, insn);
+ int opcode = extract32(insn, 11, 5);
+
+ switch (opcode) {
+ case 0x3: /* logic ops */
+ disas_simd_3same_logic(s, insn);
+ break;
+ case 0x17: /* ADDP */
+ case 0x14: /* SMAXP, UMAXP */
+ case 0x15: /* SMINP, UMINP */
+ /* Pairwise operations */
+ disas_simd_3same_pair(s, insn);
+ break;
+ case 0x18 ... 0x31:
+ /* floating point ops, sz[1] and U are part of opcode */
+ disas_simd_3same_float(s, insn);
+ break;
+ default:
+ disas_simd_3same_int(s, insn);
+ break;
+ }
}
/* C3.6.17 AdvSIMD two reg misc
--
1.8.5
* [Qemu-devel] [PATCH 5/8] target-arm: A64: Add logic ops from SIMD 3 same group
2014-01-17 18:44 [Qemu-devel] [PATCH 0/8] target-arm: A64 Neon instructions, set 2 Peter Maydell
` (3 preceding siblings ...)
2014-01-17 18:44 ` [Qemu-devel] [PATCH 4/8] target-arm: A64: Add top level decode for SIMD 3-same group Peter Maydell
@ 2014-01-17 18:44 ` Peter Maydell
2014-01-21 19:35 ` Richard Henderson
2014-01-17 18:44 ` [Qemu-devel] [PATCH 6/8] target-arm: A64: Add integer ops from SIMD 3-same group Peter Maydell
` (3 subsequent siblings)
8 siblings, 1 reply; 19+ messages in thread
From: Peter Maydell @ 2014-01-17 18:44 UTC (permalink / raw)
To: qemu-devel
Cc: patches, Michael Matz, Alexander Graf, Claudio Fontana,
Dirk Mueller, Will Newton, Laurent Desnogues, Alex Bennée,
kvmarm, Christoffer Dall, Richard Henderson
From: Alex Bennée <alex.bennee@linaro.org>
Add support for the logical operations (ORR, AND, BIC, ORN, EOR, BSL,
BIT and BIF) from the SIMD 3 register same group (C3.6.16).
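(The B* ops come down to standard xor/and identities; a plain-C sketch,
matching the TCG sequences in the patch, purely for illustration.)

    #include <stdint.h>

    static uint64_t bsl(uint64_t d, uint64_t n, uint64_t m)
    {   /* BSL: bits of d select between n (1) and m (0) */
        return m ^ ((n ^ m) & d);
    }
    static uint64_t bit(uint64_t d, uint64_t n, uint64_t m)
    {   /* BIT: copy n into d where m is 1 */
        return d ^ ((n ^ d) & m);
    }
    static uint64_t bif(uint64_t d, uint64_t n, uint64_t m)
    {   /* BIF: copy n into d where m is 0 */
        return d ^ ((n ^ d) & ~m);
    }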
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target-arm/translate-a64.c | 68 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 67 insertions(+), 1 deletion(-)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index a92d69d..82f8e8e 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -5925,7 +5925,73 @@ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn)
/* Logic op (opcode == 3) subgroup of C3.6.16. */
static void disas_simd_3same_logic(DisasContext *s, uint32_t insn)
{
- unsupported_encoding(s, insn);
+ int rd = extract32(insn, 0, 5);
+ int rn = extract32(insn, 5, 5);
+ int rm = extract32(insn, 16, 5);
+ int size = extract32(insn, 22, 2);
+ bool is_u = extract32(insn, 29, 1);
+ bool is_q = extract32(insn, 30, 1);
+ TCGv_i64 tcg_op1 = tcg_temp_new_i64();
+ TCGv_i64 tcg_op2 = tcg_temp_new_i64();
+ TCGv_i64 tcg_res[2];
+ int pass;
+
+ tcg_res[0] = tcg_temp_new_i64();
+ tcg_res[1] = tcg_temp_new_i64();
+
+ for (pass = 0; pass < (is_q ? 2 : 1); pass++) {
+ read_vec_element(s, tcg_op1, rn, pass, MO_64);
+ read_vec_element(s, tcg_op2, rm, pass, MO_64);
+
+ if (!is_u) {
+ /* AND, BIC, ORR, ORN */
+ if (extract32(size, 0, 1)) {
+ tcg_gen_not_i64(tcg_op2, tcg_op2);
+ }
+ if (extract32(size, 1, 1)) {
+ tcg_gen_or_i64(tcg_res[pass], tcg_op1, tcg_op2);
+ } else {
+ tcg_gen_and_i64(tcg_res[pass], tcg_op1, tcg_op2);
+ }
+ } else {
+ if (size != 0) {
+ /* B* ops need res loaded to operate on */
+ read_vec_element(s, tcg_res[pass], rd, pass, MO_64);
+ }
+
+ switch (size) {
+ case 0: /* EOR */
+ tcg_gen_xor_i64(tcg_res[pass], tcg_op1, tcg_op2);
+ break;
+ case 1: /* BSL bitwise select */
+ tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_op2);
+ tcg_gen_and_i64(tcg_op1, tcg_op1, tcg_res[pass]);
+ tcg_gen_xor_i64(tcg_res[pass], tcg_op2, tcg_op1);
+ break;
+ case 2: /* BIT, bitwise insert if true */
+ tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_res[pass]);
+ tcg_gen_and_i64(tcg_op1, tcg_op1, tcg_op2);
+ tcg_gen_xor_i64(tcg_res[pass], tcg_res[pass], tcg_op1);
+ break;
+ case 3: /* BIF, bitwise insert if false */
+ tcg_gen_xor_i64(tcg_op1, tcg_op1, tcg_res[pass]);
+ tcg_gen_andc_i64(tcg_op1, tcg_op1, tcg_op2);
+ tcg_gen_xor_i64(tcg_res[pass], tcg_res[pass], tcg_op1);
+ break;
+ }
+ }
+ }
+
+ write_vec_element(s, tcg_res[0], rd, 0, MO_64);
+ if (!is_q) {
+ tcg_gen_movi_i64(tcg_res[1], 0);
+ }
+ write_vec_element(s, tcg_res[1], rd, 1, MO_64);
+
+ tcg_temp_free_i64(tcg_op1);
+ tcg_temp_free_i64(tcg_op2);
+ tcg_temp_free_i64(tcg_res[0]);
+ tcg_temp_free_i64(tcg_res[1]);
}
/* Pairwise op subgroup of C3.6.16. */
--
1.8.5
* [Qemu-devel] [PATCH 6/8] target-arm: A64: Add integer ops from SIMD 3-same group
2014-01-17 18:44 [Qemu-devel] [PATCH 0/8] target-arm: A64 Neon instructions, set 2 Peter Maydell
` (4 preceding siblings ...)
2014-01-17 18:44 ` [Qemu-devel] [PATCH 5/8] target-arm: A64: Add logic ops from SIMD 3 same group Peter Maydell
@ 2014-01-17 18:44 ` Peter Maydell
2014-01-21 19:37 ` Richard Henderson
2014-01-21 19:40 ` Richard Henderson
2014-01-17 18:44 ` [Qemu-devel] [PATCH 7/8] target-arm: A64: Add simple SIMD 3-same floating point ops Peter Maydell
` (2 subsequent siblings)
8 siblings, 2 replies; 19+ messages in thread
From: Peter Maydell @ 2014-01-17 18:44 UTC (permalink / raw)
To: qemu-devel
Cc: patches, Michael Matz, Alexander Graf, Claudio Fontana,
Dirk Mueller, Will Newton, Laurent Desnogues, Alex Bennée,
kvmarm, Christoffer Dall, Richard Henderson
Add some of the integer operations in the SIMD 3-same group:
specifically, the comparisons, addition and subtraction.
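(A sketch of the dispatch pattern the patch uses, with invented names:
a small table indexed by the decoded fields picks the per-element
generator function.)

    #include <stdint.h>
    #include <stdio.h>

    typedef int32_t (*ElemFn)(int32_t, int32_t);

    static int32_t add32(int32_t a, int32_t b) { return a + b; }
    static int32_t sub32(int32_t a, int32_t b) { return a - b; }

    int main(void)
    {
        /* [u] selects ADD (0) vs SUB (1), as in the patch's tables */
        static ElemFn const fns[2] = { add32, sub32 };
        printf("%d\n", fns[1](7, 3)); /* SUB: prints 4 */
        return 0;
    }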
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target-arm/translate-a64.c | 141 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 140 insertions(+), 1 deletion(-)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 82f8e8e..893c05b 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -6009,7 +6009,146 @@ static void disas_simd_3same_float(DisasContext *s, uint32_t insn)
/* Integer op subgroup of C3.6.16. */
static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
{
- unsupported_encoding(s, insn);
+ int is_q = extract32(insn, 30, 1);
+ int u = extract32(insn, 29, 1);
+ int size = extract32(insn, 22, 2);
+ int opcode = extract32(insn, 11, 5);
+ int rm = extract32(insn, 16, 5);
+ int rn = extract32(insn, 5, 5);
+ int rd = extract32(insn, 0, 5);
+ int pass;
+
+ switch (opcode) {
+ case 0x13: /* MUL, PMUL */
+ if (u && size != 0) {
+ unallocated_encoding(s);
+ return;
+ }
+ /* fall through */
+ case 0x0: /* SHADD, UHADD */
+ case 0x2: /* SRHADD, URHADD */
+ case 0x4: /* SHSUB, UHSUB */
+ case 0xc: /* SMAX, UMAX */
+ case 0xd: /* SMIN, UMIN */
+ case 0xe: /* SABD, UABD */
+ case 0xf: /* SABA, UABA */
+ case 0x12: /* MLA, MLS */
+ if (size == 3) {
+ unallocated_encoding(s);
+ return;
+ }
+ break;
+ case 0x1: /* SQADD */
+ case 0x5: /* SQSUB */
+ case 0x8: /* SSHL, USHL */
+ case 0x9: /* SQSHL, UQSHL */
+ case 0xa: /* SRSHL, URSHL */
+ case 0xb: /* SQRSHL, UQRSHL */
+ if (size == 3 && !is_q) {
+ unallocated_encoding(s);
+ return;
+ }
+ unsupported_encoding(s, insn);
+ return;
+ default:
+ if (size == 3 && !is_q) {
+ unallocated_encoding(s);
+ return;
+ }
+ break;
+ }
+
+ if (size == 3) {
+ for (pass = 0; pass < (is_q ? 2 : 1); pass++) {
+ TCGv_i64 tcg_op1 = tcg_temp_new_i64();
+ TCGv_i64 tcg_op2 = tcg_temp_new_i64();
+ TCGv_i64 tcg_res = tcg_temp_new_i64();
+
+ read_vec_element(s, tcg_op1, rn, pass, MO_64);
+ read_vec_element(s, tcg_op2, rm, pass, MO_64);
+
+ handle_3same_64(s, opcode, u, tcg_res, tcg_op1, tcg_op2);
+
+ write_vec_element(s, tcg_res, rd, pass, MO_64);
+
+ tcg_temp_free_i64(tcg_res);
+ tcg_temp_free_i64(tcg_op1);
+ tcg_temp_free_i64(tcg_op2);
+ }
+ } else {
+ for (pass = 0; pass < (is_q ? 4 : 2); pass++) {
+ TCGv_i32 tcg_op1 = tcg_temp_new_i32();
+ TCGv_i32 tcg_op2 = tcg_temp_new_i32();
+ TCGv_i32 tcg_res = tcg_temp_new_i32();
+ TCGv_i64 tcg_tmp = tcg_temp_new_i64();
+ typedef void NeonGenFn(TCGv_i32, TCGv_i32, TCGv_i32);
+ NeonGenFn *genfn;
+
+ read_vec_element(s, tcg_tmp, rn, pass, MO_32);
+ tcg_gen_trunc_i64_i32(tcg_op1, tcg_tmp);
+ read_vec_element(s, tcg_tmp, rm, pass, MO_32);
+ tcg_gen_trunc_i64_i32(tcg_op2, tcg_tmp);
+
+ switch (opcode) {
+ case 0x6: /* CMGT, CMHI */
+ {
+ NeonGenFn *ceqtstfns[3][2] = {
+ { gen_helper_neon_cgt_s8, gen_helper_neon_cgt_u8 },
+ { gen_helper_neon_cgt_s16, gen_helper_neon_cgt_u16 },
+ { gen_helper_neon_cgt_s32, gen_helper_neon_cgt_u32 },
+ };
+ genfn = ceqtstfns[size][u];
+ break;
+ }
+ case 0x7: /* CMGE, CMHS */
+ {
+ NeonGenFn *ceqtstfns[3][2] = {
+ { gen_helper_neon_cge_s8, gen_helper_neon_cge_u8 },
+ { gen_helper_neon_cge_s16, gen_helper_neon_cge_u16 },
+ { gen_helper_neon_cge_s32, gen_helper_neon_cge_u32 },
+ };
+ genfn = ceqtstfns[size][u];
+ break;
+ }
+ case 0x10: /* ADD, SUB */
+ {
+ NeonGenFn *addfns[3][2] = {
+ { gen_helper_neon_add_u8, gen_helper_neon_sub_u8 },
+ { gen_helper_neon_add_u16, gen_helper_neon_sub_u16 },
+ { tcg_gen_add_i32, tcg_gen_sub_i32 },
+ };
+ genfn = addfns[size][u];
+ break;
+ }
+ case 0x11: /* CMTST, CMEQ */
+ {
+ NeonGenFn *ceqtstfns[3][2] = {
+ { gen_helper_neon_tst_u8, gen_helper_neon_ceq_u8 },
+ { gen_helper_neon_tst_u16, gen_helper_neon_ceq_u16 },
+ { gen_helper_neon_tst_u32, gen_helper_neon_ceq_u32 },
+ };
+ genfn = ceqtstfns[size][u];
+ break;
+ }
+ default:
+ g_assert_not_reached();
+ }
+
+ genfn(tcg_res, tcg_op1, tcg_op2);
+
+ tcg_gen_extu_i32_i64(tcg_tmp, tcg_res);
+ write_vec_element(s, tcg_tmp, rd, pass, MO_32);
+
+ tcg_temp_free_i32(tcg_res);
+ tcg_temp_free_i32(tcg_op1);
+ tcg_temp_free_i32(tcg_op2);
+ tcg_temp_free_i64(tcg_tmp);
+ }
+ }
+
+ if (!is_q) {
+ clear_vec_high(s, rd);
+ }
}
/* C3.6.16 AdvSIMD three same
--
1.8.5
* [Qemu-devel] [PATCH 7/8] target-arm: A64: Add simple SIMD 3-same floating point ops
2014-01-17 18:44 [Qemu-devel] [PATCH 0/8] target-arm: A64 Neon instructions, set 2 Peter Maydell
` (5 preceding siblings ...)
2014-01-17 18:44 ` [Qemu-devel] [PATCH 6/8] target-arm: A64: Add integer ops from SIMD 3-same group Peter Maydell
@ 2014-01-17 18:44 ` Peter Maydell
2014-01-17 18:44 ` [Qemu-devel] [PATCH 8/8] target-arm: A64: Add SIMD shift by immediate Peter Maydell
2014-01-21 21:42 ` [Qemu-devel] [PATCH 0/8] target-arm: A64 Neon instructions, set 2 Richard Henderson
8 siblings, 0 replies; 19+ messages in thread
From: Peter Maydell @ 2014-01-17 18:44 UTC (permalink / raw)
To: qemu-devel
Cc: patches, Michael Matz, Alexander Graf, Claudio Fontana,
Dirk Mueller, Will Newton, Laurent Desnogues, Alex Bennée,
kvmarm, Christoffer Dall, Richard Henderson
Implement a simple subset of the SIMD 3-same floating point
operations. This includes a common helper function used for both
scalar and vector ops; FABD is the only currently implemented
shared op.
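(For reference, a plain-C sketch of FABD's per-element semantics; the
patch goes through the softfloat helpers instead so that FP flags and
rounding state are tracked correctly.)

    #include <math.h>

    static double fabd(double a, double b)
    {
        return fabs(a - b); /* absolute difference */
    }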
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target-arm/translate-a64.c | 191 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 189 insertions(+), 2 deletions(-)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 893c05b..9980759 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -5553,6 +5553,132 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
}
}
+/* Handle the 3-same-operands float operations; shared by the scalar
+ * and vector encodings. The caller must filter out any encodings
+ * not allocated for the variant it is dealing with.
+ */
+static void handle_3same_float(DisasContext *s, int size, int elements,
+ int fpopcode, int rd, int rn, int rm)
+{
+ int pass;
+ TCGv_ptr fpst = get_fpstatus_ptr();
+
+ for (pass = 0; pass < elements; pass++) {
+ if (size) {
+ /* Double */
+ TCGv_i64 tcg_op1 = tcg_temp_new_i64();
+ TCGv_i64 tcg_op2 = tcg_temp_new_i64();
+ TCGv_i64 tcg_res = tcg_temp_new_i64();
+
+ read_vec_element(s, tcg_op1, rn, pass, MO_64);
+ read_vec_element(s, tcg_op2, rm, pass, MO_64);
+
+ switch (fpopcode) {
+ case 0x18: /* FMAXNM */
+ gen_helper_vfp_maxnumd(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x1a: /* FADD */
+ gen_helper_vfp_addd(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x1e: /* FMAX */
+ gen_helper_vfp_maxd(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x38: /* FMINNM */
+ gen_helper_vfp_minnumd(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x3a: /* FSUB */
+ gen_helper_vfp_subd(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x3e: /* FMIN */
+ gen_helper_vfp_mind(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x5b: /* FMUL */
+ gen_helper_vfp_muld(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x5f: /* FDIV */
+ gen_helper_vfp_divd(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x7a: /* FABD */
+ gen_helper_vfp_subd(tcg_res, tcg_op1, tcg_op2, fpst);
+ gen_helper_vfp_absd(tcg_res, tcg_res);
+ break;
+ default:
+ g_assert_not_reached();
+ }
+
+ write_vec_element(s, tcg_res, rd, pass, MO_64);
+
+ tcg_temp_free_i64(tcg_res);
+ tcg_temp_free_i64(tcg_op1);
+ tcg_temp_free_i64(tcg_op2);
+ } else {
+ /* Single */
+ TCGv_i32 tcg_op1 = tcg_temp_new_i32();
+ TCGv_i32 tcg_op2 = tcg_temp_new_i32();
+ TCGv_i32 tcg_res = tcg_temp_new_i32();
+ TCGv_i64 tcg_tmp = tcg_temp_new_i64();
+
+ read_vec_element(s, tcg_tmp, rn, pass, MO_32);
+ tcg_gen_trunc_i64_i32(tcg_op1, tcg_tmp);
+ read_vec_element(s, tcg_tmp, rm, pass, MO_32);
+ tcg_gen_trunc_i64_i32(tcg_op2, tcg_tmp);
+
+ switch (fpopcode) {
+ case 0x1a: /* FADD */
+ gen_helper_vfp_adds(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x1e: /* FMAX */
+ gen_helper_vfp_maxs(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x18: /* FMAXNM */
+ gen_helper_vfp_maxnums(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x38: /* FMINNM */
+ gen_helper_vfp_minnums(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x3a: /* FSUB */
+ gen_helper_vfp_subs(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x3e: /* FMIN */
+ gen_helper_vfp_mins(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x5b: /* FMUL */
+ gen_helper_vfp_muls(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x5f: /* FDIV */
+ gen_helper_vfp_divs(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x7a: /* FABD */
+ gen_helper_vfp_subs(tcg_res, tcg_op1, tcg_op2, fpst);
+ gen_helper_vfp_abss(tcg_res, tcg_res);
+ break;
+ default:
+ g_assert_not_reached();
+ }
+
+ tcg_gen_extu_i32_i64(tcg_tmp, tcg_res);
+ if (elements == 1) {
+ /* scalar single so clear high part */
+ write_vec_element(s, tcg_tmp, rd, pass, MO_64);
+ } else {
+ write_vec_element(s, tcg_tmp, rd, pass, MO_32);
+ }
+
+ tcg_temp_free_i64(tcg_tmp);
+ tcg_temp_free_i32(tcg_res);
+ tcg_temp_free_i32(tcg_op1);
+ tcg_temp_free_i32(tcg_op2);
+ }
+ }
+
+ tcg_temp_free_ptr(fpst);
+
+ if ((elements << size) < 4) {
+ /* scalar, or non-quad vector op */
+ clear_vec_high(s, rd);
+ }
+}
+
/* C3.6.11 AdvSIMD scalar three same
* 31 30 29 28 24 23 22 21 20 16 15 11 10 9 5 4 0
* +-----+---+-----------+------+---+------+--------+---+------+------+
@@ -5581,15 +5707,19 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
case 0x3f: /* FRSQRTS */
case 0x5c: /* FCMGE */
case 0x5d: /* FACGE */
- case 0x7a: /* FABD */
case 0x7c: /* FCMGT */
case 0x7d: /* FACGT */
unsupported_encoding(s, insn);
return;
+ case 0x7a: /* FABD */
+ break;
default:
unallocated_encoding(s);
return;
}
+
+ handle_3same_float(s, extract32(size, 0, 1), 1, fpopcode, rd, rn, rm);
+ return;
}
switch (opcode) {
@@ -6003,7 +6133,64 @@ static void disas_simd_3same_pair(DisasContext *s, uint32_t insn)
/* Floating point op subgroup of C3.6.16. */
static void disas_simd_3same_float(DisasContext *s, uint32_t insn)
{
- unsupported_encoding(s, insn);
+ /* For floating point ops, the U, size[1] and opcode bits
+ * together indicate the operation. size[0] indicates single
+ * or double.
+ */
+ int fpopcode = extract32(insn, 11, 5)
+ | (extract32(insn, 23, 1) << 5)
+ | (extract32(insn, 29, 1) << 6);
+ int is_q = extract32(insn, 30, 1);
+ int size = extract32(insn, 22, 1);
+ int rm = extract32(insn, 16, 5);
+ int rn = extract32(insn, 5, 5);
+ int rd = extract32(insn, 0, 5);
+
+ int datasize = is_q ? 128 : 64;
+ int esize = 32 << size;
+ int elements = datasize / esize;
+
+ if (size == 1 && !is_q) {
+ unallocated_encoding(s);
+ return;
+ }
+
+ switch (fpopcode) {
+ case 0x58: /* FMAXNMP */
+ case 0x5a: /* FADDP */
+ case 0x5e: /* FMAXP */
+ case 0x78: /* FMINNMP */
+ case 0x7e: /* FMINP */
+ /* pairwise ops */
+ unsupported_encoding(s, insn);
+ return;
+ case 0x1b: /* FMULX */
+ case 0x1c: /* FCMEQ */
+ case 0x1f: /* FRECPS */
+ case 0x3f: /* FRSQRTS */
+ case 0x5c: /* FCMGE */
+ case 0x5d: /* FACGE */
+ case 0x7c: /* FCMGT */
+ case 0x7d: /* FACGT */
+ case 0x19: /* FMLA */
+ case 0x39: /* FMLS */
+ unsupported_encoding(s, insn);
+ return;
+ case 0x18: /* FMAXNM */
+ case 0x1a: /* FADD */
+ case 0x1e: /* FMAX */
+ case 0x38: /* FMINNM */
+ case 0x3a: /* FSUB */
+ case 0x3e: /* FMIN */
+ case 0x5b: /* FMUL */
+ case 0x5f: /* FDIV */
+ case 0x7a: /* FABD */
+ handle_3same_float(s, size, elements, fpopcode, rd, rn, rm);
+ return;
+ default:
+ unallocated_encoding(s);
+ return;
+ }
}
/* Integer op subgroup of C3.6.16. */
--
1.8.5
* [Qemu-devel] [PATCH 8/8] target-arm: A64: Add SIMD shift by immediate
2014-01-17 18:44 [Qemu-devel] [PATCH 0/8] target-arm: A64 Neon instructions, set 2 Peter Maydell
` (6 preceding siblings ...)
2014-01-17 18:44 ` [Qemu-devel] [PATCH 7/8] target-arm: A64: Add simple SIMD 3-same floating point ops Peter Maydell
@ 2014-01-17 18:44 ` Peter Maydell
2014-01-21 21:43 ` Richard Henderson
2014-01-21 21:42 ` [Qemu-devel] [PATCH 0/8] target-arm: A64 Neon instructions, set 2 Richard Henderson
8 siblings, 1 reply; 19+ messages in thread
From: Peter Maydell @ 2014-01-17 18:44 UTC (permalink / raw)
To: qemu-devel
Cc: patches, Michael Matz, Alexander Graf, Claudio Fontana,
Dirk Mueller, Will Newton, Laurent Desnogues, Alex Bennée,
kvmarm, Christoffer Dall, Richard Henderson
From: Alex Bennée <alex.bennee@linaro.org>
This implements a subset of the AdvSIMD shift operations (namely all the
non-saturating, non-narrowing ones). The shift generation code itself
is shared between the scalar and vector cases, wrapped in either
vector element iteration or the fp reg access.
The rounding operations need special care: adding the rounding bits
can carry into the high bits, so the intermediate results must not be
truncated to 64 bits.
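(A sketch of the 64-bit-element hazard, in illustrative C that assumes
a compiler with __int128: the rounding add can carry out of bit 63, so
the intermediate has to be wider than the element.)

    #include <stdint.h>

    static uint64_t urshr64(uint64_t x, int shift) /* 1 <= shift <= 64 */
    {
        uint64_t round = 1ULL << (shift - 1);
        unsigned __int128 t = (unsigned __int128)x + round; /* 65 bits */
        return (uint64_t)(t >> shift);
    }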
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target-arm/translate-a64.c | 385 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 383 insertions(+), 2 deletions(-)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 9980759..bfcce09 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -5479,15 +5479,224 @@ static void disas_simd_scalar_pairwise(DisasContext *s, uint32_t insn)
unsupported_encoding(s, insn);
}
+/*
+ * Common SSHR[RA]/USHR[RA] - Shift right (optional rounding/accumulate)
+ *
+ * This handles the common shift logic and is used by both
+ * the vector and scalar code.
+ */
+static void handle_shri_with_rndacc(TCGv_i64 tcg_res, TCGv_i64 tcg_src,
+ TCGv_i64 tcg_rnd, bool accumulate,
+ bool is_u, int size, int shift)
+{
+ bool extended_result = false;
+ bool round = !TCGV_IS_UNUSED_I64(tcg_rnd);
+ int ext_lshift = 0;
+ TCGv_i64 tcg_src_hi;
+
+ if (round && size == 3) {
+ extended_result = true;
+ ext_lshift = 64 - shift;
+ tcg_src_hi = tcg_temp_new_i64();
+ } else if (shift == 64) {
+ if (!accumulate && is_u) {
+ /* result is zero */
+ tcg_gen_movi_i64(tcg_res, 0);
+ return;
+ }
+ }
+
+ /* Deal with the rounding step */
+ if (round) {
+ if (extended_result) {
+ TCGv_i64 tcg_zero = tcg_const_i64(0);
+ if (!is_u) {
+ /* take care of sign extending tcg_res */
+ tcg_gen_sari_i64(tcg_src_hi, tcg_src, 63);
+ tcg_gen_add2_i64(tcg_src, tcg_src_hi,
+ tcg_src, tcg_src_hi,
+ tcg_rnd, tcg_zero);
+ } else {
+ tcg_gen_add2_i64(tcg_src, tcg_src_hi,
+ tcg_src, tcg_zero,
+ tcg_rnd, tcg_zero);
+ }
+ tcg_temp_free_i64(tcg_zero);
+ } else {
+ tcg_gen_add_i64(tcg_src, tcg_src, tcg_rnd);
+ }
+ }
+
+ /* Now do the shift right */
+ if (round && extended_result) {
+ /* extended case, >64 bit precision required */
+ if (ext_lshift == 0) {
+ /* special case, only high bits matter */
+ tcg_gen_mov_i64(tcg_src, tcg_src_hi);
+ } else {
+ tcg_gen_shri_i64(tcg_src, tcg_src, shift);
+ tcg_gen_shli_i64(tcg_src_hi, tcg_src_hi, ext_lshift);
+ tcg_gen_or_i64(tcg_src, tcg_src, tcg_src_hi);
+ }
+ } else {
+ if (is_u) {
+ if (shift == 64) {
+ /* essentially shifting in 64 zeros */
+ tcg_gen_movi_i64(tcg_src, 0);
+ } else {
+ tcg_gen_shri_i64(tcg_src, tcg_src, shift);
+ }
+ } else {
+ if (shift == 64) {
+ /* effectively extending the sign-bit */
+ tcg_gen_sari_i64(tcg_src, tcg_src, 63);
+ } else {
+ tcg_gen_sari_i64(tcg_src, tcg_src, shift);
+ }
+ }
+ }
+
+ if (accumulate) {
+ tcg_gen_add_i64(tcg_res, tcg_res, tcg_src);
+ } else {
+ tcg_gen_mov_i64(tcg_res, tcg_src);
+ }
+
+ if (extended_result) {
+ tcg_temp_free_i64(tcg_src_hi);
+ }
+}
+
+/* Common SHL/SLI - Shift left with an optional insert */
+static void handle_shli_with_ins(TCGv_i64 tcg_res, TCGv_i64 tcg_src,
+ bool insert, int shift)
+{
+ tcg_gen_shli_i64(tcg_src, tcg_src, shift);
+ if (insert) {
+ /* SLI */
+ uint64_t mask = (1ULL << shift) - 1;
+ tcg_gen_andi_i64(tcg_res, tcg_res, mask);
+ tcg_gen_or_i64(tcg_res, tcg_res, tcg_src);
+ } else {
+ tcg_gen_mov_i64(tcg_res, tcg_src);
+ }
+}
+
+/* SSHR[RA]/USHR[RA] - Scalar shift right (optional rounding/accumulate) */
+static void handle_scalar_simd_shri(DisasContext *s,
+ bool is_u, int immh, int immb,
+ int opcode, int rn, int rd)
+{
+ const int size = 3;
+ int immhb = immh << 3 | immb;
+ int shift = 2 * (8 << size) - immhb;
+ bool accumulate = false;
+ bool round = false;
+ TCGv_i64 tcg_rn;
+ TCGv_i64 tcg_rd;
+ TCGv_i64 tcg_round;
+
+ if (!extract32(immh, 3, 1)) {
+ unallocated_encoding(s);
+ return;
+ }
+
+ switch (opcode) {
+ case 0x02: /* SSRA / USRA (accumulate) */
+ accumulate = true;
+ break;
+ case 0x04: /* SRSHR / URSHR (rounding) */
+ round = true;
+ break;
+ case 0x06: /* SRSRA / URSRA (accum + rounding) */
+ accumulate = round = true;
+ break;
+ }
+
+ if (round) {
+ uint64_t round_const = 1ULL << (shift - 1);
+ tcg_round = tcg_const_i64(round_const);
+ } else {
+ TCGV_UNUSED_I64(tcg_round);
+ }
+
+ tcg_rn = read_fp_dreg(s, rn);
+ tcg_rd = accumulate ? read_fp_dreg(s, rd) : tcg_temp_new_i64();
+
+ handle_shri_with_rndacc(tcg_rd, tcg_rn, tcg_round,
+ accumulate, is_u, size, shift);
+
+ write_fp_dreg(s, rd, tcg_rd);
+
+ tcg_temp_free_i64(tcg_rn);
+ tcg_temp_free_i64(tcg_rd);
+ if (round) {
+ tcg_temp_free_i64(tcg_round);
+ }
+}
+
+/* SHL/SLI - Scalar shift left */
+static void handle_scalar_simd_shli(DisasContext *s, bool insert,
+ int immh, int immb, int opcode,
+ int rn, int rd)
+{
+ int size = 32 - clz32(immh) - 1;
+ int immhb = immh << 3 | immb;
+ int shift = immhb - (8 << size);
+ TCGv_i64 tcg_rn;
+ TCGv_i64 tcg_rd;
+
+ if (!extract32(immh, 3, 1)) {
+ unallocated_encoding(s);
+ return;
+ }
+
+ tcg_rn = read_fp_dreg(s, rn);
+ tcg_rd = insert ? read_fp_dreg(s, rd) : tcg_temp_new_i64();
+
+ handle_shli_with_ins(tcg_rd, tcg_rn, insert, shift);
+
+ write_fp_dreg(s, rd, tcg_rd);
+
+ tcg_temp_free_i64(tcg_rn);
+ tcg_temp_free_i64(tcg_rd);
+
+ return;
+}
+
/* C3.6.9 AdvSIMD scalar shift by immediate
* 31 30 29 28 23 22 19 18 16 15 11 10 9 5 4 0
* +-----+---+-------------+------+------+--------+---+------+------+
* | 0 1 | U | 1 1 1 1 1 0 | immh | immb | opcode | 1 | Rn | Rd |
* +-----+---+-------------+------+------+--------+---+------+------+
+ *
+ * This is the scalar version, so it works on fixed-size registers.
*/
static void disas_simd_scalar_shift_imm(DisasContext *s, uint32_t insn)
{
- unsupported_encoding(s, insn);
+ int rd = extract32(insn, 0, 5);
+ int rn = extract32(insn, 5, 5);
+ int opcode = extract32(insn, 11, 5);
+ int immb = extract32(insn, 16, 3);
+ int immh = extract32(insn, 19, 4);
+ bool is_u = extract32(insn, 29, 1);
+
+ switch (opcode) {
+ case 0x00: /* SSHR / USHR */
+ case 0x02: /* SSRA / USRA */
+ case 0x04: /* SRSHR / URSHR */
+ case 0x06: /* SRSRA / URSRA */
+ handle_scalar_simd_shri(s, is_u, immh, immb, opcode, rn, rd);
+ break;
+ case 0x0a: /* SHL / SLI */
+ handle_scalar_simd_shli(s, is_u, immh, immb, opcode, rn, rd);
+ break;
+ default:
+ unsupported_encoding(s, insn);
+ break;
+ }
+
+ return;
}
/* C3.6.10 AdvSIMD scalar three different
@@ -5793,6 +6002,150 @@ static void disas_simd_scalar_indexed(DisasContext *s, uint32_t insn)
unsupported_encoding(s, insn);
}
+/* SSHR[RA]/USHR[RA] - Vector shift right (optional rounding/accumulate) */
+static void handle_vec_simd_shri(DisasContext *s, bool is_q, bool is_u,
+ int immh, int immb, int opcode, int rn, int rd)
+{
+ int size = 32 - clz32(immh) - 1;
+ int immhb = immh << 3 | immb;
+ int shift = 2 * (8 << size) - immhb;
+ bool accumulate = false;
+ bool round = false;
+ int dsize = is_q ? 128 : 64;
+ int esize = 8 << size;
+ int elements = dsize/esize;
+ TCGMemOp memop = size | (is_u ? 0 : MO_SIGN);
+ TCGv_i64 tcg_rn = new_tmp_a64(s);
+ TCGv_i64 tcg_rd = new_tmp_a64(s);
+ TCGv_i64 tcg_round;
+ int i;
+
+ if (extract32(immh, 3, 1) && !is_q) {
+ unallocated_encoding(s);
+ return;
+ }
+
+ if (size > 3 && !is_q) {
+ unallocated_encoding(s);
+ return;
+ }
+
+ switch (opcode) {
+ case 0x02: /* SSRA / USRA (accumulate) */
+ accumulate = true;
+ break;
+ case 0x04: /* SRSHR / URSHR (rounding) */
+ round = true;
+ break;
+ case 0x06: /* SRSRA / URSRA (accum + rounding) */
+ accumulate = round = true;
+ break;
+ }
+
+ if (round) {
+ uint64_t round_const = 1ULL << (shift - 1);
+ tcg_round = tcg_const_i64(round_const);
+ } else {
+ TCGV_UNUSED_I64(tcg_round);
+ }
+
+ for (i = 0; i < elements; i++) {
+ read_vec_element(s, tcg_rn, rn, i, memop);
+ if (accumulate) {
+ read_vec_element(s, tcg_rd, rd, i, memop);
+ }
+
+ handle_shri_with_rndacc(tcg_rd, tcg_rn, tcg_round,
+ accumulate, is_u, size, shift);
+
+ write_vec_element(s, tcg_rd, rd, i, size);
+ }
+
+ if (!is_q) {
+ clear_vec_high(s, rd);
+ }
+
+ if (round) {
+ tcg_temp_free_i64(tcg_round);
+ }
+}
+
+/* SHL/SLI - Vector shift left */
+static void handle_vec_simd_shli(DisasContext *s, bool is_q, bool insert,
+ int immh, int immb, int opcode, int rn, int rd)
+{
+ int size = 32 - clz32(immh) - 1;
+ int immhb = immh << 3 | immb;
+ int shift = immhb - (8 << size);
+ int dsize = is_q ? 128 : 64;
+ int esize = 8 << size;
+ int elements = dsize/esize;
+ TCGv_i64 tcg_rn = new_tmp_a64(s);
+ TCGv_i64 tcg_rd = new_tmp_a64(s);
+ int i;
+
+ if (extract32(immh, 3, 1) && !is_q) {
+ unallocated_encoding(s);
+ return;
+ }
+
+ if (size > 3 && !is_q) {
+ unallocated_encoding(s);
+ return;
+ }
+
+ for (i = 0; i < elements; i++) {
+ read_vec_element(s, tcg_rn, rn, i, size);
+ if (insert) {
+ read_vec_element(s, tcg_rd, rd, i, size);
+ }
+
+ handle_shli_with_ins(tcg_rd, tcg_rn, insert, shift);
+
+ write_vec_element(s, tcg_rd, rd, i, size);
+ }
+
+ if (!is_q) {
+ clear_vec_high(s, rd);
+ }
+
+ return;
+}
+
+/* USHLL/SHLL - Vector shift left with widening */
+static void handle_vec_simd_wshli(DisasContext *s, bool is_q, bool is_u,
+ int immh, int immb, int opcode, int rn, int rd)
+{
+ int size = 32 - clz32(immh) - 1;
+ int immhb = immh << 3 | immb;
+ int shift = immhb - (8 << size);
+ int dsize = 64;
+ int esize = 8 << size;
+ int elements = dsize/esize;
+ TCGv_i64 tcg_rn = new_tmp_a64(s);
+ TCGv_i64 tcg_rd = new_tmp_a64(s);
+ int i;
+
+ if (size >= 3) {
+ unallocated_encoding(s);
+ return;
+ }
+
+ /* For the LL variants the store is larger than the load,
+ * so if rd == rn we would overwrite parts of our input.
+ * So load everything right now and use shifts in the main loop.
+ */
+ read_vec_element(s, tcg_rn, rn, is_q ? 1 : 0, MO_64);
+
+ for (i = 0; i < elements; i++) {
+ tcg_gen_shri_i64(tcg_rd, tcg_rn, i * esize);
+ ext_and_shift_reg(tcg_rd, tcg_rd, size | (!is_u << 2), 0);
+ tcg_gen_shli_i64(tcg_rd, tcg_rd, shift);
+ write_vec_element(s, tcg_rd, rd, i, size + 1);
+ }
+}
+
+
/* C3.6.14 AdvSIMD shift by immediate
* 31 30 29 28 23 22 19 18 16 15 11 10 9 5 4 0
* +---+---+---+-------------+------+------+--------+---+------+------+
@@ -5801,7 +6154,35 @@ static void disas_simd_scalar_indexed(DisasContext *s, uint32_t insn)
*/
static void disas_simd_shift_imm(DisasContext *s, uint32_t insn)
{
- unsupported_encoding(s, insn);
+ int rd = extract32(insn, 0, 5);
+ int rn = extract32(insn, 5, 5);
+ int opcode = extract32(insn, 11, 5);
+ int immb = extract32(insn, 16, 3);
+ int immh = extract32(insn, 19, 4);
+ bool is_u = extract32(insn, 29, 1);
+ bool is_q = extract32(insn, 30, 1);
+
+ switch (opcode) {
+ case 0x00: /* SSHR / USHR */
+ case 0x02: /* SSRA / USRA (accumulate) */
+ case 0x04: /* SRSHR / URSHR (rounding) */
+ case 0x06: /* SRSRA / URSRA (accum + rounding) */
+ handle_vec_simd_shri(s, is_q, is_u, immh, immb, opcode, rn, rd);
+ break;
+ case 0x0a: /* SHL / SLI */
+ handle_vec_simd_shli(s, is_q, is_u, immh, immb, opcode, rn, rd);
+ break;
+ case 0x14: /* SSHLL / USHLL */
+ handle_vec_simd_wshli(s, is_q, is_u, immh, immb, opcode, rn, rd);
+ break;
+ default:
+ /* We don't currently implement any of the Narrow or saturating shifts;
+ * nor do we implement the fixed-point conversions in this
+ * encoding group (SCVTF, FCVTZS, UCVTF, FCVTZU).
+ */
+ unsupported_encoding(s, insn);
+ return;
+ }
}
static void handle_3rd_widening(DisasContext *s, int is_q, int is_u, int size,
--
1.8.5
* Re: [Qemu-devel] [PATCH 3/8] target-arm: A64: Add SIMD scalar 3 same add, sub and compare ops
2014-01-17 18:44 ` [Qemu-devel] [PATCH 3/8] target-arm: A64: Add SIMD scalar 3 same add, sub and compare ops Peter Maydell
@ 2014-01-21 19:26 ` Richard Henderson
0 siblings, 0 replies; 19+ messages in thread
From: Richard Henderson @ 2014-01-21 19:26 UTC (permalink / raw)
To: Peter Maydell, qemu-devel
Cc: patches, Michael Matz, Alexander Graf, Claudio Fontana,
Dirk Mueller, Will Newton, Laurent Desnogues, Alex Bennée,
kvmarm, Christoffer Dall
On 01/17/2014 10:44 AM, Peter Maydell wrote:
> + * We implement this using setcond (!test) and subtracting 1.
FWIW -(test) is more space-efficient for an x86 host. I doubt there's any
space or speed impact either way for any other host.
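(A sketch of the suggested form, in plain C rather than TCG ops:)

    #include <stdint.h>

    static uint64_t cmgt64_neg(int64_t n, int64_t m)
    {
        return -(uint64_t)(n > m); /* setcond + neg: 0 or all-ones */
    }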
r~
* Re: [Qemu-devel] [PATCH 5/8] target-arm: A64: Add logic ops from SIMD 3 same group
2014-01-17 18:44 ` [Qemu-devel] [PATCH 5/8] target-arm: A64: Add logic ops from SIMD 3 same group Peter Maydell
@ 2014-01-21 19:35 ` Richard Henderson
0 siblings, 0 replies; 19+ messages in thread
From: Richard Henderson @ 2014-01-21 19:35 UTC (permalink / raw)
To: Peter Maydell, qemu-devel
Cc: patches, Michael Matz, Alexander Graf, Claudio Fontana,
Dirk Mueller, Will Newton, Laurent Desnogues, Alex Bennée,
kvmarm, Christoffer Dall
On 01/17/2014 10:44 AM, Peter Maydell wrote:
> + /* AND, BIC, ORR, ORN */
> + if (extract32(size, 0, 1)) {
> + tcg_gen_not_i64(tcg_op2, tcg_op2);
> + }
> + if (extract32(size, 1, 1)) {
> + tcg_gen_or_i64(tcg_res[pass], tcg_op1, tcg_op2);
> + } else {
> + tcg_gen_and_i64(tcg_res[pass], tcg_op1, tcg_op2);
> + }
It would be worthwhile to generate andc and orc directly instead
of a separate not.
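(Sketch of the resulting four-way selection, in plain C; tcg_gen_andc
and tcg_gen_orc compute a & ~b and a | ~b in one op:)

    #include <stdint.h>

    static uint64_t logic_op(int size, uint64_t a, uint64_t b)
    {
        switch (size) {
        case 0: return a & b;   /* AND -> tcg_gen_and_i64 */
        case 1: return a & ~b;  /* BIC -> tcg_gen_andc_i64 */
        case 2: return a | b;   /* ORR -> tcg_gen_or_i64 */
        default: return a | ~b; /* ORN -> tcg_gen_orc_i64 */
        }
    }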
r~
* Re: [Qemu-devel] [PATCH 6/8] target-arm: A64: Add integer ops from SIMD 3-same group
2014-01-17 18:44 ` [Qemu-devel] [PATCH 6/8] target-arm: A64: Add integer ops from SIMD 3-same group Peter Maydell
@ 2014-01-21 19:37 ` Richard Henderson
2014-01-21 19:53 ` Peter Maydell
2014-01-21 19:40 ` Richard Henderson
1 sibling, 1 reply; 19+ messages in thread
From: Richard Henderson @ 2014-01-21 19:37 UTC (permalink / raw)
To: Peter Maydell, qemu-devel
Cc: patches, Michael Matz, Alexander Graf, Claudio Fontana,
Dirk Mueller, Will Newton, Laurent Desnogues, Alex Bennée,
kvmarm, Christoffer Dall
On 01/17/2014 10:44 AM, Peter Maydell wrote:
> + NeonGenFn *ceqtstfns[3][2] = {
You want all of these arrays to be static and const:
static NeonGenFn * const ceqtstfns[3][2] = ...
r~
* Re: [Qemu-devel] [PATCH 6/8] target-arm: A64: Add integer ops from SIMD 3-same group
2014-01-17 18:44 ` [Qemu-devel] [PATCH 6/8] target-arm: A64: Add integer ops from SIMD 3-same group Peter Maydell
2014-01-21 19:37 ` Richard Henderson
@ 2014-01-21 19:40 ` Richard Henderson
1 sibling, 0 replies; 19+ messages in thread
From: Richard Henderson @ 2014-01-21 19:40 UTC (permalink / raw)
To: Peter Maydell, qemu-devel
Cc: patches, Michael Matz, Alexander Graf, Claudio Fontana,
Dirk Mueller, Will Newton, Laurent Desnogues, Alex Bennée,
kvmarm, Christoffer Dall
On 01/17/2014 10:44 AM, Peter Maydell wrote:
> + switch (opcode) {
> + case 0x6: /* CMGT, CMHI */
> + {
> + NeonGenFn *ceqtstfns[3][2] = {
...
> + case 0x7: /* CMGE, CMHS */
> + {
> + NeonGenFn *ceqtstfns[3][2] = {
...
> + case 0x11: /* CMTST, CMEQ */
> + {
> + NeonGenFn *ceqtstfns[3][2] = {
Oh, and cut and paste on those names? Perhaps it would be less confusing to
just name them all "fns" and let the specific code block imply the rest?
r~
* Re: [Qemu-devel] [PATCH 6/8] target-arm: A64: Add integer ops from SIMD 3-same group
2014-01-21 19:37 ` Richard Henderson
@ 2014-01-21 19:53 ` Peter Maydell
2014-01-22 4:59 ` Xbing Wang
0 siblings, 1 reply; 19+ messages in thread
From: Peter Maydell @ 2014-01-21 19:53 UTC (permalink / raw)
To: Richard Henderson
Cc: Patch Tracking, Michael Matz, Alexander Graf, QEMU Developers,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm@lists.cs.columbia.edu, Christoffer Dall
On 21 January 2014 19:37, Richard Henderson <rth@twiddle.net> wrote:
> On 01/17/2014 10:44 AM, Peter Maydell wrote:
>> + NeonGenFn *ceqtstfns[3][2] = {
>
> You want all of these arrays to be static and const:
>
> static NeonGenFn * const ceqtstfns[3][2] = ...
I confess I couldn't figure out the right place to
put the 'const' to placate the compiler :-)
thanks
-- PMM
* Re: [Qemu-devel] [PATCH 0/8] target-arm: A64 Neon instructions, set 2
2014-01-17 18:44 [Qemu-devel] [PATCH 0/8] target-arm: A64 Neon instructions, set 2 Peter Maydell
` (7 preceding siblings ...)
2014-01-17 18:44 ` [Qemu-devel] [PATCH 8/8] target-arm: A64: Add SIMD shift by immediate Peter Maydell
@ 2014-01-21 21:42 ` Richard Henderson
8 siblings, 0 replies; 19+ messages in thread
From: Richard Henderson @ 2014-01-21 21:42 UTC (permalink / raw)
To: Peter Maydell, qemu-devel
Cc: patches, Michael Matz, Alexander Graf, Claudio Fontana,
Dirk Mueller, Will Newton, Laurent Desnogues, Alex Bennée,
kvmarm, Christoffer Dall
On 01/17/2014 10:44 AM, Peter Maydell wrote:
> Alex Bennée (2):
> target-arm: A64: Add logic ops from SIMD 3 same group
> target-arm: A64: Add SIMD shift by immediate
>
> Peter Maydell (6):
> target-arm: A64: Add SIMD three-different multiply accumulate insns
> target-arm: A64: Add SIMD three-different ABDL instructions
> target-arm: A64: Add SIMD scalar 3 same add, sub and compare ops
> target-arm: A64: Add top level decode for SIMD 3-same group
> target-arm: A64: Add integer ops from SIMD 3-same group
> target-arm: A64: Add simple SIMD 3-same floating point ops
I've responded with individual tweaks to some of the patches,
but I saw nothing actually incorrect in any in any of them, so,
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
* Re: [Qemu-devel] [PATCH 8/8] target-arm: A64: Add SIMD shift by immediate
2014-01-17 18:44 ` [Qemu-devel] [PATCH 8/8] target-arm: A64: Add SIMD shift by immediate Peter Maydell
@ 2014-01-21 21:43 ` Richard Henderson
0 siblings, 0 replies; 19+ messages in thread
From: Richard Henderson @ 2014-01-21 21:43 UTC (permalink / raw)
To: Peter Maydell, qemu-devel
Cc: patches, Michael Matz, Alexander Graf, Claudio Fontana,
Dirk Mueller, Will Newton, Laurent Desnogues, Alex Bennée,
kvmarm, Christoffer Dall
On 01/17/2014 10:44 AM, Peter Maydell wrote:
> +/* Common SHL/SLI - Shift left with an optional insert */
> +static void handle_shli_with_ins(TCGv_i64 tcg_res, TCGv_i64 tcg_src,
> + bool insert, int shift)
> +{
> + tcg_gen_shli_i64(tcg_src, tcg_src, shift);
> + if (insert) {
> + /* SLI */
> + uint64_t mask = (1ULL << shift) - 1;
> + tcg_gen_andi_i64(tcg_res, tcg_res, mask);
> + tcg_gen_or_i64(tcg_res, tcg_res, tcg_src);
This is
tcg_gen_deposit_i64(tcg_res, tcg_res, tcg_src, shift, 64 - shift);
We do already special case such remaining-width deposits for hosts that don't
implement deposit, so we should get the exact same insn sequence for x86.
> + tcg_gen_mov_i64(tcg_res, tcg_src);
Which means for the else you can elide the move and just shift directly into
the result.
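(Putting both suggestions together, a sketch of how the helper could
look; this is the reviewer's idea spelled out, not the committed code:)

    static void handle_shli_with_ins(TCGv_i64 tcg_res, TCGv_i64 tcg_src,
                                     bool insert, int shift)
    {
        if (insert) {
            /* SLI: keep the low 'shift' bits of res, deposit src above */
            tcg_gen_deposit_i64(tcg_res, tcg_res, tcg_src,
                                shift, 64 - shift);
        } else {
            /* SHL: shift straight into the result */
            tcg_gen_shli_i64(tcg_res, tcg_src, shift);
        }
    }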
r~
* Re: [Qemu-devel] [PATCH 6/8] target-arm: A64: Add integer ops from SIMD 3-same group
2014-01-21 19:53 ` Peter Maydell
@ 2014-01-22 4:59 ` Xbing Wang
2014-01-22 6:13 ` Richard Henderson
0 siblings, 1 reply; 19+ messages in thread
From: Xbing Wang @ 2014-01-22 4:59 UTC (permalink / raw)
To: Peter Maydell, Richard Henderson
Cc: Laurent Desnogues, Patch Tracking, Michael Matz, QEMU Developers,
Alexander Graf, Claudio Fontana, Dirk Mueller, Will Newton,
Alex Bennée, kvmarm@lists.cs.columbia.edu, Christoffer Dall
Making them static makes this function NOT thread-safe (and they're
hidden in thousands of lines of code); there will be people who want the
whole file translate-a64.c to be thread-safe. What do you think?
Thanks.
- xbing
On 01/22/2014 03:53 AM, Peter Maydell wrote:
> On 21 January 2014 19:37, Richard Henderson <rth@twiddle.net> wrote:
>> On 01/17/2014 10:44 AM, Peter Maydell wrote:
>>> + NeonGenFn *ceqtstfns[3][2] = {
>> You want all of these arrays to be static and const:
>>
>> static NeonGenFn * const ceqtstfns[3][2] = ...
> I confess I couldn't figure out the right place to
> put the 'const' to placate the compiler :-)
>
> thanks
> -- PMM
>
>
* Re: [Qemu-devel] [PATCH 6/8] target-arm: A64: Add integer ops from SIMD 3-same group
2014-01-22 4:59 ` Xbing Wang
@ 2014-01-22 6:13 ` Richard Henderson
2014-01-22 6:48 ` Xbing Wang
0 siblings, 1 reply; 19+ messages in thread
From: Richard Henderson @ 2014-01-22 6:13 UTC (permalink / raw)
To: Xbing Wang, Peter Maydell
Cc: Laurent Desnogues, Patch Tracking, Michael Matz, QEMU Developers,
Alexander Graf, Claudio Fontana, Dirk Mueller, Will Newton,
Alex Bennée, kvmarm@lists.cs.columbia.edu, Christoffer Dall
On 01/21/2014 08:59 PM, Xbing Wang wrote:
> Making them static makes this function NOT thread-safe
Huh? Why in the world would you say that? They're read-only.
r~
* Re: [Qemu-devel] [PATCH 6/8] target-arm: A64: Add integer ops from SIMD 3-same group
2014-01-22 6:13 ` Richard Henderson
@ 2014-01-22 6:48 ` Xbing Wang
0 siblings, 0 replies; 19+ messages in thread
From: Xbing Wang @ 2014-01-22 6:48 UTC (permalink / raw)
To: Richard Henderson, Peter Maydell
Cc: Laurent Desnogues, Patch Tracking, Michael Matz, QEMU Developers,
Alexander Graf, Claudio Fontana, Dirk Mueller, Will Newton,
Alex Bennée, kvmarm@lists.cs.columbia.edu, Christoffer Dall
My fault. Forgot const static. :-)
On 01/22/2014 02:13 PM, Richard Henderson wrote:
> On 01/21/2014 08:59 PM, Xbing Wang wrote:
>> Making them static makes this function NOT thread-safe
> Huh? Why in the world would you say that? They're read-only.
>
>
> r~
>