* [Qemu-devel] [PATCH v2 00/13] A64: Add Neon instructions, third set
@ 2014-02-01 22:59 Peter Maydell
2014-02-01 22:59 ` [Qemu-devel] [PATCH v2 01/13] target-arm: A64: Implement SIMD 3-reg-same shift and saturate insns Peter Maydell
` (13 more replies)
0 siblings, 14 replies; 17+ messages in thread
From: Peter Maydell @ 2014-02-01 22:59 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson
This is v2 of my 'Neon second and third sets' patch series from
last week. The first 8 patches from that series were all OK, so they
have gone into target-arm.next.
Changes v1->v2:
* squashed fixes to patch 2 that were lurking in patch 3 back
into patch 2
* moved the patch 3 min/max helper functions into patch 2,
to use them for plain max/min as well as pairwise max/min
* patch 7: use -(test) not (!test - 1)
* patch 12: special case REV of byte elements to use bswap;
also fixed a shift left by a negative number (we were
calculating revmask too early, before the invalid case was
thrown out)
RTH: you've reviewed everything here except patch 2 (which
I felt had slightly too much churn to retain your tag for)
and patch 12 (for the REV special case code).
thanks
-- PMM
Alex Bennée (1):
target-arm: A64: Add 2-reg-misc REV* instructions
Peter Maydell (12):
target-arm: A64: Implement SIMD 3-reg-same shift and saturate insns
target-arm: A64: Implement remaining non-pairwise int SIMD 3-reg-same
insns
target-arm: A64: Implement pairwise integer ops from 3-reg-same SIMD
tcg: Add TCGV_UNUSED_PTR, TCGV_IS_UNUSED_PTR, TCGV_EQUAL_PTR
target-arm: A64: Implement scalar pairwise ops
target-arm: A64: Implement remaining integer scalar-3-same insns
target-arm: A64: Add SIMD simple 64 bit insns from scalar 2-reg misc
target-arm: A64: Add skeleton decode for SIMD 2-reg misc group
target-arm: A64: Implement 2-register misc compares, ABS, NEG
target-arm: A64: Implement 2-reg-misc CNT, NOT and RBIT
target-arm: A64: Add narrowing 2-reg-misc instructions
target-arm: A64: Add FNEG and FABS to the SIMD 2-reg-misc group
target-arm/helper.h | 1 +
target-arm/neon_helper.c | 12 +
target-arm/translate-a64.c | 1211 ++++++++++++++++++++++++++++++++++++++++----
tcg/tcg.h | 3 +
4 files changed, 1136 insertions(+), 91 deletions(-)
--
1.8.5
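
The v1->v2 note for patch 7 prefers -(test) over (!test - 1) when turning a comparison result into an all-ones or all-zeroes value. The following is a minimal sketch in plain C, not QEMU code, and every function name in it is hypothetical. It only illustrates that, for a 0/1 test value, both spellings give the "test ? (2^64 - 1) : 0" result described in the CMGT/CMHI comment of patch 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not QEMU code: negating a 0/1 value in unsigned
 * 64-bit arithmetic yields 0 or 0xffffffffffffffff. */
static uint64_t mask_by_negate(bool test)
{
    return -(uint64_t)test;
}

/* The older spelling: (!test - 1) is (0 - 1) when test is true and
 * (1 - 1) when test is false. */
static uint64_t mask_by_not_minus_one(bool test)
{
    return (uint64_t)!test - 1;
}

/* Both spellings agree for the only two possible inputs. */
static void check_masks_agree(void)
{
    assert(mask_by_negate(false) == mask_by_not_minus_one(false));
    assert(mask_by_negate(true) == mask_by_not_minus_one(true));
}
```

The negate form maps directly onto a setcond followed by a neg, which is presumably why it reads more naturally in the generated-code context.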
* [Qemu-devel] [PATCH v2 01/13] target-arm: A64: Implement SIMD 3-reg-same shift and saturate insns
2014-02-01 22:59 [Qemu-devel] [PATCH v2 00/13] A64: Add Neon instructions, third set Peter Maydell
@ 2014-02-01 22:59 ` Peter Maydell
2014-02-01 22:59 ` [Qemu-devel] [PATCH v2 02/13] target-arm: A64: Implement remaining non-pairwise int SIMD 3-reg-same insns Peter Maydell
` (12 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Peter Maydell @ 2014-02-01 22:59 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson
Implement the SIMD 3-reg-same instructions SQADD, UQADD,
SQSUB, UQSUB, SSHL, USHL, SQSHL, UQSHL, SRSHL, URSHL,
SQRSHL, UQRSHL; these are all simple calls to existing
Neon helpers. We also enable SSHL, USHL, SRSHL and URSHL
for the 3-reg-same-scalar category (but not the others
because they can have non-size-64 operands and the
scalar_3reg_same function doesn't support that yet.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
target-arm/translate-a64.c | 134 +++++++++++++++++++++++++++++++++++++--------
1 file changed, 112 insertions(+), 22 deletions(-)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 6c1ec1e..e67cdbb 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -74,6 +74,7 @@ typedef struct AArch64DecodeTable {
/* Function prototype for gen_ functions for calling Neon helpers */
typedef void NeonGenTwoOpFn(TCGv_i32, TCGv_i32, TCGv_i32);
+typedef void NeonGenTwoOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32);
/* initialize TCG globals. */
void a64_translate_init(void)
@@ -5738,6 +5739,20 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
TCGCond cond;
switch (opcode) {
+ case 0x1: /* SQADD */
+ if (u) {
+ gen_helper_neon_qadd_u64(tcg_rd, cpu_env, tcg_rn, tcg_rm);
+ } else {
+ gen_helper_neon_qadd_s64(tcg_rd, cpu_env, tcg_rn, tcg_rm);
+ }
+ break;
+ case 0x5: /* SQSUB */
+ if (u) {
+ gen_helper_neon_qsub_u64(tcg_rd, cpu_env, tcg_rn, tcg_rm);
+ } else {
+ gen_helper_neon_qsub_s64(tcg_rd, cpu_env, tcg_rn, tcg_rm);
+ }
+ break;
case 0x6: /* CMGT, CMHI */
/* 64 bit integer comparison, result = test ? (2^64 - 1) : 0.
* We implement this using setcond (test) and then negating.
@@ -5760,19 +5775,41 @@ static void handle_3same_64(DisasContext *s, int opcode, bool u,
tcg_gen_setcondi_i64(TCG_COND_NE, tcg_rd, tcg_rd, 0);
tcg_gen_neg_i64(tcg_rd, tcg_rd);
break;
- case 0x10: /* ADD, SUB */
+ case 0x8: /* SSHL, USHL */
if (u) {
- tcg_gen_sub_i64(tcg_rd, tcg_rn, tcg_rm);
+ gen_helper_neon_shl_u64(tcg_rd, tcg_rn, tcg_rm);
} else {
- tcg_gen_add_i64(tcg_rd, tcg_rn, tcg_rm);
+ gen_helper_neon_shl_s64(tcg_rd, tcg_rn, tcg_rm);
}
break;
- case 0x1: /* SQADD */
- case 0x5: /* SQSUB */
- case 0x8: /* SSHL, USHL */
case 0x9: /* SQSHL, UQSHL */
+ if (u) {
+ gen_helper_neon_qshl_u64(tcg_rd, cpu_env, tcg_rn, tcg_rm);
+ } else {
+ gen_helper_neon_qshl_s64(tcg_rd, cpu_env, tcg_rn, tcg_rm);
+ }
+ break;
case 0xa: /* SRSHL, URSHL */
+ if (u) {
+ gen_helper_neon_rshl_u64(tcg_rd, tcg_rn, tcg_rm);
+ } else {
+ gen_helper_neon_rshl_s64(tcg_rd, tcg_rn, tcg_rm);
+ }
+ break;
case 0xb: /* SQRSHL, UQRSHL */
+ if (u) {
+ gen_helper_neon_qrshl_u64(tcg_rd, cpu_env, tcg_rn, tcg_rm);
+ } else {
+ gen_helper_neon_qrshl_s64(tcg_rd, cpu_env, tcg_rn, tcg_rm);
+ }
+ break;
+ case 0x10: /* ADD, SUB */
+ if (u) {
+ tcg_gen_sub_i64(tcg_rd, tcg_rn, tcg_rm);
+ } else {
+ tcg_gen_add_i64(tcg_rd, tcg_rn, tcg_rm);
+ }
+ break;
default:
g_assert_not_reached();
}
@@ -5949,10 +5986,10 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
switch (opcode) {
case 0x1: /* SQADD, UQADD */
case 0x5: /* SQSUB, UQSUB */
- case 0x8: /* SSHL, USHL */
- case 0xa: /* SRSHL, URSHL */
unsupported_encoding(s, insn);
return;
+ case 0x8: /* SSHL, USHL */
+ case 0xa: /* SRSHL, URSHL */
case 0x6: /* CMGT, CMHI */
case 0x7: /* CMGE, CMHS */
case 0x11: /* CMTST, CMEQ */
@@ -6621,18 +6658,6 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
}
unsupported_encoding(s, insn);
return;
- case 0x1: /* SQADD */
- case 0x5: /* SQSUB */
- case 0x8: /* SSHL, USHL */
- case 0x9: /* SQSHL, UQSHL */
- case 0xa: /* SRSHL, URSHL */
- case 0xb: /* SQRSHL, UQRSHL */
- if (size == 3 && !is_q) {
- unallocated_encoding(s);
- return;
- }
- unsupported_encoding(s, insn);
- return;
case 0x16: /* SQDMULH, SQRDMULH */
if (size == 0 || size == 3) {
unallocated_encoding(s);
@@ -6670,12 +6695,33 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
TCGv_i32 tcg_op1 = tcg_temp_new_i32();
TCGv_i32 tcg_op2 = tcg_temp_new_i32();
TCGv_i32 tcg_res = tcg_temp_new_i32();
- NeonGenTwoOpFn *genfn;
+ NeonGenTwoOpFn *genfn = NULL;
+ NeonGenTwoOpEnvFn *genenvfn = NULL;
read_vec_element_i32(s, tcg_op1, rn, pass, MO_32);
read_vec_element_i32(s, tcg_op2, rm, pass, MO_32);
switch (opcode) {
+ case 0x1: /* SQADD, UQADD */
+ {
+ static NeonGenTwoOpEnvFn * const fns[3][2] = {
+ { gen_helper_neon_qadd_s8, gen_helper_neon_qadd_u8 },
+ { gen_helper_neon_qadd_s16, gen_helper_neon_qadd_u16 },
+ { gen_helper_neon_qadd_s32, gen_helper_neon_qadd_u32 },
+ };
+ genenvfn = fns[size][u];
+ break;
+ }
+ case 0x5: /* SQSUB, UQSUB */
+ {
+ static NeonGenTwoOpEnvFn * const fns[3][2] = {
+ { gen_helper_neon_qsub_s8, gen_helper_neon_qsub_u8 },
+ { gen_helper_neon_qsub_s16, gen_helper_neon_qsub_u16 },
+ { gen_helper_neon_qsub_s32, gen_helper_neon_qsub_u32 },
+ };
+ genenvfn = fns[size][u];
+ break;
+ }
case 0x6: /* CMGT, CMHI */
{
static NeonGenTwoOpFn * const fns[3][2] = {
@@ -6696,6 +6742,46 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
genfn = fns[size][u];
break;
}
+ case 0x8: /* SSHL, USHL */
+ {
+ static NeonGenTwoOpFn * const fns[3][2] = {
+ { gen_helper_neon_shl_s8, gen_helper_neon_shl_u8 },
+ { gen_helper_neon_shl_s16, gen_helper_neon_shl_u16 },
+ { gen_helper_neon_shl_s32, gen_helper_neon_shl_u32 },
+ };
+ genfn = fns[size][u];
+ break;
+ }
+ case 0x9: /* SQSHL, UQSHL */
+ {
+ static NeonGenTwoOpEnvFn * const fns[3][2] = {
+ { gen_helper_neon_qshl_s8, gen_helper_neon_qshl_u8 },
+ { gen_helper_neon_qshl_s16, gen_helper_neon_qshl_u16 },
+ { gen_helper_neon_qshl_s32, gen_helper_neon_qshl_u32 },
+ };
+ genenvfn = fns[size][u];
+ break;
+ }
+ case 0xa: /* SRSHL, URSHL */
+ {
+ static NeonGenTwoOpFn * const fns[3][2] = {
+ { gen_helper_neon_rshl_s8, gen_helper_neon_rshl_u8 },
+ { gen_helper_neon_rshl_s16, gen_helper_neon_rshl_u16 },
+ { gen_helper_neon_rshl_s32, gen_helper_neon_rshl_u32 },
+ };
+ genfn = fns[size][u];
+ break;
+ }
+ case 0xb: /* SQRSHL, UQRSHL */
+ {
+ static NeonGenTwoOpEnvFn * const fns[3][2] = {
+ { gen_helper_neon_qrshl_s8, gen_helper_neon_qrshl_u8 },
+ { gen_helper_neon_qrshl_s16, gen_helper_neon_qrshl_u16 },
+ { gen_helper_neon_qrshl_s32, gen_helper_neon_qrshl_u32 },
+ };
+ genenvfn = fns[size][u];
+ break;
+ }
case 0x10: /* ADD, SUB */
{
static NeonGenTwoOpFn * const fns[3][2] = {
@@ -6720,7 +6806,11 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
g_assert_not_reached();
}
- genfn(tcg_res, tcg_op1, tcg_op2);
+ if (genenvfn) {
+ genenvfn(tcg_res, cpu_env, tcg_op1, tcg_op2);
+ } else {
+ genfn(tcg_res, tcg_op1, tcg_op2);
+ }
write_vec_element_i32(s, tcg_res, rd, pass, MO_32);
--
1.8.5
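
Patch 1 introduces NeonGenTwoOpEnvFn because the saturating helpers take cpu_env in addition to their operands, and it selects between signed and unsigned helpers with a table indexed by u. The sketch below is a value-level model of that idea in plain C, under the assumption that the extra pointer exists so the helper can record saturation. ModelEnv, the qc field and all model_* names are hypothetical and are not the QEMU helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the CPU state reached through cpu_env;
 * only a sticky saturation flag is modelled. */
typedef struct {
    bool qc;
} ModelEnv;

/* Same shape as NeonGenTwoOpEnvFn, but computing values instead of
 * emitting TCG ops. */
typedef uint32_t ModelTwoOpEnvFn(ModelEnv *, uint32_t, uint32_t);

static uint32_t model_uqadd32(ModelEnv *env, uint32_t a, uint32_t b)
{
    uint32_t res = a + b;

    if (res < a) {              /* unsigned wrap-around means overflow */
        env->qc = true;
        res = UINT32_MAX;
    }
    return res;
}

static uint32_t model_sqadd32(ModelEnv *env, uint32_t a, uint32_t b)
{
    /* Widen so the true sum is representable, then clamp. The casts to
     * int32_t assume the usual two's complement conversion. */
    int64_t wide = (int64_t)(int32_t)a + (int32_t)b;

    if (wide > INT32_MAX) {
        env->qc = true;
        wide = INT32_MAX;
    } else if (wide < INT32_MIN) {
        env->qc = true;
        wide = INT32_MIN;
    }
    return (uint32_t)(int32_t)wide;
}

/* Indexed by u, like the second index of fns[size][u] in the patch. */
static ModelTwoOpEnvFn * const model_qadd32[2] = {
    model_sqadd32, model_uqadd32,
};

static uint32_t model_qadd(ModelEnv *env, bool u, uint32_t a, uint32_t b)
{
    assert(env != NULL);
    return model_qadd32[u](env, a, b);
}
```

Keeping the signed and unsigned variants in one const table keeps the dispatch to a single indexed load, which mirrors how the patch avoids a nested switch on size and u.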
* [Qemu-devel] [PATCH v2 02/13] target-arm: A64: Implement remaining non-pairwise int SIMD 3-reg-same insns
2014-02-01 22:59 [Qemu-devel] [PATCH v2 00/13] A64: Add Neon instructions, third set Peter Maydell
2014-02-01 22:59 ` [Qemu-devel] [PATCH v2 01/13] target-arm: A64: Implement SIMD 3-reg-same shift and saturate insns Peter Maydell
@ 2014-02-01 22:59 ` Peter Maydell
2014-02-03 21:21 ` Richard Henderson
2014-02-01 22:59 ` [Qemu-devel] [PATCH v2 03/13] target-arm: A64: Implement pairwise integer ops from 3-reg-same SIMD Peter Maydell
` (11 subsequent siblings)
13 siblings, 1 reply; 17+ messages in thread
From: Peter Maydell @ 2014-02-01 22:59 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson
Implement the SIMD 3-reg-same instructions where the size == 3 case
is reserved: SHADD, UHADD, SRHADD, URHADD, SHSUB, UHSUB, SMAX,
UMAX, SMIN, UMIN, SABD, UABD, SABA, UABA, MLA, MLS, MUL, PMUL,
SQRDMULH, SQDMULH. (None of these have scalar-3-same versions.)
This completes the non-pairwise integer instructions in this category.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
target-arm/translate-a64.c | 131 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 127 insertions(+), 4 deletions(-)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index e67cdbb..8d996e9 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -6556,6 +6556,27 @@ static void disas_simd_3same_logic(DisasContext *s, uint32_t insn)
tcg_temp_free_i64(tcg_res[1]);
}
+/* Helper functions for 32 bit comparisons */
+static void gen_max_s32(TCGv_i32 res, TCGv_i32 op1, TCGv_i32 op2)
+{
+ tcg_gen_movcond_i32(TCG_COND_GE, res, op1, op2, op1, op2);
+}
+
+static void gen_max_u32(TCGv_i32 res, TCGv_i32 op1, TCGv_i32 op2)
+{
+ tcg_gen_movcond_i32(TCG_COND_GEU, res, op1, op2, op1, op2);
+}
+
+static void gen_min_s32(TCGv_i32 res, TCGv_i32 op1, TCGv_i32 op2)
+{
+ tcg_gen_movcond_i32(TCG_COND_LE, res, op1, op2, op1, op2);
+}
+
+static void gen_min_u32(TCGv_i32 res, TCGv_i32 op1, TCGv_i32 op2)
+{
+ tcg_gen_movcond_i32(TCG_COND_LEU, res, op1, op2, op1, op2);
+}
+
/* Pairwise op subgroup of C3.6.16. */
static void disas_simd_3same_pair(DisasContext *s, uint32_t insn)
{
@@ -6656,15 +6677,13 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
unallocated_encoding(s);
return;
}
- unsupported_encoding(s, insn);
- return;
+ break;
case 0x16: /* SQDMULH, SQRDMULH */
if (size == 0 || size == 3) {
unallocated_encoding(s);
return;
}
- unsupported_encoding(s, insn);
- return;
+ break;
default:
if (size == 3 && !is_q) {
unallocated_encoding(s);
@@ -6702,6 +6721,16 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
read_vec_element_i32(s, tcg_op2, rm, pass, MO_32);
switch (opcode) {
+ case 0x0: /* SHADD, UHADD */
+ {
+ static NeonGenTwoOpFn * const fns[3][2] = {
+ { gen_helper_neon_hadd_s8, gen_helper_neon_hadd_u8 },
+ { gen_helper_neon_hadd_s16, gen_helper_neon_hadd_u16 },
+ { gen_helper_neon_hadd_s32, gen_helper_neon_hadd_u32 },
+ };
+ genfn = fns[size][u];
+ break;
+ }
case 0x1: /* SQADD, UQADD */
{
static NeonGenTwoOpEnvFn * const fns[3][2] = {
@@ -6712,6 +6741,26 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
genenvfn = fns[size][u];
break;
}
+ case 0x2: /* SRHADD, URHADD */
+ {
+ static NeonGenTwoOpFn * const fns[3][2] = {
+ { gen_helper_neon_rhadd_s8, gen_helper_neon_rhadd_u8 },
+ { gen_helper_neon_rhadd_s16, gen_helper_neon_rhadd_u16 },
+ { gen_helper_neon_rhadd_s32, gen_helper_neon_rhadd_u32 },
+ };
+ genfn = fns[size][u];
+ break;
+ }
+ case 0x4: /* SHSUB, UHSUB */
+ {
+ static NeonGenTwoOpFn * const fns[3][2] = {
+ { gen_helper_neon_hsub_s8, gen_helper_neon_hsub_u8 },
+ { gen_helper_neon_hsub_s16, gen_helper_neon_hsub_u16 },
+ { gen_helper_neon_hsub_s32, gen_helper_neon_hsub_u32 },
+ };
+ genfn = fns[size][u];
+ break;
+ }
case 0x5: /* SQSUB, UQSUB */
{
static NeonGenTwoOpEnvFn * const fns[3][2] = {
@@ -6782,6 +6831,38 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
genenvfn = fns[size][u];
break;
}
+ case 0xc: /* SMAX, UMAX */
+ {
+ static NeonGenTwoOpFn * const fns[3][2] = {
+ { gen_helper_neon_max_s8, gen_helper_neon_max_u8 },
+ { gen_helper_neon_max_s16, gen_helper_neon_max_u16 },
+ { gen_max_s32, gen_max_u32 },
+ };
+ genfn = fns[size][u];
+ break;
+ }
+
+ case 0xd: /* SMIN, UMIN */
+ {
+ static NeonGenTwoOpFn * const fns[3][2] = {
+ { gen_helper_neon_min_s8, gen_helper_neon_min_u8 },
+ { gen_helper_neon_min_s16, gen_helper_neon_min_u16 },
+ { gen_min_s32, gen_min_u32 },
+ };
+ genfn = fns[size][u];
+ break;
+ }
+ case 0xe: /* SABD, UABD */
+ case 0xf: /* SABA, UABA */
+ {
+ static NeonGenTwoOpFn * const fns[3][2] = {
+ { gen_helper_neon_abd_s8, gen_helper_neon_abd_u8 },
+ { gen_helper_neon_abd_s16, gen_helper_neon_abd_u16 },
+ { gen_helper_neon_abd_s32, gen_helper_neon_abd_u32 },
+ };
+ genfn = fns[size][u];
+ break;
+ }
case 0x10: /* ADD, SUB */
{
static NeonGenTwoOpFn * const fns[3][2] = {
@@ -6802,6 +6883,34 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
genfn = fns[size][u];
break;
}
+ case 0x13: /* MUL, PMUL */
+ if (u) {
+ /* PMUL */
+ assert(size == 0);
+ genfn = gen_helper_neon_mul_p8;
+ break;
+ }
+ /* fall through : MUL */
+ case 0x12: /* MLA, MLS */
+ {
+ static NeonGenTwoOpFn * const fns[3] = {
+ gen_helper_neon_mul_u8,
+ gen_helper_neon_mul_u16,
+ tcg_gen_mul_i32,
+ };
+ genfn = fns[size];
+ break;
+ }
+ case 0x16: /* SQDMULH, SQRDMULH */
+ {
+ static NeonGenTwoOpEnvFn * const fns[2][2] = {
+ { gen_helper_neon_qdmulh_s16, gen_helper_neon_qrdmulh_s16 },
+ { gen_helper_neon_qdmulh_s32, gen_helper_neon_qrdmulh_s32 },
+ };
+ assert(size == 1 || size == 2);
+ genenvfn = fns[size - 1][u];
+ break;
+ }
default:
g_assert_not_reached();
}
@@ -6812,6 +6921,20 @@ static void disas_simd_3same_int(DisasContext *s, uint32_t insn)
genfn(tcg_res, tcg_op1, tcg_op2);
}
+ if (opcode == 0xf || opcode == 0x12) {
+ /* SABA, UABA, MLA, MLS: accumulating ops */
+ static NeonGenTwoOpFn * const fns[3][2] = {
+ { gen_helper_neon_add_u8, gen_helper_neon_sub_u8 },
+ { gen_helper_neon_add_u16, gen_helper_neon_sub_u16 },
+ { tcg_gen_add_i32, tcg_gen_sub_i32 },
+ };
+ bool is_sub = (opcode == 0x12 && u); /* MLS */
+
+ genfn = fns[size][is_sub];
+ read_vec_element_i32(s, tcg_op1, rd, pass, MO_32);
+ genfn(tcg_res, tcg_res, tcg_op1);
+ }
+
write_vec_element_i32(s, tcg_res, rd, pass, MO_32);
tcg_temp_free_i32(tcg_res);
--
1.8.5
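
Patch 2 builds the 32-bit max/min operations from tcg_gen_movcond_i32, which sets its result to v1 if (c1 cond c2) holds and to v2 otherwise. The following plain C model is a sketch of that selection logic only. ModelCond and the model_* functions are hypothetical names, and the code computes values rather than emitting TCG ops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of comparison conditions used by the patch. */
typedef enum {
    MODEL_COND_GE,
    MODEL_COND_GEU,
    MODEL_COND_LE,
    MODEL_COND_LEU,
} ModelCond;

/* Value-level model of movcond: return v1 if (c1 cond c2), else v2. */
static uint32_t model_movcond(ModelCond cond, uint32_t c1, uint32_t c2,
                              uint32_t v1, uint32_t v2)
{
    bool take_v1 = false;

    switch (cond) {
    case MODEL_COND_GE:
        take_v1 = (int32_t)c1 >= (int32_t)c2;
        break;
    case MODEL_COND_GEU:
        take_v1 = c1 >= c2;
        break;
    case MODEL_COND_LE:
        take_v1 = (int32_t)c1 <= (int32_t)c2;
        break;
    case MODEL_COND_LEU:
        take_v1 = c1 <= c2;
        break;
    default:
        assert(0);
    }
    return take_v1 ? v1 : v2;
}

/* Same argument pattern as gen_max_s32() and friends in the patch:
 * compare op1 with op2 and pick op1 when the condition holds. */
static uint32_t model_max_s32(uint32_t op1, uint32_t op2)
{
    return model_movcond(MODEL_COND_GE, op1, op2, op1, op2);
}

static uint32_t model_max_u32(uint32_t op1, uint32_t op2)
{
    return model_movcond(MODEL_COND_GEU, op1, op2, op1, op2);
}

static uint32_t model_min_s32(uint32_t op1, uint32_t op2)
{
    return model_movcond(MODEL_COND_LE, op1, op2, op1, op2);
}

static uint32_t model_min_u32(uint32_t op1, uint32_t op2)
{
    return model_movcond(MODEL_COND_LEU, op1, op2, op1, op2);
}
```

The only difference between the signed and unsigned variants is the condition code, so the bit pattern 0xffffffff is the smallest signed value's neighbour (-1) in one case and the largest unsigned value in the other.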
* [Qemu-devel] [PATCH v2 03/13] target-arm: A64: Implement pairwise integer ops from 3-reg-same SIMD
2014-02-01 22:59 [Qemu-devel] [PATCH v2 00/13] A64: Add Neon instructions, third set Peter Maydell
2014-02-01 22:59 ` [Qemu-devel] [PATCH v2 01/13] target-arm: A64: Implement SIMD 3-reg-same shift and saturate insns Peter Maydell
2014-02-01 22:59 ` [Qemu-devel] [PATCH v2 02/13] target-arm: A64: Implement remaining non-pairwise int SIMD 3-reg-same insns Peter Maydell
@ 2014-02-01 22:59 ` Peter Maydell
2014-02-01 22:59 ` [Qemu-devel] [PATCH v2 04/13] tcg: Add TCGV_UNUSED_PTR, TCGV_IS_UNUSED_PTR, TCGV_EQUAL_PTR Peter Maydell
` (10 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Peter Maydell @ 2014-02-01 22:59 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson
Implement the pairwise integer operations in the 3-reg-same SIMD group:
ADDP, SMAXP, SMINP, UMAXP and UMINP.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
target-arm/translate-a64.c | 124 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 123 insertions(+), 1 deletion(-)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 8d996e9..01f6b79 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -6580,7 +6580,129 @@ static void gen_min_u32(TCGv_i32 res, TCGv_i32 op1, TCGv_i32 op2)
/* Pairwise op subgroup of C3.6.16. */
static void disas_simd_3same_pair(DisasContext *s, uint32_t insn)
{
- unsupported_encoding(s, insn);
+ int is_q = extract32(insn, 30, 1);
+ int u = extract32(insn, 29, 1);
+ int size = extract32(insn, 22, 2);
+ int opcode = extract32(insn, 11, 5);
+ int rm = extract32(insn, 16, 5);
+ int rn = extract32(insn, 5, 5);
+ int rd = extract32(insn, 0, 5);
+ int pass;
+
+ if (size == 3 && !is_q) {
+ unallocated_encoding(s);
+ return;
+ }
+
+ switch (opcode) {
+ case 0x14: /* SMAXP, UMAXP */
+ case 0x15: /* SMINP, UMINP */
+ if (size == 3) {
+ unallocated_encoding(s);
+ return;
+ }
+ break;
+ case 0x17:
+ if (u) {
+ unallocated_encoding(s);
+ return;
+ }
+ break;
+ default:
+ g_assert_not_reached();
+ }
+
+ /* These operations work on the concatenated rm:rn, with each pair of
+ * adjacent elements being operated on to produce an element in the result.
+ */
+ if (size == 3) {
+ TCGv_i64 tcg_res[2];
+
+ for (pass = 0; pass < 2; pass++) {
+ TCGv_i64 tcg_op1 = tcg_temp_new_i64();
+ TCGv_i64 tcg_op2 = tcg_temp_new_i64();
+ int passreg = (pass == 0) ? rn : rm;
+
+ read_vec_element(s, tcg_op1, passreg, 0, MO_64);
+ read_vec_element(s, tcg_op2, passreg, 1, MO_64);
+ tcg_res[pass] = tcg_temp_new_i64();
+
+ /* The only 64 bit pairwise integer op is ADDP */
+ assert(opcode == 0x17);
+ tcg_gen_add_i64(tcg_res[pass], tcg_op1, tcg_op2);
+
+ tcg_temp_free_i64(tcg_op1);
+ tcg_temp_free_i64(tcg_op2);
+ }
+
+ for (pass = 0; pass < 2; pass++) {
+ write_vec_element(s, tcg_res[pass], rd, pass, MO_64);
+ tcg_temp_free_i64(tcg_res[pass]);
+ }
+ } else {
+ int maxpass = is_q ? 4 : 2;
+ TCGv_i32 tcg_res[4];
+
+ for (pass = 0; pass < maxpass; pass++) {
+ TCGv_i32 tcg_op1 = tcg_temp_new_i32();
+ TCGv_i32 tcg_op2 = tcg_temp_new_i32();
+ NeonGenTwoOpFn *genfn;
+ int passreg = pass < (maxpass / 2) ? rn : rm;
+ int passelt = (is_q && (pass & 1)) ? 2 : 0;
+
+ read_vec_element_i32(s, tcg_op1, passreg, passelt, MO_32);
+ read_vec_element_i32(s, tcg_op2, passreg, passelt + 1, MO_32);
+ tcg_res[pass] = tcg_temp_new_i32();
+
+ switch (opcode) {
+ case 0x17: /* ADDP */
+ {
+ static NeonGenTwoOpFn * const fns[3] = {
+ gen_helper_neon_padd_u8,
+ gen_helper_neon_padd_u16,
+ tcg_gen_add_i32,
+ };
+ genfn = fns[size];
+ break;
+ }
+ case 0x14: /* SMAXP, UMAXP */
+ {
+ static NeonGenTwoOpFn * const fns[3][2] = {
+ { gen_helper_neon_pmax_s8, gen_helper_neon_pmax_u8 },
+ { gen_helper_neon_pmax_s16, gen_helper_neon_pmax_u16 },
+ { gen_max_s32, gen_max_u32 },
+ };
+ genfn = fns[size][u];
+ break;
+ }
+ case 0x15: /* SMINP, UMINP */
+ {
+ static NeonGenTwoOpFn * const fns[3][2] = {
+ { gen_helper_neon_pmin_s8, gen_helper_neon_pmin_u8 },
+ { gen_helper_neon_pmin_s16, gen_helper_neon_pmin_u16 },
+ { gen_min_s32, gen_min_u32 },
+ };
+ genfn = fns[size][u];
+ break;
+ }
+ default:
+ g_assert_not_reached();
+ }
+
+ genfn(tcg_res[pass], tcg_op1, tcg_op2);
+
+ tcg_temp_free_i32(tcg_op1);
+ tcg_temp_free_i32(tcg_op2);
+ }
+
+ for (pass = 0; pass < maxpass; pass++) {
+ write_vec_element_i32(s, tcg_res[pass], rd, pass, MO_32);
+ tcg_temp_free_i32(tcg_res[pass]);
+ }
+ if (!is_q) {
+ clear_vec_high(s, rd);
+ }
+ }
}
/* Floating point op subgroup of C3.6.16. */
--
1.8.5
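
Patch 3 describes the pairwise operations as working on the concatenated rm:rn, with each pair of adjacent elements producing one result element. The sketch below models ADDP on 32-bit lanes in plain C, following the passreg/passelt selection in the patch. model_addp32 and its array-based register representation are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of ADDP on 32-bit lanes. 'lanes' is 2 for a 64-bit
 * vector and 4 for a 128-bit one. Results are buffered before being
 * written, as the patch does with tcg_res[], so rd may alias rn or rm. */
static void model_addp32(uint32_t *rd, const uint32_t *rn,
                         const uint32_t *rm, size_t lanes)
{
    uint32_t res[4];
    size_t pass;

    assert(lanes == 2 || lanes == 4);

    for (pass = 0; pass < lanes; pass++) {
        /* The first half of the result comes from rn, the second from rm. */
        const uint32_t *src = pass < lanes / 2 ? rn : rm;
        /* Index of the first element of the adjacent pair within src. */
        size_t elt = 2 * (pass % (lanes / 2));

        res[pass] = src[elt] + src[elt + 1];
    }

    for (pass = 0; pass < lanes; pass++) {
        rd[pass] = res[pass];
    }
}
```

Buffering the results matters when the destination is also a source: writing lane 0 of rd immediately would otherwise corrupt an input that a later pass still needs to read.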
* [Qemu-devel] [PATCH v2 04/13] tcg: Add TCGV_UNUSED_PTR, TCGV_IS_UNUSED_PTR, TCGV_EQUAL_PTR
2014-02-01 22:59 [Qemu-devel] [PATCH v2 00/13] A64: Add Neon instructions, third set Peter Maydell
` (2 preceding siblings ...)
2014-02-01 22:59 ` [Qemu-devel] [PATCH v2 03/13] target-arm: A64: Implement pairwise integer ops from 3-reg-same SIMD Peter Maydell
@ 2014-02-01 22:59 ` Peter Maydell
2014-02-01 22:59 ` [Qemu-devel] [PATCH v2 05/13] target-arm: A64: Implement scalar pairwise ops Peter Maydell
` (9 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Peter Maydell @ 2014-02-01 22:59 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson
We have macros for marking TCGv values as unused, checking if they
are unused and comparing them to each other. However these only exist
for TCGv_i32 and TCGv_i64; add them for TCGv_ptr as well.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
tcg/tcg.h | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tcg/tcg.h b/tcg/tcg.h
index c72af6c..f7efcb4 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -324,13 +324,16 @@ typedef int TCGv_i64;
#define TCGV_EQUAL_I32(a, b) (GET_TCGV_I32(a) == GET_TCGV_I32(b))
#define TCGV_EQUAL_I64(a, b) (GET_TCGV_I64(a) == GET_TCGV_I64(b))
+#define TCGV_EQUAL_PTR(a, b) (GET_TCGV_PTR(a) == GET_TCGV_PTR(b))
/* Dummy definition to avoid compiler warnings. */
#define TCGV_UNUSED_I32(x) x = MAKE_TCGV_I32(-1)
#define TCGV_UNUSED_I64(x) x = MAKE_TCGV_I64(-1)
+#define TCGV_UNUSED_PTR(x) x = MAKE_TCGV_PTR(-1)
#define TCGV_IS_UNUSED_I32(x) (GET_TCGV_I32(x) == -1)
#define TCGV_IS_UNUSED_I64(x) (GET_TCGV_I64(x) == -1)
+#define TCGV_IS_UNUSED_PTR(x) (GET_TCGV_PTR(x) == -1)
/* call flags */
/* Helper does not read globals (either directly or through an exception). It
--
1.8.5
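
Patch 4 adds the "unused" sentinel macros for TCGv_ptr, and patch 5 uses them so that a handle is only freed when it was actually allocated. The following is a sketch of that sentinel pattern in plain C, assuming handles are plain integers as in the "typedef int TCGv_i64" visible in the hunk header. ModelHandle and the MODEL_* macros are hypothetical; the real macros go through GET_TCGV_PTR and MAKE_TCGV_PTR.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical integer handle with -1 reserved as "no handle". */
typedef int ModelHandle;

#define MODEL_UNUSED(x)      ((x) = -1)
#define MODEL_IS_UNUSED(x)   ((x) == -1)
#define MODEL_EQUAL(a, b)    ((a) == (b))

/* Counts how many handles have been released, for illustration. */
static int model_frees;

static void model_free(ModelHandle h)
{
    /* Freeing the sentinel would be a bug in the caller. */
    assert(!MODEL_IS_UNUSED(h));
    model_frees++;
}

/* Only some operations need the extra handle; the others mark it unused
 * so the common cleanup code at the end can skip the free. */
static void model_op(bool is_fp)
{
    ModelHandle fpst;

    if (is_fp) {
        fpst = 7;               /* pretend this came from an allocator */
    } else {
        MODEL_UNUSED(fpst);
    }

    /* ... the operation itself would be emitted here ... */

    if (!MODEL_IS_UNUSED(fpst)) {
        model_free(fpst);
    }
}
```

Initialising the variable to the sentinel on the non-allocating path also avoids a "may be used uninitialized" warning, which matches the "Dummy definition to avoid compiler warnings" comment in tcg.h.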
* [Qemu-devel] [PATCH v2 05/13] target-arm: A64: Implement scalar pairwise ops
2014-02-01 22:59 [Qemu-devel] [PATCH v2 00/13] A64: Add Neon instructions, third set Peter Maydell
` (3 preceding siblings ...)
2014-02-01 22:59 ` [Qemu-devel] [PATCH v2 04/13] tcg: Add TCGV_UNUSED_PTR, TCGV_IS_UNUSED_PTR, TCGV_EQUAL_PTR Peter Maydell
@ 2014-02-01 22:59 ` Peter Maydell
2014-02-01 23:00 ` [Qemu-devel] [PATCH v2 06/13] target-arm: A64: Implement remaining integer scalar-3-same insns Peter Maydell
` (8 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Peter Maydell @ 2014-02-01 22:59 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson
Implement the instructions in the scalar pairwise group (C3.6.8).
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
target-arm/translate-a64.c | 114 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 113 insertions(+), 1 deletion(-)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 01f6b79..452b1fb 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -5501,7 +5501,119 @@ static void disas_simd_scalar_copy(DisasContext *s, uint32_t insn)
*/
static void disas_simd_scalar_pairwise(DisasContext *s, uint32_t insn)
{
- unsupported_encoding(s, insn);
+ int u = extract32(insn, 29, 1);
+ int size = extract32(insn, 22, 2);
+ int opcode = extract32(insn, 12, 5);
+ int rn = extract32(insn, 5, 5);
+ int rd = extract32(insn, 0, 5);
+ TCGv_ptr fpst;
+
+ /* For some ops (the FP ones), size[1] is part of the encoding.
+ * For ADDP strictly it is not but size[1] is always 1 for valid
+ * encodings.
+ */
+ opcode |= (extract32(size, 1, 1) << 5);
+
+ switch (opcode) {
+ case 0x3b: /* ADDP */
+ if (u || size != 3) {
+ unallocated_encoding(s);
+ return;
+ }
+ TCGV_UNUSED_PTR(fpst);
+ break;
+ case 0xc: /* FMAXNMP */
+ case 0xd: /* FADDP */
+ case 0xf: /* FMAXP */
+ case 0x2c: /* FMINNMP */
+ case 0x2f: /* FMINP */
+ /* FP op, size[0] is 32 or 64 bit */
+ if (!u) {
+ unallocated_encoding(s);
+ return;
+ }
+ size = extract32(size, 0, 1) ? 3 : 2;
+ fpst = get_fpstatus_ptr();
+ break;
+ default:
+ unallocated_encoding(s);
+ return;
+ }
+
+ if (size == 3) {
+ TCGv_i64 tcg_op1 = tcg_temp_new_i64();
+ TCGv_i64 tcg_op2 = tcg_temp_new_i64();
+ TCGv_i64 tcg_res = tcg_temp_new_i64();
+
+ read_vec_element(s, tcg_op1, rn, 0, MO_64);
+ read_vec_element(s, tcg_op2, rn, 1, MO_64);
+
+ switch (opcode) {
+ case 0x3b: /* ADDP */
+ tcg_gen_add_i64(tcg_res, tcg_op1, tcg_op2);
+ break;
+ case 0xc: /* FMAXNMP */
+ gen_helper_vfp_maxnumd(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ case 0xd: /* FADDP */
+ gen_helper_vfp_addd(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ case 0xf: /* FMAXP */
+ gen_helper_vfp_maxd(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x2c: /* FMINNMP */
+ gen_helper_vfp_minnumd(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x2f: /* FMINP */
+ gen_helper_vfp_mind(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ default:
+ g_assert_not_reached();
+ }
+
+ write_fp_dreg(s, rd, tcg_res);
+
+ tcg_temp_free_i64(tcg_op1);
+ tcg_temp_free_i64(tcg_op2);
+ tcg_temp_free_i64(tcg_res);
+ } else {
+ TCGv_i32 tcg_op1 = tcg_temp_new_i32();
+ TCGv_i32 tcg_op2 = tcg_temp_new_i32();
+ TCGv_i32 tcg_res = tcg_temp_new_i32();
+
+ read_vec_element_i32(s, tcg_op1, rn, 0, MO_32);
+ read_vec_element_i32(s, tcg_op2, rn, 1, MO_32);
+
+ switch (opcode) {
+ case 0xc: /* FMAXNMP */
+ gen_helper_vfp_maxnums(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ case 0xd: /* FADDP */
+ gen_helper_vfp_adds(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ case 0xf: /* FMAXP */
+ gen_helper_vfp_maxs(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x2c: /* FMINNMP */
+ gen_helper_vfp_minnums(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ case 0x2f: /* FMINP */
+ gen_helper_vfp_mins(tcg_res, tcg_op1, tcg_op2, fpst);
+ break;
+ default:
+ g_assert_not_reached();
+ }
+
+ write_fp_sreg(s, rd, tcg_res);
+
+ tcg_temp_free_i32(tcg_op1);
+ tcg_temp_free_i32(tcg_op2);
+ tcg_temp_free_i32(tcg_res);
+ }
+
+ if (!TCGV_IS_UNUSED_PTR(fpst)) {
+ tcg_temp_free_ptr(fpst);
+ }
}
/*
--
1.8.5
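
Patch 5 folds size[1] into the opcode so that a single switch can distinguish, for example, FMAXP (0xf) from FMINP (0x2f). The sketch below reproduces that field extraction in plain C. model_extract32 is a local stand-in written to behave like QEMU's extract32(), and model_make_insn is a hypothetical helper that fills in only the fields the decoder reads, leaving all fixed encoding bits zero.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for extract32(): 'length' bits starting at bit 'start'. */
static uint32_t model_extract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/* Hypothetical: build a word holding only u (bit 29), size (bits 23:22)
 * and the 5-bit opcode field (bits 16:12). Not a complete encoding. */
static uint32_t model_make_insn(uint32_t u, uint32_t size, uint32_t opcode5)
{
    return (u << 29) | (size << 22) | (opcode5 << 12);
}

/* Mirror of the decode step: opcode |= size[1] << 5. */
static uint32_t model_pairwise_opcode(uint32_t insn)
{
    uint32_t size = model_extract32(insn, 22, 2);
    uint32_t opcode = model_extract32(insn, 12, 5);

    return opcode | (model_extract32(size, 1, 1) << 5);
}
```

With this composite value, ADDP (opcode field 0x1b, size 3) decodes to 0x3b, which is the case label the patch uses.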
* [Qemu-devel] [PATCH v2 06/13] target-arm: A64: Implement remaining integer scalar-3-same insns
2014-02-01 22:59 [Qemu-devel] [PATCH v2 00/13] A64: Add Neon instructions, third set Peter Maydell
` (4 preceding siblings ...)
2014-02-01 22:59 ` [Qemu-devel] [PATCH v2 05/13] target-arm: A64: Implement scalar pairwise ops Peter Maydell
@ 2014-02-01 23:00 ` Peter Maydell
2014-02-01 23:00 ` [Qemu-devel] [PATCH v2 07/13] target-arm: A64: Add SIMD simple 64 bit insns from scalar 2-reg misc Peter Maydell
` (7 subsequent siblings)
13 siblings, 0 replies; 17+ messages in thread
From: Peter Maydell @ 2014-02-01 23:00 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson
Implement the remaining integer instructions in the scalar-three-reg-same
group: SQADD, UQADD, SQSUB, UQSUB, SQSHL, UQSHL, SQRSHL, UQRSHL,
SQDMULH, SQRDMULH.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
target-arm/translate-a64.c | 106 +++++++++++++++++++++++++++++++++++++--------
1 file changed, 87 insertions(+), 19 deletions(-)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 452b1fb..323d5b3 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -6066,8 +6066,6 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
int rm = extract32(insn, 16, 5);
int size = extract32(insn, 22, 2);
bool u = extract32(insn, 29, 1);
- TCGv_i64 tcg_rn;
- TCGv_i64 tcg_rm;
TCGv_i64 tcg_rd;
if (opcode >= 0x18) {
@@ -6098,8 +6096,9 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
switch (opcode) {
case 0x1: /* SQADD, UQADD */
case 0x5: /* SQSUB, UQSUB */
- unsupported_encoding(s, insn);
- return;
+ case 0x9: /* SQSHL, UQSHL */
+ case 0xb: /* SQRSHL, UQRSHL */
+ break;
case 0x8: /* SSHL, USHL */
case 0xa: /* SRSHL, URSHL */
case 0x6: /* CMGT, CMHI */
@@ -6111,36 +6110,105 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
return;
}
break;
- case 0x9: /* SQSHL, UQSHL */
- case 0xb: /* SQRSHL, UQRSHL */
- unsupported_encoding(s, insn);
- return;
case 0x16: /* SQDMULH, SQRDMULH (vector) */
if (size != 1 && size != 2) {
unallocated_encoding(s);
return;
}
- unsupported_encoding(s, insn);
- return;
+ break;
default:
unallocated_encoding(s);
return;
}
- tcg_rn = read_fp_dreg(s, rn); /* op1 */
- tcg_rm = read_fp_dreg(s, rm); /* op2 */
tcg_rd = tcg_temp_new_i64();
- /* For the moment we only support the opcodes which are
- * 64-bit-width only. The size != 3 cases will
- * be handled later when the relevant ops are implemented.
- */
- handle_3same_64(s, opcode, u, tcg_rd, tcg_rn, tcg_rm);
+ if (size == 3) {
+ TCGv_i64 tcg_rn = read_fp_dreg(s, rn);
+ TCGv_i64 tcg_rm = read_fp_dreg(s, rm);
+
+ handle_3same_64(s, opcode, u, tcg_rd, tcg_rn, tcg_rm);
+ tcg_temp_free_i64(tcg_rn);
+ tcg_temp_free_i64(tcg_rm);
+ } else {
+ /* Do a single operation on the lowest element in the vector.
+ * We use the standard Neon helpers and rely on 0 OP 0 == 0 with
+ * no side effects for all these operations.
+ * OPTME: special-purpose helpers would avoid doing some
+ * unnecessary work in the helper for the 8 and 16 bit cases.
+ */
+ NeonGenTwoOpEnvFn *genenvfn;
+ TCGv_i32 tcg_rn = tcg_temp_new_i32();
+ TCGv_i32 tcg_rm = tcg_temp_new_i32();
+ TCGv_i32 tcg_rd32 = tcg_temp_new_i32();
+
+ read_vec_element_i32(s, tcg_rn, rn, 0, size);
+ read_vec_element_i32(s, tcg_rm, rm, 0, size);
+
+ switch (opcode) {
+ case 0x1: /* SQADD, UQADD */
+ {
+ static NeonGenTwoOpEnvFn * const fns[3][2] = {
+ { gen_helper_neon_qadd_s8, gen_helper_neon_qadd_u8 },
+ { gen_helper_neon_qadd_s16, gen_helper_neon_qadd_u16 },
+ { gen_helper_neon_qadd_s32, gen_helper_neon_qadd_u32 },
+ };
+ genenvfn = fns[size][u];
+ break;
+ }
+ case 0x5: /* SQSUB, UQSUB */
+ {
+ static NeonGenTwoOpEnvFn * const fns[3][2] = {
+ { gen_helper_neon_qsub_s8, gen_helper_neon_qsub_u8 },
+ { gen_helper_neon_qsub_s16, gen_helper_neon_qsub_u16 },
+ { gen_helper_neon_qsub_s32, gen_helper_neon_qsub_u32 },
+ };
+ genenvfn = fns[size][u];
+ break;
+ }
+ case 0x9: /* SQSHL, UQSHL */
+ {
+ static NeonGenTwoOpEnvFn * const fns[3][2] = {
+ { gen_helper_neon_qshl_s8, gen_helper_neon_qshl_u8 },
+ { gen_helper_neon_qshl_s16, gen_helper_neon_qshl_u16 },
+ { gen_helper_neon_qshl_s32, gen_helper_neon_qshl_u32 },
+ };
+ genenvfn = fns[size][u];
+ break;
+ }
+ case 0xb: /* SQRSHL, UQRSHL */
+ {
+ static NeonGenTwoOpEnvFn * const fns[3][2] = {
+ { gen_helper_neon_qrshl_s8, gen_helper_neon_qrshl_u8 },
+ { gen_helper_neon_qrshl_s16, gen_helper_neon_qrshl_u16 },
+ { gen_helper_neon_qrshl_s32, gen_helper_neon_qrshl_u32 },
+ };
+ genenvfn = fns[size][u];
+ break;
+ }
+ case 0x16: /* SQDMULH, SQRDMULH */
+ {
+ static NeonGenTwoOpEnvFn * const fns[2][2] = {
+ { gen_helper_neon_qdmulh_s16, gen_helper_neon_qrdmulh_s16 },
+ { gen_helper_neon_qdmulh_s32, gen_helper_neon_qrdmulh_s32 },
+ };
+ assert(size == 1 || size == 2);
+ genenvfn = fns[size - 1][u];
+ break;
+ }
+ default:
+ g_assert_not_reached();
+ }
+
+ genenvfn(tcg_rd32, cpu_env, tcg_rn, tcg_rm);
+ tcg_gen_extu_i32_i64(tcg_rd, tcg_rd32);
+ tcg_temp_free_i32(tcg_rd32);
+ tcg_temp_free_i32(tcg_rn);
+ tcg_temp_free_i32(tcg_rm);
+ }
write_fp_dreg(s, rd, tcg_rd);
- tcg_temp_free_i64(tcg_rn);
- tcg_temp_free_i64(tcg_rm);
tcg_temp_free_i64(tcg_rd);
}
--
1.8.5
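
The comment in the patch above relies on "0 OP 0 == 0" so that a packed Neon helper can be reused for a single scalar element. A minimal sketch of that idea in plain C follows, assuming a 4 x 8-bit saturating add. The names sat_add_s8, packed_qadd_s8 and scalar_sqadd_s8 are hypothetical stand-ins rather than QEMU functions, and the saturation (QC) flag that the real helpers update is left out.

```c
#include <stdint.h>

/* Saturating signed 8-bit add on one lane. */
static inline int8_t sat_add_s8(int8_t a, int8_t b)
{
    int sum = a + b;
    if (sum > INT8_MAX) {
        return INT8_MAX;
    }
    if (sum < INT8_MIN) {
        return INT8_MIN;
    }
    return (int8_t)sum;
}

/* Hypothetical stand-in for a packed helper: four 8-bit lanes in 32 bits. */
static inline uint32_t packed_qadd_s8(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int lane = 0; lane < 4; lane++) {
        int8_t x = (int8_t)(a >> (lane * 8));
        int8_t y = (int8_t)(b >> (lane * 8));
        r |= (uint32_t)(uint8_t)sat_add_s8(x, y) << (lane * 8);
    }
    return r;
}

/* Scalar form: the operands are zero-extended single elements, so lanes
 * 1..3 compute 0 + 0 == 0 and cannot saturate.
 */
static inline uint32_t scalar_sqadd_s8(uint8_t a, uint8_t b)
{
    return packed_qadd_s8(a, b);
}
```

The unused lanes still cost work in the helper, which is what the OPTME remark in the patch refers to.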
* [Qemu-devel] [PATCH v2 07/13] target-arm: A64: Add SIMD simple 64 bit insns from scalar 2-reg misc
From: Peter Maydell @ 2014-02-01 23:00 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson
Implement the simple 64 bit integer operations from the SIMD
scalar 2-register misc group (C3.6.12): the comparisons against
zero, plus ABS and NEG.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
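
For checking the intended results outside QEMU, here is a rough reference model in plain C of the 64-bit compares against zero plus ABS and NEG. It is only a sketch: the ref_* names are made up for illustration, and it assumes the usual two's complement conversion between uint64_t and int64_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* All-ones if the test holds, else zero: -(uint64_t)1 wraps to 2^64 - 1. */
static inline uint64_t mask_from_test(bool test)
{
    return -(uint64_t)test;
}

static inline uint64_t ref_cmgt_zero(int64_t x) { return mask_from_test(x > 0); }
static inline uint64_t ref_cmge_zero(int64_t x) { return mask_from_test(x >= 0); }
static inline uint64_t ref_cmeq_zero(int64_t x) { return mask_from_test(x == 0); }
static inline uint64_t ref_cmle_zero(int64_t x) { return mask_from_test(x <= 0); }
static inline uint64_t ref_cmlt_zero(int64_t x) { return mask_from_test(x < 0); }

/* NEG: negate in unsigned arithmetic so INT64_MIN does not overflow. */
static inline uint64_t ref_neg(uint64_t x)
{
    return 0 - x;
}

/* ABS: keep x when it is greater than zero as a signed value, otherwise
 * take the negation. This has the same shape as movcond(GT, x, 0, x, -x).
 */
static inline uint64_t ref_abs(uint64_t x)
{
    return ((int64_t)x > 0) ? x : ref_neg(x);
}
```

Note that this ABS does not saturate: the most negative value maps to itself.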
target-arm/translate-a64.c | 87 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 86 insertions(+), 1 deletion(-)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 323d5b3..dd6785a 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -6212,6 +6212,48 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
tcg_temp_free_i64(tcg_rd);
}
+static void handle_2misc_64(DisasContext *s, int opcode, bool u,
+ TCGv_i64 tcg_rd, TCGv_i64 tcg_rn)
+{
+ /* Handle 64->64 opcodes which are shared between the scalar and
+ * vector 2-reg-misc groups. We cover every integer opcode where size == 3
+ * is valid in either group.
+ */
+ TCGCond cond;
+
+ switch (opcode) {
+ case 0xa: /* CMLT */
+ /* 64 bit integer comparison against zero, result is
+ * test ? (2^64 - 1) : 0. We implement via setcond(!test) and
+ * subtracting 1, i.e. -(test): setcond(test) followed by a negate.
+ */
+ cond = TCG_COND_LT;
+ do_cmop:
+ tcg_gen_setcondi_i64(cond, tcg_rd, tcg_rn, 0);
+ tcg_gen_neg_i64(tcg_rd, tcg_rd);
+ break;
+ case 0x8: /* CMGT, CMGE */
+ cond = u ? TCG_COND_GE : TCG_COND_GT;
+ goto do_cmop;
+ case 0x9: /* CMEQ, CMLE */
+ cond = u ? TCG_COND_LE : TCG_COND_EQ;
+ goto do_cmop;
+ case 0xb: /* ABS, NEG */
+ if (u) {
+ tcg_gen_neg_i64(tcg_rd, tcg_rn);
+ } else {
+ TCGv_i64 tcg_zero = tcg_const_i64(0);
+ tcg_gen_neg_i64(tcg_rd, tcg_rn);
+ tcg_gen_movcond_i64(TCG_COND_GT, tcg_rd, tcg_rn, tcg_zero,
+ tcg_rn, tcg_rd);
+ tcg_temp_free_i64(tcg_zero);
+ }
+ break;
+ default:
+ g_assert_not_reached();
+ }
+}
+
/* C3.6.12 AdvSIMD scalar two reg misc
* 31 30 29 28 24 23 22 21 17 16 12 11 10 9 5 4 0
* +-----+---+-----------+------+-----------+--------+-----+------+------+
@@ -6220,7 +6262,50 @@ static void disas_simd_scalar_three_reg_same(DisasContext *s, uint32_t insn)
*/
static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn)
{
- unsupported_encoding(s, insn);
+ int rd = extract32(insn, 0, 5);
+ int rn = extract32(insn, 5, 5);
+ int opcode = extract32(insn, 12, 5);
+ int size = extract32(insn, 22, 2);
+ bool u = extract32(insn, 29, 1);
+
+ switch (opcode) {
+ case 0xa: /* CMLT */
+ if (u) {
+ unallocated_encoding(s);
+ return;
+ }
+ /* fall through */
+ case 0x8: /* CMGT, CMGE */
+ case 0x9: /* CMEQ, CMLE */
+ case 0xb: /* ABS, NEG */
+ if (size != 3) {
+ unallocated_encoding(s);
+ return;
+ }
+ break;
+ default:
+ /* Other categories of encoding in this class:
+ * + floating point (single and double)
+ * + SUQADD/USQADD/SQABS/SQNEG : size 8, 16, 32 or 64
+ * + SQXTN/SQXTN2/SQXTUN/SQXTUN2/UQXTN/UQXTN2:
+ * narrowing saturate ops: size 64/32/16 -> 32/16/8
+ */
+ unsupported_encoding(s, insn);
+ return;
+ }
+
+ if (size == 3) {
+ TCGv_i64 tcg_rn = read_fp_dreg(s, rn);
+ TCGv_i64 tcg_rd = tcg_temp_new_i64();
+
+ handle_2misc_64(s, opcode, u, tcg_rd, tcg_rn);
+ write_fp_dreg(s, rd, tcg_rd);
+ tcg_temp_free_i64(tcg_rd);
+ tcg_temp_free_i64(tcg_rn);
+ } else {
+ /* the 'size might not be 64' ops aren't implemented yet */
+ g_assert_not_reached();
+ }
}
/* C3.6.13 AdvSIMD scalar x indexed element
--
1.8.5
* [Qemu-devel] [PATCH v2 08/13] target-arm: A64: Add skeleton decode for SIMD 2-reg misc group
From: Peter Maydell @ 2014-02-01 23:00 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson
Add a skeleton decode for the SIMD 2-reg misc group.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
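
The floating point part of the skeleton folds U and size[1] into the opcode and uses size[0] to pick the precision. A small standalone sketch of that folding follows, in plain C. The struct and function names are hypothetical and this is not the QEMU decoder itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of folding: a 7-bit opcode plus the element size (2 or 3). */
struct fp_2misc {
    int opcode;
    int size;
};

static inline struct fp_2misc fold_fp_2misc(int opcode5, int size2, bool u)
{
    struct fp_2misc r;

    /* opcode[6] = U, opcode[5] = size[1], opcode[4:0] = original opcode */
    r.opcode = opcode5 | (((size2 >> 1) & 1) << 5) | ((u ? 1 : 0) << 6);
    /* size[0] selects double (3) or single (2) precision */
    r.size = (size2 & 1) ? 3 : 2;
    return r;
}
```

With this folding, opcode 0xf with size[1] set becomes 0x2f (FABS) for U == 0 and 0x6f (FNEG) for U == 1, matching the case labels in the patch.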
target-arm/translate-a64.c | 110 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 109 insertions(+), 1 deletion(-)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index dd6785a..b0011da 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -7373,7 +7373,115 @@ static void disas_simd_three_reg_same(DisasContext *s, uint32_t insn)
*/
static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
{
- unsupported_encoding(s, insn);
+ int size = extract32(insn, 22, 2);
+ int opcode = extract32(insn, 12, 5);
+ bool u = extract32(insn, 29, 1);
+ bool is_q = extract32(insn, 30, 1);
+
+ switch (opcode) {
+ case 0x0: /* REV64, REV32 */
+ case 0x1: /* REV16 */
+ unsupported_encoding(s, insn);
+ return;
+ case 0x5: /* CNT, NOT, RBIT */
+ if ((u == 0 && size > 0) ||
+ (u == 1 && size > 1)) {
+ unallocated_encoding(s);
+ return;
+ }
+ unsupported_encoding(s, insn);
+ return;
+ case 0x2: /* SADDLP, UADDLP */
+ case 0x4: /* CLS, CLZ */
+ case 0x6: /* SADALP, UADALP */
+ case 0x12: /* XTN, XTN2, SQXTUN, SQXTUN2 */
+ case 0x14: /* SQXTN, SQXTN2, UQXTN, UQXTN2 */
+ if (size == 3) {
+ unallocated_encoding(s);
+ return;
+ }
+ unsupported_encoding(s, insn);
+ return;
+ case 0x13: /* SHLL, SHLL2 */
+ if (u == 0 || size == 3) {
+ unallocated_encoding(s);
+ return;
+ }
+ unsupported_encoding(s, insn);
+ return;
+ case 0xa: /* CMLT */
+ if (u == 1) {
+ unallocated_encoding(s);
+ return;
+ }
+ /* fall through */
+ case 0x3: /* SUQADD, USQADD */
+ case 0x7: /* SQABS, SQNEG */
+ case 0x8: /* CMGT, CMGE */
+ case 0x9: /* CMEQ, CMLE */
+ case 0xb: /* ABS, NEG */
+ if (size == 3 && !is_q) {
+ unallocated_encoding(s);
+ return;
+ }
+ unsupported_encoding(s, insn);
+ return;
+ case 0xc ... 0xf:
+ case 0x16 ... 0x1d:
+ case 0x1f:
+ {
+ /* Floating point: U, size[1] and opcode indicate operation;
+ * size[0] indicates single or double precision.
+ */
+ opcode |= (extract32(size, 1, 1) << 5) | (u << 6);
+ size = extract32(size, 0, 1) ? 3 : 2;
+ switch (opcode) {
+ case 0x16: /* FCVTN, FCVTN2 */
+ case 0x17: /* FCVTL, FCVTL2 */
+ case 0x18: /* FRINTN */
+ case 0x19: /* FRINTM */
+ case 0x1a: /* FCVTNS */
+ case 0x1b: /* FCVTMS */
+ case 0x1c: /* FCVTAS */
+ case 0x1d: /* SCVTF */
+ case 0x2c: /* FCMGT (zero) */
+ case 0x2d: /* FCMEQ (zero) */
+ case 0x2e: /* FCMLT (zero) */
+ case 0x2f: /* FABS */
+ case 0x38: /* FRINTP */
+ case 0x39: /* FRINTZ */
+ case 0x3a: /* FCVTPS */
+ case 0x3b: /* FCVTZS */
+ case 0x3c: /* URECPE */
+ case 0x3d: /* FRECPE */
+ case 0x56: /* FCVTXN, FCVTXN2 */
+ case 0x58: /* FRINTA */
+ case 0x59: /* FRINTX */
+ case 0x5a: /* FCVTNU */
+ case 0x5b: /* FCVTMU */
+ case 0x5c: /* FCVTAU */
+ case 0x5d: /* UCVTF */
+ case 0x6c: /* FCMGE (zero) */
+ case 0x6d: /* FCMLE (zero) */
+ case 0x6f: /* FNEG */
+ case 0x79: /* FRINTI */
+ case 0x7a: /* FCVTPU */
+ case 0x7b: /* FCVTZU */
+ case 0x7c: /* URSQRTE */
+ case 0x7d: /* FRSQRTE */
+ case 0x7f: /* FSQRT */
+ unsupported_encoding(s, insn);
+ return;
+ default:
+ unallocated_encoding(s);
+ return;
+ }
+ break;
+ }
+ default:
+ unallocated_encoding(s);
+ return;
+ }
}
/* C3.6.18 AdvSIMD vector x indexed element
--
1.8.5
* [Qemu-devel] [PATCH v2 09/13] target-arm: A64: Implement 2-register misc compares, ABS, NEG
From: Peter Maydell @ 2014-02-01 23:00 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson
Implement the simple 2-register-misc operations we can share
with the scalar-two-register-misc code. (SUQADD, USQADD, SQABS,
SQNEG also fall into this category, but aren't implemented in
the scalar-2-register case yet either.)
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
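
For the 8 and 16 bit elements the patch derives LE and LT from the GE and GT helpers with swapped operands. The index arithmetic can be sketched in isolation as below. The enum, struct and function names are hypothetical, chosen only to make the mapping testable.

```c
#include <stdbool.h>

/* Rows of the helper table, in the order used by the patch. */
enum cmp_helper { HELPER_CGT = 0, HELPER_CGE = 1, HELPER_CEQ = 2 };

struct cmp_plan {
    enum cmp_helper helper; /* which table row to call */
    bool reverse;           /* true: helper(0, x), false: helper(x, 0) */
};

static inline struct cmp_plan plan_cmp_zero(int opcode, bool u)
{
    struct cmp_plan plan;
    /* comp indexes [CMGT, CMGE, CMEQ, CMLE, CMLT] */
    int comp = (opcode - 0x8) * 2 + (u ? 1 : 0);

    /* x <= 0 is 0 >= x and x < 0 is 0 > x, so fold 3 -> 1 and 4 -> 0 */
    plan.reverse = comp > 2;
    if (plan.reverse) {
        comp = 4 - comp;
    }
    plan.helper = (enum cmp_helper)comp;
    return plan;
}
```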
target-arm/translate-a64.c | 136 ++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 134 insertions(+), 2 deletions(-)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index b0011da..c071663 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -7377,6 +7377,8 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
int opcode = extract32(insn, 12, 5);
bool u = extract32(insn, 29, 1);
bool is_q = extract32(insn, 30, 1);
+ int rn = extract32(insn, 5, 5);
+ int rd = extract32(insn, 0, 5);
switch (opcode) {
case 0x0: /* REV64, REV32 */
@@ -7415,8 +7417,6 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
return;
}
/* fall through */
- case 0x3: /* SUQADD, USQADD */
- case 0x7: /* SQABS, SQNEG */
case 0x8: /* CMGT, CMGE */
case 0x9: /* CMEQ, CMLE */
case 0xb: /* ABS, NEG */
@@ -7424,6 +7424,13 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
unallocated_encoding(s);
return;
}
+ break;
+ case 0x3: /* SUQADD, USQADD */
+ case 0x7: /* SQABS, SQNEG */
+ if (size == 3 && !is_q) {
+ unallocated_encoding(s);
+ return;
+ }
unsupported_encoding(s, insn);
return;
case 0xc ... 0xf:
@@ -7482,6 +7489,131 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
unallocated_encoding(s);
return;
}
+
+ if (size == 3) {
+ /* All 64-bit element operations can be shared with scalar 2misc */
+ int pass;
+
+ for (pass = 0; pass < (is_q ? 2 : 1); pass++) {
+ TCGv_i64 tcg_op = tcg_temp_new_i64();
+ TCGv_i64 tcg_res = tcg_temp_new_i64();
+
+ read_vec_element(s, tcg_op, rn, pass, MO_64);
+
+ handle_2misc_64(s, opcode, u, tcg_res, tcg_op);
+
+ write_vec_element(s, tcg_res, rd, pass, MO_64);
+
+ tcg_temp_free_i64(tcg_res);
+ tcg_temp_free_i64(tcg_op);
+ }
+ } else {
+ int pass;
+
+ for (pass = 0; pass < (is_q ? 4 : 2); pass++) {
+ TCGv_i32 tcg_op = tcg_temp_new_i32();
+ TCGv_i32 tcg_res = tcg_temp_new_i32();
+ TCGCond cond;
+
+ read_vec_element_i32(s, tcg_op, rn, pass, MO_32);
+
+ if (size == 2) {
+ /* Special cases for 32 bit elements */
+ switch (opcode) {
+ case 0xa: /* CMLT */
+ /* 32 bit integer comparison against zero, result is
+ * test ? (2^32 - 1) : 0. We implement via setcond(test)
+ * and negating.
+ */
+ cond = TCG_COND_LT;
+ do_cmop:
+ tcg_gen_setcondi_i32(cond, tcg_res, tcg_op, 0);
+ tcg_gen_neg_i32(tcg_res, tcg_res);
+ break;
+ case 0x8: /* CMGT, CMGE */
+ cond = u ? TCG_COND_GE : TCG_COND_GT;
+ goto do_cmop;
+ case 0x9: /* CMEQ, CMLE */
+ cond = u ? TCG_COND_LE : TCG_COND_EQ;
+ goto do_cmop;
+ case 0xb: /* ABS, NEG */
+ if (u) {
+ tcg_gen_neg_i32(tcg_res, tcg_op);
+ } else {
+ TCGv_i32 tcg_zero = tcg_const_i32(0);
+ tcg_gen_neg_i32(tcg_res, tcg_op);
+ tcg_gen_movcond_i32(TCG_COND_GT, tcg_res, tcg_op,
+ tcg_zero, tcg_op, tcg_res);
+ tcg_temp_free_i32(tcg_zero);
+ }
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ } else {
+ /* Use helpers for 8 and 16 bit elements */
+ switch (opcode) {
+ case 0x8: /* CMGT, CMGE */
+ case 0x9: /* CMEQ, CMLE */
+ case 0xa: /* CMLT */
+ {
+ static NeonGenTwoOpFn * const fns[3][2] = {
+ { gen_helper_neon_cgt_s8, gen_helper_neon_cgt_s16 },
+ { gen_helper_neon_cge_s8, gen_helper_neon_cge_s16 },
+ { gen_helper_neon_ceq_u8, gen_helper_neon_ceq_u16 },
+ };
+ NeonGenTwoOpFn *genfn;
+ int comp;
+ bool reverse;
+ TCGv_i32 tcg_zero = tcg_const_i32(0);
+
+ /* comp = index into [CMGT, CMGE, CMEQ, CMLE, CMLT] */
+ comp = (opcode - 0x8) * 2 + u;
+ /* ...but LE, LT are implemented as reverse GE, GT */
+ reverse = (comp > 2);
+ if (reverse) {
+ comp = 4 - comp;
+ }
+ genfn = fns[comp][size];
+ if (reverse) {
+ genfn(tcg_res, tcg_zero, tcg_op);
+ } else {
+ genfn(tcg_res, tcg_op, tcg_zero);
+ }
+ tcg_temp_free_i32(tcg_zero);
+ break;
+ }
+ case 0xb: /* ABS, NEG */
+ if (u) {
+ TCGv_i32 tcg_zero = tcg_const_i32(0);
+ if (size) {
+ gen_helper_neon_sub_u16(tcg_res, tcg_zero, tcg_op);
+ } else {
+ gen_helper_neon_sub_u8(tcg_res, tcg_zero, tcg_op);
+ }
+ tcg_temp_free_i32(tcg_zero);
+ } else {
+ if (size) {
+ gen_helper_neon_abs_s16(tcg_res, tcg_op);
+ } else {
+ gen_helper_neon_abs_s8(tcg_res, tcg_op);
+ }
+ }
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ }
+
+ write_vec_element_i32(s, tcg_res, rd, pass, MO_32);
+
+ tcg_temp_free_i32(tcg_res);
+ tcg_temp_free_i32(tcg_op);
+ }
+ }
+ if (!is_q) {
+ clear_vec_high(s, rd);
+ }
}
/* C3.6.18 AdvSIMD vector x indexed element
--
1.8.5
* [Qemu-devel] [PATCH v2 10/13] target-arm: A64: Implement 2-reg-misc CNT, NOT and RBIT
From: Peter Maydell @ 2014-02-01 23:00 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson
Implement the 2-reg-misc CNT, NOT and RBIT instructions.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
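
The mask-and-shift sequence used for RBIT can be checked against a naive loop. The sketch below copies the same sequence into a standalone function next to an obviously correct reference. Both function names are hypothetical, and nothing here depends on QEMU.

```c
#include <stdint.h>

/* Same steps as the helper in this patch: swap the nibbles of each byte,
 * then reverse the four bits inside each nibble.
 */
static inline uint32_t rbit_u8_lanes(uint32_t x)
{
    x = ((x & 0xf0f0f0f0u) >> 4) | ((x & 0x0f0f0f0fu) << 4);
    x = ((x & 0x88888888u) >> 3) | ((x & 0x44444444u) >> 1)
      | ((x & 0x22222222u) << 1) | ((x & 0x11111111u) << 3);
    return x;
}

/* Naive reference: move bit b of each byte lane to bit 7 - b. */
static inline uint32_t rbit_u8_lanes_naive(uint32_t x)
{
    uint32_t r = 0;
    for (int lane = 0; lane < 4; lane++) {
        for (int bit = 0; bit < 8; bit++) {
            if (x & (1u << (lane * 8 + bit))) {
                r |= 1u << (lane * 8 + (7 - bit));
            }
        }
    }
    return r;
}
```

Applying the function twice should give back the input, which makes a cheap extra sanity check.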
target-arm/helper.h | 1 +
target-arm/neon_helper.c | 12 ++++++++++++
target-arm/translate-a64.c | 34 ++++++++++++++++++++++++++++------
3 files changed, 41 insertions(+), 6 deletions(-)
diff --git a/target-arm/helper.h b/target-arm/helper.h
index 71b8411..951e6ad 100644
--- a/target-arm/helper.h
+++ b/target-arm/helper.h
@@ -320,6 +320,7 @@ DEF_HELPER_1(neon_cls_s8, i32, i32)
DEF_HELPER_1(neon_cls_s16, i32, i32)
DEF_HELPER_1(neon_cls_s32, i32, i32)
DEF_HELPER_1(neon_cnt_u8, i32, i32)
+DEF_HELPER_FLAGS_1(neon_rbit_u8, TCG_CALL_NO_RWG_SE, i32, i32)
DEF_HELPER_3(neon_qdmulh_s16, i32, env, i32, i32)
DEF_HELPER_3(neon_qrdmulh_s16, i32, env, i32, i32)
diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c
index be6fbd9..b4c8690 100644
--- a/target-arm/neon_helper.c
+++ b/target-arm/neon_helper.c
@@ -1133,6 +1133,18 @@ uint32_t HELPER(neon_cnt_u8)(uint32_t x)
return x;
}
+/* Reverse the bit order within each byte of the 32 bit value */
+uint32_t HELPER(neon_rbit_u8)(uint32_t x)
+{
+ x = ((x & 0xf0f0f0f0) >> 4)
+ | ((x & 0x0f0f0f0f) << 4);
+ x = ((x & 0x88888888) >> 3)
+ | ((x & 0x44444444) >> 1)
+ | ((x & 0x22222222) << 1)
+ | ((x & 0x11111111) << 3);
+ return x;
+}
+
#define NEON_QDMULH16(dest, src1, src2, round) do { \
uint32_t tmp = (int32_t)(int16_t) src1 * (int16_t) src2; \
if ((tmp ^ (tmp << 1)) & SIGNBIT) { \
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index c071663..dd1bbeb 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -6222,6 +6222,12 @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u,
TCGCond cond;
switch (opcode) {
+ case 0x5: /* NOT */
+ /* This opcode is shared with CNT and RBIT but we have earlier
+ * enforced that size == 3 if and only if this is the NOT insn.
+ */
+ tcg_gen_not_i64(tcg_rd, tcg_rn);
+ break;
case 0xa: /* CMLT */
/* 64 bit integer comparison against zero, result is
* test ? (2^64 - 1) : 0. We implement via setcond(!test) and
@@ -7385,13 +7391,19 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
case 0x1: /* REV16 */
unsupported_encoding(s, insn);
return;
- case 0x5: /* CNT, NOT, RBIT */
- if ((u == 0 && size > 0) ||
- (u == 1 && size > 1)) {
- unallocated_encoding(s);
- return;
+ case 0x5: /* CNT, NOT, RBIT */
+ if (u && size == 0) {
+ /* NOT: adjust size so we can use the 64-bits-at-a-time loop. */
+ size = 3;
+ break;
+ } else if (u && size == 1) {
+ /* RBIT */
+ break;
+ } else if (!u && size == 0) {
+ /* CNT */
+ break;
}
- unsupported_encoding(s, insn);
+ unallocated_encoding(s);
return;
case 0x2: /* SADDLP, UADDLP */
case 0x4: /* CLS, CLZ */
@@ -7553,6 +7565,16 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
} else {
/* Use helpers for 8 and 16 bit elements */
switch (opcode) {
+ case 0x5: /* CNT, RBIT */
+ /* For these two insns size is part of the opcode specifier
+ * (handled earlier); they always operate on byte elements.
+ */
+ if (u) {
+ gen_helper_neon_rbit_u8(tcg_res, tcg_op);
+ } else {
+ gen_helper_neon_cnt_u8(tcg_res, tcg_op);
+ }
+ break;
case 0x8: /* CMGT, CMGE */
case 0x9: /* CMEQ, CMLE */
case 0xa: /* CMLT */
--
1.8.5
* [Qemu-devel] [PATCH v2 11/13] target-arm: A64: Add narrowing 2-reg-misc instructions
From: Peter Maydell @ 2014-02-01 23:00 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson
Add the narrowing integer instructions in the 2-reg-misc class.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
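
As a reading aid, here is a sketch of what the narrowing operations compute for the 64 -> 32 bit case, written as plain C reference functions. The ref_* names and dest_word_index are hypothetical. The model ignores the saturation (QC) flag, and the 8 and 16 bit cases, which as I read the patch narrow several packed lanes per 64-bit chunk, are not covered.

```c
#include <stdbool.h>
#include <stdint.h>

/* XTN: plain truncation. */
static inline uint32_t ref_xtn32(uint64_t x)
{
    return (uint32_t)x;
}

/* SQXTN: signed input, saturate to the signed 32-bit range. */
static inline uint32_t ref_sqxtn32(int64_t x)
{
    if (x > INT32_MAX) {
        return (uint32_t)INT32_MAX;
    }
    if (x < INT32_MIN) {
        return (uint32_t)INT32_MIN;
    }
    return (uint32_t)x;
}

/* UQXTN: unsigned input, saturate to the unsigned 32-bit range. */
static inline uint32_t ref_uqxtn32(uint64_t x)
{
    return x > UINT32_MAX ? UINT32_MAX : (uint32_t)x;
}

/* SQXTUN: signed input, saturate to the unsigned 32-bit range. */
static inline uint32_t ref_sqxtun32(int64_t x)
{
    if (x < 0) {
        return 0;
    }
    if (x > (int64_t)UINT32_MAX) {
        return UINT32_MAX;
    }
    return (uint32_t)x;
}

/* Destination 32-bit word for a pass: the "2" forms (Q == 1) fill the
 * upper half of the register, the plain forms fill the lower half.
 */
static inline int dest_word_index(bool is_q, int pass)
{
    return (is_q ? 2 : 0) + pass;
}
```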
target-arm/translate-a64.c | 85 ++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 83 insertions(+), 2 deletions(-)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index dd1bbeb..42457e4 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -75,6 +75,8 @@ typedef struct AArch64DecodeTable {
/* Function prototype for gen_ functions for calling Neon helpers */
typedef void NeonGenTwoOpFn(TCGv_i32, TCGv_i32, TCGv_i32);
typedef void NeonGenTwoOpEnvFn(TCGv_i32, TCGv_ptr, TCGv_i32, TCGv_i32);
+typedef void NeonGenNarrowFn(TCGv_i32, TCGv_i64);
+typedef void NeonGenNarrowEnvFn(TCGv_i32, TCGv_ptr, TCGv_i64);
/* initialize TCG globals. */
void a64_translate_init(void)
@@ -7371,6 +7373,79 @@ static void disas_simd_three_reg_same(DisasContext *s, uint32_t insn)
}
}
+static void handle_2misc_narrow(DisasContext *s, int opcode, bool u, bool is_q,
+ int size, int rn, int rd)
+{
+ /* Handle 2-reg-misc ops which are narrowing (so each 2*size element
+ * in the source becomes a size element in the destination).
+ */
+ int pass;
+ TCGv_i32 tcg_res[2];
+ int destelt = is_q ? 2 : 0;
+
+ for (pass = 0; pass < 2; pass++) {
+ TCGv_i64 tcg_op = tcg_temp_new_i64();
+ NeonGenNarrowFn *genfn = NULL;
+ NeonGenNarrowEnvFn *genenvfn = NULL;
+
+ read_vec_element(s, tcg_op, rn, pass, MO_64);
+ tcg_res[pass] = tcg_temp_new_i32();
+
+ switch (opcode) {
+ case 0x12: /* XTN, SQXTUN */
+ {
+ static NeonGenNarrowFn * const xtnfns[3] = {
+ gen_helper_neon_narrow_u8,
+ gen_helper_neon_narrow_u16,
+ tcg_gen_trunc_i64_i32,
+ };
+ static NeonGenNarrowEnvFn * const sqxtunfns[3] = {
+ gen_helper_neon_unarrow_sat8,
+ gen_helper_neon_unarrow_sat16,
+ gen_helper_neon_unarrow_sat32,
+ };
+ if (u) {
+ genenvfn = sqxtunfns[size];
+ } else {
+ genfn = xtnfns[size];
+ }
+ break;
+ }
+ case 0x14: /* SQXTN, UQXTN */
+ {
+ static NeonGenNarrowEnvFn * const fns[3][2] = {
+ { gen_helper_neon_narrow_sat_s8,
+ gen_helper_neon_narrow_sat_u8 },
+ { gen_helper_neon_narrow_sat_s16,
+ gen_helper_neon_narrow_sat_u16 },
+ { gen_helper_neon_narrow_sat_s32,
+ gen_helper_neon_narrow_sat_u32 },
+ };
+ genenvfn = fns[size][u];
+ break;
+ }
+ default:
+ g_assert_not_reached();
+ }
+
+ if (genfn) {
+ genfn(tcg_res[pass], tcg_op);
+ } else {
+ genenvfn(tcg_res[pass], cpu_env, tcg_op);
+ }
+
+ tcg_temp_free_i64(tcg_op);
+ }
+
+ for (pass = 0; pass < 2; pass++) {
+ write_vec_element_i32(s, tcg_res[pass], rd, destelt + pass, MO_32);
+ tcg_temp_free_i32(tcg_res[pass]);
+ }
+ if (!is_q) {
+ clear_vec_high(s, rd);
+ }
+}
+
/* C3.6.17 AdvSIMD two reg misc
* 31 30 29 28 24 23 22 21 17 16 12 11 10 9 5 4 0
* +---+---+---+-----------+------+-----------+--------+-----+------+------+
@@ -7405,11 +7480,17 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
}
unallocated_encoding(s);
return;
+ case 0x12: /* XTN, XTN2, SQXTUN, SQXTUN2 */
+ case 0x14: /* SQXTN, SQXTN2, UQXTN, UQXTN2 */
+ if (size == 3) {
+ unallocated_encoding(s);
+ return;
+ }
+ handle_2misc_narrow(s, opcode, u, is_q, size, rn, rd);
+ return;
case 0x2: /* SADDLP, UADDLP */
case 0x4: /* CLS, CLZ */
case 0x6: /* SADALP, UADALP */
- case 0x12: /* XTN, XTN2, SQXTUN, SQXTUN2 */
- case 0x14: /* SQXTN, SQXTN2, UQXTN, UQXTN2 */
if (size == 3) {
unallocated_encoding(s);
return;
--
1.8.5
* [Qemu-devel] [PATCH v2 12/13] target-arm: A64: Add 2-reg-misc REV* instructions
From: Peter Maydell @ 2014-02-01 23:00 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson
From: Alex Bennée <alex.bennee@linaro.org>
Add the byte-reverse operations REV64, REV32 and REV16 from the
two-reg-misc group.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
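
The element shuffling in handle_rev comes down to XORing the element index with a mask. A minimal model on a 16-byte array follows, assuming the full 128-bit (Q == 1) case and little-endian element numbering. The function name ref_rev is hypothetical, and src and dst must not overlap.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Returns false where the patch treats the encoding as unallocated.
 * size is log2 of the element size in bytes.
 */
static inline bool ref_rev(int opcode, bool u, int size,
                           const uint8_t src[16], uint8_t dst[16])
{
    int op = (opcode << 1) | (u ? 1 : 0); /* 0: REV64, 1: REV32, 2: REV16 */
    int opsz = op + size;
    int grp_size, revmask, ebytes, elements;

    if (opsz >= 3) {
        return false;
    }
    grp_size = 3 - opsz;             /* log2(elements per group) */
    revmask = (1 << grp_size) - 1;
    ebytes = 1 << size;
    elements = 16 / ebytes;

    for (int i = 0; i < elements; i++) {
        /* XOR with revmask reverses the index within its group */
        int e_rev = i ^ revmask;
        memcpy(dst + e_rev * ebytes, src + i * ebytes, (size_t)ebytes);
    }
    return true;
}
```

Computing revmask only after the opsz check mirrors the v2 fix mentioned in the cover letter, since grp_size would otherwise be negative for the invalid encodings.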
target-arm/translate-a64.c | 71 +++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 70 insertions(+), 1 deletion(-)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 42457e4..a941c48 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -7446,6 +7446,75 @@ static void handle_2misc_narrow(DisasContext *s, int opcode, bool u, bool is_q,
}
}
+static void handle_rev(DisasContext *s, int opcode, bool u,
+ bool is_q, int size, int rn, int rd)
+{
+ int op = (opcode << 1) | u;
+ int opsz = op + size;
+ int grp_size = 3 - opsz;
+ int dsize = is_q ? 128 : 64;
+ int i;
+
+ if (opsz >= 3) {
+ unallocated_encoding(s);
+ return;
+ }
+
+ if (size == 0) {
+ /* Special case bytes, use bswap op on each group of elements */
+ int groups = dsize / (8 << grp_size);
+
+ for (i = 0; i < groups; i++) {
+ TCGv_i64 tcg_tmp = tcg_temp_new_i64();
+
+ read_vec_element(s, tcg_tmp, rn, i, grp_size);
+ switch (grp_size) {
+ case MO_16:
+ tcg_gen_bswap16_i64(tcg_tmp, tcg_tmp);
+ break;
+ case MO_32:
+ tcg_gen_bswap32_i64(tcg_tmp, tcg_tmp);
+ break;
+ case MO_64:
+ tcg_gen_bswap64_i64(tcg_tmp, tcg_tmp);
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ write_vec_element(s, tcg_tmp, rd, i, grp_size);
+ tcg_temp_free_i64(tcg_tmp);
+ }
+ if (!is_q) {
+ clear_vec_high(s, rd);
+ }
+ } else {
+ int revmask = (1 << grp_size) - 1;
+ int esize = 8 << size;
+ int elements = dsize / esize;
+ TCGv_i64 tcg_rn = tcg_temp_new_i64();
+ TCGv_i64 tcg_rd = tcg_const_i64(0);
+ TCGv_i64 tcg_rd_hi = tcg_const_i64(0);
+
+ for (i = 0; i < elements; i++) {
+ int e_rev = (i & 0xf) ^ revmask;
+ int off = e_rev * esize;
+ read_vec_element(s, tcg_rn, rn, i, size);
+ if (off >= 64) {
+ tcg_gen_deposit_i64(tcg_rd_hi, tcg_rd_hi,
+ tcg_rn, off - 64, esize);
+ } else {
+ tcg_gen_deposit_i64(tcg_rd, tcg_rd, tcg_rn, off, esize);
+ }
+ }
+ write_vec_element(s, tcg_rd, rd, 0, MO_64);
+ write_vec_element(s, tcg_rd_hi, rd, 1, MO_64);
+
+ tcg_temp_free_i64(tcg_rd_hi);
+ tcg_temp_free_i64(tcg_rd);
+ tcg_temp_free_i64(tcg_rn);
+ }
+}
+
/* C3.6.17 AdvSIMD two reg misc
* 31 30 29 28 24 23 22 21 17 16 12 11 10 9 5 4 0
* +---+---+---+-----------+------+-----------+--------+-----+------+------+
@@ -7464,7 +7533,7 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
switch (opcode) {
case 0x0: /* REV64, REV32 */
case 0x1: /* REV16 */
- unsupported_encoding(s, insn);
+ handle_rev(s, opcode, u, is_q, size, rn, rd);
return;
case 0x5: /* CNT, NOT, RBIT */
if (u && size == 0) {
--
1.8.5
* [Qemu-devel] [PATCH v2 13/13] target-arm: A64: Add FNEG and FABS to the SIMD 2-reg-misc group
From: Peter Maydell @ 2014-02-01 23:00 UTC (permalink / raw)
To: qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall, Richard Henderson
Add the SIMD FNEG and FABS instructions in the SIMD 2-reg-misc group.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
---
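
FABS and FNEG only touch the sign bit of the IEEE 754 encoding, so their expected results are easy to model on raw bit patterns. The sketch below does that in plain C. The ref_* names are hypothetical, and this models the expected results rather than how the vfp helpers are written.

```c
#include <stdint.h>

/* Double precision: the sign is bit 63. */
static inline uint64_t ref_fabs64_bits(uint64_t bits)
{
    return bits & ~(UINT64_C(1) << 63);
}

static inline uint64_t ref_fneg64_bits(uint64_t bits)
{
    return bits ^ (UINT64_C(1) << 63);
}

/* Single precision: the sign is bit 31. */
static inline uint32_t ref_fabs32_bits(uint32_t bits)
{
    return bits & ~(UINT32_C(1) << 31);
}

static inline uint32_t ref_fneg32_bits(uint32_t bits)
{
    return bits ^ (UINT32_C(1) << 31);
}
```

Because nothing but the sign bit changes, NaN payloads and infinities pass through unmodified.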
target-arm/translate-a64.c | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index a941c48..5698b3e 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -6219,7 +6219,7 @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u,
{
/* Handle 64->64 opcodes which are shared between the scalar and
* vector 2-reg-misc groups. We cover every integer opcode where size == 3
- * is valid in either group.
+ * is valid in either group and also the double-precision fp ops.
*/
TCGCond cond;
@@ -6257,6 +6257,12 @@ static void handle_2misc_64(DisasContext *s, int opcode, bool u,
tcg_temp_free_i64(tcg_zero);
}
break;
+ case 0x2f: /* FABS */
+ gen_helper_vfp_absd(tcg_rd, tcg_rn);
+ break;
+ case 0x6f: /* FNEG */
+ gen_helper_vfp_negd(tcg_rd, tcg_rn);
+ break;
default:
g_assert_not_reached();
}
@@ -7605,6 +7611,13 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
opcode |= (extract32(size, 1, 1) << 5) | (u << 6);
size = extract32(size, 0, 1) ? 3 : 2;
switch (opcode) {
+ case 0x2f: /* FABS */
+ case 0x6f: /* FNEG */
+ if (size == 3 && !is_q) {
+ unallocated_encoding(s);
+ return;
+ }
+ break;
case 0x16: /* FCVTN, FCVTN2 */
case 0x17: /* FCVTL, FCVTL2 */
case 0x18: /* FRINTN */
@@ -7616,7 +7629,6 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
case 0x2c: /* FCMGT (zero) */
case 0x2d: /* FCMEQ (zero) */
case 0x2e: /* FCMLT (zero) */
- case 0x2f: /* FABS */
case 0x38: /* FRINTP */
case 0x39: /* FRINTZ */
case 0x3a: /* FCVTPS */
@@ -7632,7 +7644,6 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
case 0x5d: /* UCVTF */
case 0x6c: /* FCMGE (zero) */
case 0x6d: /* FCMLE (zero) */
- case 0x6f: /* FNEG */
case 0x79: /* FRINTI */
case 0x7a: /* FCVTPU */
case 0x7b: /* FCVTZU */
@@ -7709,6 +7720,12 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
tcg_temp_free_i32(tcg_zero);
}
break;
+ case 0x2f: /* FABS */
+ gen_helper_vfp_abss(tcg_res, tcg_op);
+ break;
+ case 0x6f: /* FNEG */
+ gen_helper_vfp_negs(tcg_res, tcg_op);
+ break;
default:
g_assert_not_reached();
}
--
1.8.5
* Re: [Qemu-devel] [PATCH v2 02/13] target-arm: A64: Implement remaining non-pairwise int SIMD 3-reg-same insns
From: Richard Henderson @ 2014-02-03 21:21 UTC (permalink / raw)
To: Peter Maydell, qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall
On 02/01/2014 02:59 PM, Peter Maydell wrote:
> Implement the SIMD 3-reg-same instructions where the size == 3 case
> is reserved: SHADD, UHADD, SRHADD, URHADD, SHSUB, UHSUB, SMAX,
> UMAX, SMIN, UMIN, SABD, UABD, SABA, UABA, MLA, MLS, MUL, PMUL,
> SQRDMULH, SQDMULH. (None of these have scalar-3-same versions.)
> This completes the non-pairwise integer instructions in this category.
>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
* Re: [Qemu-devel] [PATCH v2 12/13] target-arm: A64: Add 2-reg-misc REV* instructions
From: Richard Henderson @ 2014-02-03 21:23 UTC
To: Peter Maydell, qemu-devel
Cc: Peter Crosthwaite, patches, Michael Matz, Alexander Graf,
Claudio Fontana, Dirk Mueller, Will Newton, Laurent Desnogues,
Alex Bennée, kvmarm, Christoffer Dall
On 02/01/2014 03:00 PM, Peter Maydell wrote:
> From: Alex Bennée <alex.bennee@linaro.org>
>
> Add the byte-reverse operations REV64, REV32 and REV16 from the
> two-reg-misc group.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> target-arm/translate-a64.c | 71 +++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 70 insertions(+), 1 deletion(-)
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
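Editorial sketch (not part of the patch or the review): the REV64, REV32 and REV16 semantics described in the quoted commit message can be modelled on one 64-bit lane as "reverse the order of the elements inside each container". The function name and parameters below are hypothetical and do not come from translate-a64.c. The sketch assumes the element size is smaller than the container size and that both divide 64.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Reverse the order of esize-bit elements inside each grp-bit container
 * of a 64-bit value. Hypothetical model only:
 *   REV64 -> grp = 64, REV32 -> grp = 32, REV16 -> grp = 16.
 */
static uint64_t rev_elements(uint64_t v, unsigned esize, unsigned grp)
{
    /* Preconditions assumed by this sketch. */
    assert(esize >= 8 && esize < grp && grp <= 64);
    assert(grp % esize == 0 && 64 % grp == 0);

    uint64_t emask = (UINT64_C(1) << esize) - 1; /* esize < 64 here */
    unsigned per_grp = grp / esize;              /* elements per container */
    uint64_t out = 0;

    for (unsigned g = 0; g < 64 / grp; g++) {
        for (unsigned e = 0; e < per_grp; e++) {
            unsigned src = g * grp + e * esize;
            unsigned dst = g * grp + (per_grp - 1 - e) * esize;
            out |= ((v >> src) & emask) << dst;
        }
    }
    return out;
}
```

With byte elements and a 64-bit container this is a plain 64-bit byte swap, which is the case the cover letter says v2 special-cases to use bswap. For example, rev_elements(0x0102030405060708, 8, 16) swaps the bytes within each halfword and yields 0x0201040306050807.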
* Re: [Qemu-devel] [PATCH v2 00/13] A64: Add Neon instructions, third set
From: Peter Maydell @ 2014-02-03 23:34 UTC
To: QEMU Developers
Cc: Peter Crosthwaite, Laurent Desnogues, Patch Tracking,
Michael Matz, Claudio Fontana, Dirk Mueller, Will Newton,
kvmarm@lists.cs.columbia.edu, Richard Henderson
On 1 February 2014 22:59, Peter Maydell <peter.maydell@linaro.org> wrote:
> This is the v2 from my 'Neon second and third sets' patch from
> last week. The first 8 patches from that were all OK so have gone
> into target-arm.next.
Thanks for the review; I've added this series to target-arm.next.
-- PMM