* [Qemu-devel] [PATCH v1 0/8] POWER9 TCG enablements - part2
@ 2016-07-28 6:49 Nikunj A Dadhania
2016-07-28 6:49 ` [Qemu-devel] [PATCH v1 1/8] target-ppc: implement branch-less divw[o][.] Nikunj A Dadhania
` (7 more replies)
0 siblings, 8 replies; 20+ messages in thread
From: Nikunj A Dadhania @ 2016-07-28 6:49 UTC (permalink / raw)
To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, bharata, aneesh.kumar, benh
This series contains 11 new instructions for POWER9 described in ISA3.0.
Patches:
01-02: Rework the following instructions to be branch-less:
divd[u][o][.]: Divide Doubleword Signed/Unsigned
divw[u][o][.]: Divide Word Signed/Unsigned
03: dtstsfi[q] : DFP Test Significance Immediate [Quad]
04: vabsdub : Vector Absolute Difference Unsigned Byte
vabsduh : Vector Absolute Difference Unsigned Halfword
vabsduw : Vector Absolute Difference Unsigned Word
05: vcmpnezb[.] : Vector Compare Not Equal or Zero Byte
vcmpnezh[.] : Vector Compare Not Equal or Zero Halfword
vcmpnezw[.] : Vector Compare Not Equal or Zero Word
06: vslv : Vector Shift Left Variable
07: vsrv : Vector Shift Right Variable
08: extswsli : Extend Sign Word & Shift Left Immediate
Both part1 and part2 pushed here: https://github.com/nikunjad/qemu/tree/p9-tcg
Changelog:
v0:
* Introduce helpers for ISA300 ops
* vabsdu*: drop etype from implementation
* vcmpnez*: collapse the switch case
* vsrv: use reverse traversal to get rid of temporary array
* Include divd/w in this series, as part1 is mostly pushed.
Nikunj A Dadhania (3):
target-ppc: implement branch-less divw[o][.]
target-ppc: implement branch-less divd[o][.]
target-ppc: add extswsli[.] instruction
Sandipan Das (2):
target-ppc: add dtstsfi[q] instructions
target-ppc: add vabsdu[b,h,w] instructions
Swapnil Bokade (1):
target-ppc: add vcmpnez[b,h,w][.] instructions
Vivek Andrew Sha (2):
target-ppc: add vslv instruction
target-ppc: add vsrv instruction
target-ppc/dfp_helper.c | 35 +++++++++++
target-ppc/helper.h | 13 +++++
target-ppc/int_helper.c | 89 ++++++++++++++++++++++++++++
target-ppc/translate.c | 126 +++++++++++++++++++++++++---------------
target-ppc/translate/dfp-impl.c | 20 +++++++
target-ppc/translate/dfp-ops.c | 14 +++++
target-ppc/translate/vmx-impl.c | 14 +++++
target-ppc/translate/vmx-ops.c | 20 ++++++-
8 files changed, 281 insertions(+), 50 deletions(-)
--
2.7.4
* [Qemu-devel] [PATCH v1 1/8] target-ppc: implement branch-less divw[o][.]
2016-07-28 6:49 [Qemu-devel] [PATCH v1 0/8] POWER9 TCG enablements - part2 Nikunj A Dadhania
@ 2016-07-28 6:49 ` Nikunj A Dadhania
2016-07-28 12:44 ` Richard Henderson
2016-07-28 6:49 ` [Qemu-devel] [PATCH v1 2/8] target-ppc: implement branch-less divd[o][.] Nikunj A Dadhania
` (6 subsequent siblings)
7 siblings, 1 reply; 20+ messages in thread
From: Nikunj A Dadhania @ 2016-07-28 6:49 UTC (permalink / raw)
To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, bharata, aneesh.kumar, benh
While implementing the modulo instructions, it became clear that the
current divw implementation uses many branches. Change the logic to
generate branch-less code. For invalid input (division by zero, or
INT_MIN / -1 in the signed case), the undefined result is set to the
dividend.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
target-ppc/translate.c | 48 +++++++++++++++++++++++-------------------------
1 file changed, 23 insertions(+), 25 deletions(-)
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 3dd9a48..2a5ce3f 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -1096,41 +1096,39 @@ static void gen_addpcis(DisasContext *ctx)
static inline void gen_op_arith_divw(DisasContext *ctx, TCGv ret, TCGv arg1,
TCGv arg2, int sign, int compute_ov)
{
- TCGLabel *l1 = gen_new_label();
- TCGLabel *l2 = gen_new_label();
- TCGv_i32 t0 = tcg_temp_local_new_i32();
- TCGv_i32 t1 = tcg_temp_local_new_i32();
+ TCGv_i32 t0 = tcg_temp_new_i32();
+ TCGv_i32 t1 = tcg_temp_new_i32();
+ TCGv_i32 t2 = tcg_temp_new_i32();
+ TCGv_i32 t3 = tcg_temp_new_i32();
tcg_gen_trunc_tl_i32(t0, arg1);
tcg_gen_trunc_tl_i32(t1, arg2);
- tcg_gen_brcondi_i32(TCG_COND_EQ, t1, 0, l1);
- if (sign) {
- TCGLabel *l3 = gen_new_label();
- tcg_gen_brcondi_i32(TCG_COND_NE, t1, -1, l3);
- tcg_gen_brcondi_i32(TCG_COND_EQ, t0, INT32_MIN, l1);
- gen_set_label(l3);
- tcg_gen_div_i32(t0, t0, t1);
- } else {
- tcg_gen_divu_i32(t0, t0, t1);
- }
- if (compute_ov) {
- tcg_gen_movi_tl(cpu_ov, 0);
- }
- tcg_gen_br(l2);
- gen_set_label(l1);
if (sign) {
- tcg_gen_sari_i32(t0, t0, 31);
+ tcg_gen_setcondi_i32(TCG_COND_EQ, t2, t0, INT_MIN);
+ tcg_gen_setcondi_i32(TCG_COND_EQ, t3, t1, -1);
+ tcg_gen_and_i32(t2, t2, t3);
+ tcg_gen_setcondi_i32(TCG_COND_EQ, t3, t1, 0);
+ tcg_gen_or_i32(t2, t2, t3);
+ tcg_gen_movi_i32(t3, 0);
+ tcg_gen_movcond_i32(TCG_COND_NE, t1, t2, t3, t2, t1);
+ tcg_gen_div_i32(t3, t0, t1);
+ tcg_gen_extu_i32_tl(ret, t3);
} else {
- tcg_gen_movi_i32(t0, 0);
+ tcg_gen_setcondi_i32(TCG_COND_EQ, t2, t1, 0);
+ tcg_gen_movi_i32(t3, 0);
+ tcg_gen_movcond_i32(TCG_COND_NE, t1, t2, t3, t2, t1);
+ tcg_gen_divu_i32(t3, t0, t1);
+ tcg_gen_extu_i32_tl(ret, t3);
}
if (compute_ov) {
- tcg_gen_movi_tl(cpu_ov, 1);
- tcg_gen_movi_tl(cpu_so, 1);
+ tcg_gen_extu_i32_tl(cpu_ov, t2);
+ tcg_gen_or_tl(cpu_so, cpu_so, cpu_ov);
}
- gen_set_label(l2);
- tcg_gen_extu_i32_tl(ret, t0);
tcg_temp_free_i32(t0);
tcg_temp_free_i32(t1);
+ tcg_temp_free_i32(t2);
+ tcg_temp_free_i32(t3);
+
if (unlikely(Rc(ctx->opcode) != 0))
gen_set_Rc0(ctx, ret);
}
--
2.7.4
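
For readers following the data flow of the new TCG sequence, here is a minimal plain-C sketch of the signed 32-bit case. It is only a model under assumptions: the names divw_result and model_divw are hypothetical and do not exist in QEMU, and a C compiler may still emit branches for the conditional expression. The point is that an "invalid" flag is computed first, the divisor is replaced by that flag (1) when the input is invalid, and the flag then feeds OV.

```c
#include <stdint.h>

/* Hypothetical plain-C model of the signed 32-bit case (divw/divwo).
 * These names are illustrative only and do not appear in QEMU. */
typedef struct {
    int32_t quotient; /* equals the dividend when the input is invalid */
    int     ov;       /* 1 when the input is invalid, 0 otherwise */
} divw_result;

static divw_result model_divw(int32_t dividend, int32_t divisor)
{
    /* invalid = (dividend == INT32_MIN && divisor == -1) || divisor == 0,
     * built from comparisons only, like the setcond/and/or sequence. */
    int32_t invalid = ((dividend == INT32_MIN) & (divisor == -1))
                      | (divisor == 0);

    /* Mirrors the movcond: use the flag (1) as divisor when invalid. */
    int32_t safe_divisor = invalid ? invalid : divisor;

    divw_result r = { dividend / safe_divisor, invalid };
    return r;
}
```

Because the substituted divisor is 1, the host division never traps and the quotient degenerates to the dividend, which matches the commit message. In the patch itself OV is copied from the flag and SO is OR-ed with it; SO is left out of this model.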
* [Qemu-devel] [PATCH v1 2/8] target-ppc: implement branch-less divd[o][.]
2016-07-28 6:49 [Qemu-devel] [PATCH v1 0/8] POWER9 TCG enablements - part2 Nikunj A Dadhania
2016-07-28 6:49 ` [Qemu-devel] [PATCH v1 1/8] target-ppc: implement branch-less divw[o][.] Nikunj A Dadhania
@ 2016-07-28 6:49 ` Nikunj A Dadhania
2016-07-28 12:45 ` Richard Henderson
2016-07-28 6:49 ` [Qemu-devel] [PATCH v1 3/8] target-ppc: add dtstsfi[q] instructions Nikunj A Dadhania
` (5 subsequent siblings)
7 siblings, 1 reply; 20+ messages in thread
From: Nikunj A Dadhania @ 2016-07-28 6:49 UTC (permalink / raw)
To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, bharata, aneesh.kumar, benh
Similar to divw, implement branch-less divd.
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
target-ppc/translate.c | 48 ++++++++++++++++++++++++++----------------------
1 file changed, 26 insertions(+), 22 deletions(-)
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 2a5ce3f..82349ed 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -1169,37 +1169,41 @@ GEN_DIVE(divweo, divwe, 1);
static inline void gen_op_arith_divd(DisasContext *ctx, TCGv ret, TCGv arg1,
TCGv arg2, int sign, int compute_ov)
{
- TCGLabel *l1 = gen_new_label();
- TCGLabel *l2 = gen_new_label();
+ TCGv_i64 t0 = tcg_temp_new_i64();
+ TCGv_i64 t1 = tcg_temp_new_i64();
+ TCGv_i64 t2 = tcg_temp_new_i64();
+ TCGv_i64 t3 = tcg_temp_new_i64();
- tcg_gen_brcondi_i64(TCG_COND_EQ, arg2, 0, l1);
- if (sign) {
- TCGLabel *l3 = gen_new_label();
- tcg_gen_brcondi_i64(TCG_COND_NE, arg2, -1, l3);
- tcg_gen_brcondi_i64(TCG_COND_EQ, arg1, INT64_MIN, l1);
- gen_set_label(l3);
- tcg_gen_div_i64(ret, arg1, arg2);
- } else {
- tcg_gen_divu_i64(ret, arg1, arg2);
- }
- if (compute_ov) {
- tcg_gen_movi_tl(cpu_ov, 0);
- }
- tcg_gen_br(l2);
- gen_set_label(l1);
+ tcg_gen_mov_i64(t0, arg1);
+ tcg_gen_mov_i64(t1, arg2);
if (sign) {
- tcg_gen_sari_i64(ret, arg1, 63);
+ tcg_gen_setcondi_i64(TCG_COND_EQ, t2, t0, INT64_MIN);
+ tcg_gen_setcondi_i64(TCG_COND_EQ, t3, t1, -1);
+ tcg_gen_and_i64(t2, t2, t3);
+ tcg_gen_setcondi_i64(TCG_COND_EQ, t3, t1, 0);
+ tcg_gen_or_i64(t2, t2, t3);
+ tcg_gen_movi_i64(t3, 0);
+ tcg_gen_movcond_i64(TCG_COND_NE, t1, t2, t3, t2, t1);
+ tcg_gen_div_i64(ret, t0, t1);
} else {
- tcg_gen_movi_i64(ret, 0);
+ tcg_gen_setcondi_i64(TCG_COND_EQ, t2, t1, 0);
+ tcg_gen_movi_i64(t3, 0);
+ tcg_gen_movcond_i64(TCG_COND_NE, t1, t2, t3, t2, t1);
+ tcg_gen_divu_i64(ret, t0, t1);
}
if (compute_ov) {
- tcg_gen_movi_tl(cpu_ov, 1);
- tcg_gen_movi_tl(cpu_so, 1);
+ tcg_gen_mov_tl(cpu_ov, t2);
+ tcg_gen_or_tl(cpu_so, cpu_so, cpu_ov);
}
- gen_set_label(l2);
+ tcg_temp_free_i64(t0);
+ tcg_temp_free_i64(t1);
+ tcg_temp_free_i64(t2);
+ tcg_temp_free_i64(t3);
+
if (unlikely(Rc(ctx->opcode) != 0))
gen_set_Rc0(ctx, ret);
}
+
#define GEN_INT_ARITH_DIVD(name, opc3, sign, compute_ov) \
static void glue(gen_, name)(DisasContext *ctx) \
{ \
--
2.7.4
* [Qemu-devel] [PATCH v1 3/8] target-ppc: add dtstsfi[q] instructions
2016-07-28 6:49 [Qemu-devel] [PATCH v1 0/8] POWER9 TCG enablements - part2 Nikunj A Dadhania
2016-07-28 6:49 ` [Qemu-devel] [PATCH v1 1/8] target-ppc: implement branch-less divw[o][.] Nikunj A Dadhania
2016-07-28 6:49 ` [Qemu-devel] [PATCH v1 2/8] target-ppc: implement branch-less divd[o][.] Nikunj A Dadhania
@ 2016-07-28 6:49 ` Nikunj A Dadhania
2016-07-28 6:49 ` [Qemu-devel] [PATCH v1 4/8] target-ppc: add vabsdu[b, h, w] instructions Nikunj A Dadhania
` (4 subsequent siblings)
7 siblings, 0 replies; 20+ messages in thread
From: Nikunj A Dadhania @ 2016-07-28 6:49 UTC (permalink / raw)
To: qemu-ppc, david, rth
Cc: qemu-devel, nikunj, bharata, aneesh.kumar, benh, Sandipan Das
From: Sandipan Das <sandipandas1990@gmail.com>
DFP Test Significance Immediate [Quad]
Signed-off-by: Sandipan Das <sandipandas1990@gmail.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
target-ppc/dfp_helper.c | 35 +++++++++++++++++++++++++++++++++++
target-ppc/helper.h | 2 ++
target-ppc/translate/dfp-impl.c | 20 ++++++++++++++++++++
target-ppc/translate/dfp-ops.c | 14 ++++++++++++++
4 files changed, 71 insertions(+)
diff --git a/target-ppc/dfp_helper.c b/target-ppc/dfp_helper.c
index db0ede6..9164fe7 100644
--- a/target-ppc/dfp_helper.c
+++ b/target-ppc/dfp_helper.c
@@ -647,6 +647,41 @@ uint32_t helper_##op(CPUPPCState *env, uint64_t *a, uint64_t *b) \
DFP_HELPER_TSTSF(dtstsf, 64)
DFP_HELPER_TSTSF(dtstsfq, 128)
+#define DFP_HELPER_TSTSFI(op, size) \
+uint32_t helper_##op(CPUPPCState *env, uint32_t a, uint64_t *b) \
+{ \
+ struct PPC_DFP dfp; \
+ unsigned uim; \
+ \
+ dfp_prepare_decimal##size(&dfp, 0, b, env); \
+ \
+ uim = a & 0x3F; \
+ \
+ if (unlikely(decNumberIsSpecial(&dfp.b))) { \
+ dfp.crbf = 1; \
+ } else if (uim == 0) { \
+ dfp.crbf = 4; \
+ } else if (unlikely(decNumberIsZero(&dfp.b))) { \
+ /* Zero has no sig digits */ \
+ dfp.crbf = 4; \
+ } else { \
+ unsigned nsd = dfp.b.digits; \
+ if (uim < nsd) { \
+ dfp.crbf = 8; \
+ } else if (uim > nsd) { \
+ dfp.crbf = 4; \
+ } else { \
+ dfp.crbf = 2; \
+ } \
+ } \
+ \
+ dfp_set_FPCC_from_CRBF(&dfp); \
+ return dfp.crbf; \
+}
+
+DFP_HELPER_TSTSFI(dtstsfi, 64)
+DFP_HELPER_TSTSFI(dtstsfiq, 128)
+
static void QUA_PPs(struct PPC_DFP *dfp)
{
dfp_set_FPRF_from_FRT(dfp);
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 3b6e6e6..27f2638 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -647,6 +647,8 @@ DEF_HELPER_3(dtstex, i32, env, fprp, fprp)
DEF_HELPER_3(dtstexq, i32, env, fprp, fprp)
DEF_HELPER_3(dtstsf, i32, env, fprp, fprp)
DEF_HELPER_3(dtstsfq, i32, env, fprp, fprp)
+DEF_HELPER_3(dtstsfi, i32, env, i32, fprp)
+DEF_HELPER_3(dtstsfiq, i32, env, i32, fprp)
DEF_HELPER_5(dquai, void, env, fprp, fprp, i32, i32)
DEF_HELPER_5(dquaiq, void, env, fprp, fprp, i32, i32)
DEF_HELPER_5(dqua, void, env, fprp, fprp, fprp, i32)
diff --git a/target-ppc/translate/dfp-impl.c b/target-ppc/translate/dfp-impl.c
index bf59951..178d304 100644
--- a/target-ppc/translate/dfp-impl.c
+++ b/target-ppc/translate/dfp-impl.c
@@ -45,6 +45,24 @@ static void gen_##name(DisasContext *ctx) \
tcg_temp_free_ptr(rb); \
}
+#define GEN_DFP_BF_I_B(name) \
+static void gen_##name(DisasContext *ctx) \
+{ \
+ TCGv_i32 uim; \
+ TCGv_ptr rb; \
+ if (unlikely(!ctx->fpu_enabled)) { \
+ gen_exception(ctx, POWERPC_EXCP_FPU); \
+ return; \
+ } \
+ gen_update_nip(ctx, ctx->nip - 4); \
+ uim = tcg_const_i32(UIMM5(ctx->opcode)); \
+ rb = gen_fprp_ptr(rB(ctx->opcode)); \
+ gen_helper_##name(cpu_crf[crfD(ctx->opcode)], \
+ cpu_env, uim, rb); \
+ tcg_temp_free_i32(uim); \
+ tcg_temp_free_ptr(rb); \
+}
+
#define GEN_DFP_BF_A_DCM(name) \
static void gen_##name(DisasContext *ctx) \
{ \
@@ -172,6 +190,8 @@ GEN_DFP_BF_A_B(dtstex)
GEN_DFP_BF_A_B(dtstexq)
GEN_DFP_BF_A_B(dtstsf)
GEN_DFP_BF_A_B(dtstsfq)
+GEN_DFP_BF_I_B(dtstsfi)
+GEN_DFP_BF_I_B(dtstsfiq)
GEN_DFP_T_B_U32_U32_Rc(dquai, SIMM5, RMC)
GEN_DFP_T_B_U32_U32_Rc(dquaiq, SIMM5, RMC)
GEN_DFP_T_A_B_I32_Rc(dqua, RMC)
diff --git a/target-ppc/translate/dfp-ops.c b/target-ppc/translate/dfp-ops.c
index 7f27d0f..6ef38e5 100644
--- a/target-ppc/translate/dfp-ops.c
+++ b/target-ppc/translate/dfp-ops.c
@@ -1,6 +1,9 @@
#define _GEN_DFP_LONG(name, op1, op2, mask) \
GEN_HANDLER_E(name, 0x3B, op1, op2, mask, PPC_NONE, PPC2_DFP)
+#define _GEN_DFP_LONG_300(name, op1, op2, mask) \
+GEN_HANDLER_E(name, 0x3B, op1, op2, mask, PPC_NONE, PPC2_ISA300)
+
#define _GEN_DFP_LONGx2(name, op1, op2, mask) \
GEN_HANDLER_E(name, 0x3B, op1, 0x00 | op2, mask, PPC_NONE, PPC2_DFP), \
GEN_HANDLER_E(name, 0x3B, op1, 0x10 | op2, mask, PPC_NONE, PPC2_DFP)
@@ -14,6 +17,9 @@ GEN_HANDLER_E(name, 0x3B, op1, 0x18 | op2, mask, PPC_NONE, PPC2_DFP)
#define _GEN_DFP_QUAD(name, op1, op2, mask) \
GEN_HANDLER_E(name, 0x3F, op1, op2, mask, PPC_NONE, PPC2_DFP)
+#define _GEN_DFP_QUAD_300(name, op1, op2, mask) \
+GEN_HANDLER_E(name, 0x3F, op1, op2, mask, PPC_NONE, PPC2_ISA300)
+
#define _GEN_DFP_QUADx2(name, op1, op2, mask) \
GEN_HANDLER_E(name, 0x3F, op1, 0x00 | op2, mask, PPC_NONE, PPC2_DFP), \
GEN_HANDLER_E(name, 0x3F, op1, 0x10 | op2, mask, PPC_NONE, PPC2_DFP)
@@ -48,12 +54,18 @@ _GEN_DFP_QUAD(name, op1, op2, 0x001F0800)
#define GEN_DFP_BF_A_B(name, op1, op2) \
_GEN_DFP_LONG(name, op1, op2, 0x00000001)
+#define GEN_DFP_BF_A_B_300(name, op1, op2) \
+_GEN_DFP_LONG_300(name, op1, op2, 0x00400001)
+
#define GEN_DFP_BF_Ap_Bp(name, op1, op2) \
_GEN_DFP_QUAD(name, op1, op2, 0x00610801)
#define GEN_DFP_BF_A_Bp(name, op1, op2) \
_GEN_DFP_QUAD(name, op1, op2, 0x00600801)
+#define GEN_DFP_BF_A_Bp_300(name, op1, op2) \
+_GEN_DFP_QUAD_300(name, op1, op2, 0x00400001)
+
#define GEN_DFP_BF_A_DCM(name, op1, op2) \
_GEN_DFP_LONGx2(name, op1, op2, 0x00600001)
@@ -119,6 +131,8 @@ GEN_DFP_BF_A_B(dtstex, 0x02, 0x05),
GEN_DFP_BF_Ap_Bp(dtstexq, 0x02, 0x05),
GEN_DFP_BF_A_B(dtstsf, 0x02, 0x15),
GEN_DFP_BF_A_Bp(dtstsfq, 0x02, 0x15),
+GEN_DFP_BF_A_B_300(dtstsfi, 0x03, 0x15),
+GEN_DFP_BF_A_Bp_300(dtstsfiq, 0x03, 0x15),
GEN_DFP_TE_T_B_RMC_Rc(dquai, 0x03, 0x02),
GEN_DFP_TE_Tp_Bp_RMC_Rc(dquaiq, 0x03, 0x02),
GEN_DFP_T_A_B_RMC_Rc(dqua, 0x03, 0x00),
--
2.7.4
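
The decision ladder in DFP_HELPER_TSTSFI can be read in isolation from the decNumber plumbing. The sketch below is a simplified model, assuming the caller has already classified the operand; model_dtstsfi and its parameters are hypothetical names, not part of the patch. It returns the same 4-bit CR field values that the helper stores in dfp.crbf.

```c
#include <stdbool.h>

/* Hypothetical model of the CR field selection in DFP_HELPER_TSTSFI.
 * is_special: operand is NaN or infinity
 * is_zero:    operand is zero
 * nsd:        number of significant digits of the operand */
static unsigned model_dtstsfi(unsigned uim, bool is_special,
                              bool is_zero, unsigned nsd)
{
    uim &= 0x3F;              /* only the low 6 bits are used */

    if (is_special) {
        return 1;             /* unordered */
    }
    if (uim == 0 || is_zero) {
        return 4;             /* same outcome for both cases in the patch */
    }
    if (uim < nsd) {
        return 8;
    }
    if (uim > nsd) {
        return 4;
    }
    return 2;                 /* uim == nsd */
}
```

The two early cases (uim == 0 and a zero operand) are merged here because both yield 4 in the helper; the patch keeps them separate for readability.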
* [Qemu-devel] [PATCH v1 4/8] target-ppc: add vabsdu[b, h, w] instructions
2016-07-28 6:49 [Qemu-devel] [PATCH v1 0/8] POWER9 TCG enablements - part2 Nikunj A Dadhania
` (2 preceding siblings ...)
2016-07-28 6:49 ` [Qemu-devel] [PATCH v1 3/8] target-ppc: add dtstsfi[q] instructions Nikunj A Dadhania
@ 2016-07-28 6:49 ` Nikunj A Dadhania
2016-07-28 12:52 ` Richard Henderson
2016-07-28 6:49 ` [Qemu-devel] [PATCH v1 5/8] target-ppc: add vcmpnez[b, h, w][.] instructions Nikunj A Dadhania
` (3 subsequent siblings)
7 siblings, 1 reply; 20+ messages in thread
From: Nikunj A Dadhania @ 2016-07-28 6:49 UTC (permalink / raw)
To: qemu-ppc, david, rth
Cc: qemu-devel, nikunj, bharata, aneesh.kumar, benh, Sandipan Das
From: Sandipan Das <sandipandas1990@gmail.com>
Add the following instructions:
vabsdub: Vector Absolute Difference Unsigned Byte
vabsduh: Vector Absolute Difference Unsigned Halfword
vabsduw: Vector Absolute Difference Unsigned Word
Signed-off-by: Sandipan Das <sandipandas1990@gmail.com>
[ use ISA300 define and abs(). Drop etype ]
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
target-ppc/helper.h | 3 +++
target-ppc/int_helper.c | 22 ++++++++++++++++++++++
target-ppc/translate/vmx-impl.c | 9 +++++++++
target-ppc/translate/vmx-ops.c | 6 +++---
4 files changed, 37 insertions(+), 3 deletions(-)
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 27f2638..1e68060 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -118,6 +118,9 @@ DEF_HELPER_3(vsubudm, void, avr, avr, avr)
DEF_HELPER_3(vavgub, void, avr, avr, avr)
DEF_HELPER_3(vavguh, void, avr, avr, avr)
DEF_HELPER_3(vavguw, void, avr, avr, avr)
+DEF_HELPER_3(vabsdub, void, avr, avr, avr)
+DEF_HELPER_3(vabsduh, void, avr, avr, avr)
+DEF_HELPER_3(vabsduw, void, avr, avr, avr)
DEF_HELPER_3(vavgsb, void, avr, avr, avr)
DEF_HELPER_3(vavgsh, void, avr, avr, avr)
DEF_HELPER_3(vavgsw, void, avr, avr, avr)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index 15947ad..2b375b4 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -629,6 +629,28 @@ VAVG(w, s32, int64_t, u32, uint64_t)
#undef VAVG_DO
#undef VAVG
+#define VABSDU_DO(name, element) \
+void helper_v##name(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b) \
+{ \
+ int i; \
+ \
+ for (i = 0; i < ARRAY_SIZE(r->element); i++) { \
+ r->element[i] = abs(a->element[i] - b->element[i]); \
+ } \
+}
+
+/* VABSDU - Vector absolute difference unsigned
+ * type - instruction mnemonic suffix (b: byte, h: halfword, w: word)
+ * element - element type to access from vector
+ */
+#define VABSDU(type, element) \
+ VABSDU_DO(absdu##type, element)
+VABSDU(b, u8)
+VABSDU(h, u16)
+VABSDU(w, u32)
+#undef VABSDU_DO
+#undef VABSDU
+
#define VCF(suffix, cvt, element) \
void helper_vcf##suffix(CPUPPCState *env, ppc_avr_t *r, \
ppc_avr_t *b, uint32_t uim) \
diff --git a/target-ppc/translate/vmx-impl.c b/target-ppc/translate/vmx-impl.c
index a58aa0c..f4ee05b 100644
--- a/target-ppc/translate/vmx-impl.c
+++ b/target-ppc/translate/vmx-impl.c
@@ -285,8 +285,17 @@ GEN_VXFORM(vminsh, 1, 13);
GEN_VXFORM(vminsw, 1, 14);
GEN_VXFORM(vminsd, 1, 15);
GEN_VXFORM(vavgub, 1, 16);
+GEN_VXFORM(vabsdub, 1, 16);
+GEN_VXFORM_DUAL(vavgub, PPC_ALTIVEC, PPC_NONE, \
+ vabsdub, PPC_NONE, PPC2_ISA300)
GEN_VXFORM(vavguh, 1, 17);
+GEN_VXFORM(vabsduh, 1, 17);
+GEN_VXFORM_DUAL(vavguh, PPC_ALTIVEC, PPC_NONE, \
+ vabsduh, PPC_NONE, PPC2_ISA300)
GEN_VXFORM(vavguw, 1, 18);
+GEN_VXFORM(vabsduw, 1, 18);
+GEN_VXFORM_DUAL(vavguw, PPC_ALTIVEC, PPC_NONE, \
+ vabsduw, PPC_NONE, PPC2_ISA300)
GEN_VXFORM(vavgsb, 1, 20);
GEN_VXFORM(vavgsh, 1, 21);
GEN_VXFORM(vavgsw, 1, 22);
diff --git a/target-ppc/translate/vmx-ops.c b/target-ppc/translate/vmx-ops.c
index 4d4a62e..abdef6e 100644
--- a/target-ppc/translate/vmx-ops.c
+++ b/target-ppc/translate/vmx-ops.c
@@ -69,9 +69,9 @@ GEN_VXFORM(vminsb, 1, 12),
GEN_VXFORM(vminsh, 1, 13),
GEN_VXFORM(vminsw, 1, 14),
GEN_VXFORM_207(vminsd, 1, 15),
-GEN_VXFORM(vavgub, 1, 16),
-GEN_VXFORM(vavguh, 1, 17),
-GEN_VXFORM(vavguw, 1, 18),
+GEN_VXFORM_DUAL(vavgub, vabsdub, 1, 16, PPC_ALTIVEC, PPC_NONE),
+GEN_VXFORM_DUAL(vavguh, vabsduh, 1, 17, PPC_ALTIVEC, PPC_NONE),
+GEN_VXFORM_DUAL(vavguw, vabsduw, 1, 18, PPC_ALTIVEC, PPC_NONE),
GEN_VXFORM(vavgsb, 1, 20),
GEN_VXFORM(vavgsh, 1, 21),
GEN_VXFORM(vavgsw, 1, 22),
--
2.7.4
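
One point worth checking in the helper above is the use of abs() on unsigned elements. For u8 and u16 the operands are promoted to int, so the subtraction is exact. For u32 the subtraction is done in unsigned arithmetic and wraps before abs() sees it. The sketch below illustrates this in plain C; absdiff_u32 and absdiff_u32_via_abs are hypothetical names, and the comparison-based form is one possible alternative, not necessarily what was eventually merged.

```c
#include <stdint.h>
#include <stdlib.h>

/* Comparison-based absolute difference: never wraps. */
static uint32_t absdiff_u32(uint32_t a, uint32_t b)
{
    return a > b ? a - b : b - a;
}

/* Models abs(a - b) with uint32_t operands: the subtraction wraps
 * modulo 2^32 first, then the result is reinterpreted as int. */
static uint32_t absdiff_u32_via_abs(uint32_t a, uint32_t b)
{
    return (uint32_t)abs((int)(a - b));
}
```

With a = 0 and b = 0xFFFFFFFF the true difference is 0xFFFFFFFF, but the wrapped subtraction yields 1, so the abs()-based form returns 1.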
* [Qemu-devel] [PATCH v1 5/8] target-ppc: add vcmpnez[b, h, w][.] instructions
2016-07-28 6:49 [Qemu-devel] [PATCH v1 0/8] POWER9 TCG enablements - part2 Nikunj A Dadhania
` (3 preceding siblings ...)
2016-07-28 6:49 ` [Qemu-devel] [PATCH v1 4/8] target-ppc: add vabsdu[b, h, w] instructions Nikunj A Dadhania
@ 2016-07-28 6:49 ` Nikunj A Dadhania
2016-07-28 12:55 ` Richard Henderson
2016-07-28 6:49 ` [Qemu-devel] [PATCH v1 6/8] target-ppc: add vslv instruction Nikunj A Dadhania
` (2 subsequent siblings)
7 siblings, 1 reply; 20+ messages in thread
From: Nikunj A Dadhania @ 2016-07-28 6:49 UTC (permalink / raw)
To: qemu-ppc, david, rth
Cc: qemu-devel, nikunj, bharata, aneesh.kumar, benh, Swapnil Bokade
From: Swapnil Bokade <bokadeswapnil@gmail.com>
Add the following instructions:
vcmpnezb[.]: Vector Compare Not Equal or Zero Byte
vcmpnezh[.]: Vector Compare Not Equal or Zero Halfword
vcmpnezw[.]: Vector Compare Not Equal or Zero Word
Signed-off-by: Swapnil Bokade <bokadeswapnil@gmail.com>
[ collapse switch case ]
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
target-ppc/helper.h | 6 ++++++
target-ppc/int_helper.c | 36 ++++++++++++++++++++++++++++++++++++
target-ppc/translate/vmx-impl.c | 3 +++
target-ppc/translate/vmx-ops.c | 9 +++++++++
4 files changed, 54 insertions(+)
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index 1e68060..e6ce3ab 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -144,6 +144,9 @@ DEF_HELPER_4(vcmpequb, void, env, avr, avr, avr)
DEF_HELPER_4(vcmpequh, void, env, avr, avr, avr)
DEF_HELPER_4(vcmpequw, void, env, avr, avr, avr)
DEF_HELPER_4(vcmpequd, void, env, avr, avr, avr)
+DEF_HELPER_4(vcmpnezb, void, env, avr, avr, avr)
+DEF_HELPER_4(vcmpnezh, void, env, avr, avr, avr)
+DEF_HELPER_4(vcmpnezw, void, env, avr, avr, avr)
DEF_HELPER_4(vcmpgtub, void, env, avr, avr, avr)
DEF_HELPER_4(vcmpgtuh, void, env, avr, avr, avr)
DEF_HELPER_4(vcmpgtuw, void, env, avr, avr, avr)
@@ -160,6 +163,9 @@ DEF_HELPER_4(vcmpequb_dot, void, env, avr, avr, avr)
DEF_HELPER_4(vcmpequh_dot, void, env, avr, avr, avr)
DEF_HELPER_4(vcmpequw_dot, void, env, avr, avr, avr)
DEF_HELPER_4(vcmpequd_dot, void, env, avr, avr, avr)
+DEF_HELPER_4(vcmpnezb_dot, void, env, avr, avr, avr)
+DEF_HELPER_4(vcmpnezh_dot, void, env, avr, avr, avr)
+DEF_HELPER_4(vcmpnezw_dot, void, env, avr, avr, avr)
DEF_HELPER_4(vcmpgtub_dot, void, env, avr, avr, avr)
DEF_HELPER_4(vcmpgtuh_dot, void, env, avr, avr, avr)
DEF_HELPER_4(vcmpgtuw_dot, void, env, avr, avr, avr)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index 2b375b4..fd5aa94 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -717,6 +717,42 @@ VCMP(gtsd, >, s64)
#undef VCMP_DO
#undef VCMP
+#define VCMPNEZ_DO(suffix, element, record) \
+void helper_vcmpnez##suffix(CPUPPCState *env, ppc_avr_t *r, \
+ ppc_avr_t *a, ppc_avr_t *b) \
+{ \
+ uint64_t ones = (uint64_t)-1; \
+ uint64_t all = ones; \
+ uint64_t none = 0; \
+ int i; \
+ \
+ for (i = 0; i < ARRAY_SIZE(r->element); i++) { \
+ uint64_t result = ((a->element[i] == 0) \
+ || (b->element[i] == 0) \
+ || (a->element[i] != b->element[i]) ? \
+ ones : 0x0); \
+ r->element[i] = result; \
+ all &= result; \
+ none |= result; \
+ } \
+ if (record) { \
+ env->crf[6] = ((all != 0) << 3) | ((none == 0) << 1); \
+ } \
+}
+
+/* VCMPNEZ - Vector compare not equal or zero
+ * suffix - instruction mnemonic suffix (b: byte, h: halfword, w: word)
+ * element - element type to access from vector
+ */
+#define VCMPNEZ(suffix, element) \
+ VCMPNEZ_DO(suffix, element, 0) \
+ VCMPNEZ_DO(suffix##_dot, element, 1)
+VCMPNEZ(b, u8)
+VCMPNEZ(h, u16)
+VCMPNEZ(w, u32)
+#undef VCMPNEZ_DO
+#undef VCMPNEZ
+
#define VCMPFP_DO(suffix, compare, order, record) \
void helper_vcmp##suffix(CPUPPCState *env, ppc_avr_t *r, \
ppc_avr_t *a, ppc_avr_t *b) \
diff --git a/target-ppc/translate/vmx-impl.c b/target-ppc/translate/vmx-impl.c
index f4ee05b..da11632 100644
--- a/target-ppc/translate/vmx-impl.c
+++ b/target-ppc/translate/vmx-impl.c
@@ -477,6 +477,9 @@ GEN_VXRFORM(vcmpequb, 3, 0)
GEN_VXRFORM(vcmpequh, 3, 1)
GEN_VXRFORM(vcmpequw, 3, 2)
GEN_VXRFORM(vcmpequd, 3, 3)
+GEN_VXRFORM(vcmpnezb, 3, 4)
+GEN_VXRFORM(vcmpnezh, 3, 5)
+GEN_VXRFORM(vcmpnezw, 3, 6)
GEN_VXRFORM(vcmpgtsb, 3, 12)
GEN_VXRFORM(vcmpgtsh, 3, 13)
GEN_VXRFORM(vcmpgtsw, 3, 14)
diff --git a/target-ppc/translate/vmx-ops.c b/target-ppc/translate/vmx-ops.c
index abdef6e..9f99118 100644
--- a/target-ppc/translate/vmx-ops.c
+++ b/target-ppc/translate/vmx-ops.c
@@ -160,12 +160,21 @@ GEN_VXFORM(vminfp, 5, 17),
#define GEN_VXRFORM1(opname, name, str, opc2, opc3) \
GEN_HANDLER2(name, str, 0x4, opc2, opc3, 0x00000000, PPC_ALTIVEC),
+#define GEN_VXRFORM1_300(opname, name, str, opc2, opc3) \
+GEN_HANDLER2_E(name, str, 0x4, opc2, opc3, 0x00000000, PPC_NONE, PPC2_ISA300),
#define GEN_VXRFORM(name, opc2, opc3) \
GEN_VXRFORM1(name, name, #name, opc2, opc3) \
GEN_VXRFORM1(name##_dot, name##_, #name ".", opc2, (opc3 | (0x1 << 4)))
+#define GEN_VXRFORM_300(name, opc2, opc3) \
+ GEN_VXRFORM1_300(name, name, #name, opc2, opc3) \
+ GEN_VXRFORM1_300(name##_dot, name##_, #name ".", opc2, (opc3 | (0x1 << 4)))
+
GEN_VXRFORM(vcmpequb, 3, 0)
GEN_VXRFORM(vcmpequh, 3, 1)
GEN_VXRFORM(vcmpequw, 3, 2)
+GEN_VXRFORM_300(vcmpnezb, 3, 4)
+GEN_VXRFORM_300(vcmpnezh, 3, 5)
+GEN_VXRFORM_300(vcmpnezw, 3, 6)
GEN_VXRFORM(vcmpgtsb, 3, 12)
GEN_VXRFORM(vcmpgtsh, 3, 13)
GEN_VXRFORM(vcmpgtsw, 3, 14)
--
2.7.4
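
As a rough illustration of what VCMPNEZ_DO computes, here is a byte-wise model on plain arrays. It is a sketch under assumptions: model_vcmpnezb is a hypothetical name, the vector is passed as a byte array of length n rather than a ppc_avr_t, and the CR6 value is returned instead of being written to env->crf[6].

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-wise model of vcmpnezb.
 * An element "hits" when either input is zero or the inputs differ.
 * Returns the CR6 value the record form would produce:
 * bit 3 set if every element hit, bit 1 set if no element hit. */
static unsigned model_vcmpnezb(uint8_t *r, const uint8_t *a,
                               const uint8_t *b, size_t n)
{
    int all = 1;
    int any = 0;

    for (size_t i = 0; i < n; i++) {
        int hit = (a[i] == 0) || (b[i] == 0) || (a[i] != b[i]);
        r[i] = hit ? 0xFF : 0x00;
        all &= hit;
        any |= hit;
    }
    return ((unsigned)all << 3) | ((unsigned)!any << 1);
}
```

The helper in the patch tracks the same two conditions with 64-bit masks (all and none); the model uses booleans because only their zero/non-zero state matters for CR6.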
* [Qemu-devel] [PATCH v1 6/8] target-ppc: add vslv instruction
2016-07-28 6:49 [Qemu-devel] [PATCH v1 0/8] POWER9 TCG enablements - part2 Nikunj A Dadhania
` (4 preceding siblings ...)
2016-07-28 6:49 ` [Qemu-devel] [PATCH v1 5/8] target-ppc: add vcmpnez[b, h, w][.] instructions Nikunj A Dadhania
@ 2016-07-28 6:49 ` Nikunj A Dadhania
2016-07-28 13:00 ` Richard Henderson
2016-07-28 6:49 ` [Qemu-devel] [PATCH v1 7/8] target-ppc: add vsrv instruction Nikunj A Dadhania
2016-07-28 6:49 ` [Qemu-devel] [PATCH v1 8/8] target-ppc: add extswsli[.] instruction Nikunj A Dadhania
7 siblings, 1 reply; 20+ messages in thread
From: Nikunj A Dadhania @ 2016-07-28 6:49 UTC (permalink / raw)
To: qemu-ppc, david, rth
Cc: qemu-devel, nikunj, bharata, aneesh.kumar, benh, Vivek Andrew Sha
From: Vivek Andrew Sha <vivekandrewsha@gmail.com>
vslv: Vector Shift Left Variable
Signed-off-by: Vivek Andrew Sha <vivekandrewsha@gmail.com>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
target-ppc/helper.h | 1 +
target-ppc/int_helper.c | 14 ++++++++++++++
target-ppc/translate/vmx-impl.c | 1 +
target-ppc/translate/vmx-ops.c | 4 ++++
4 files changed, 20 insertions(+)
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index e6ce3ab..d4c060b 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -211,6 +211,7 @@ DEF_HELPER_3(vslw, void, avr, avr, avr)
DEF_HELPER_3(vsld, void, avr, avr, avr)
DEF_HELPER_3(vslo, void, avr, avr, avr)
DEF_HELPER_3(vsro, void, avr, avr, avr)
+DEF_HELPER_3(vslv, void, avr, avr, avr)
DEF_HELPER_3(vaddcuw, void, avr, avr, avr)
DEF_HELPER_3(vsubcuw, void, avr, avr, avr)
DEF_HELPER_2(lvsl, void, avr, tl)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index fd5aa94..ad1a21f 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -1694,6 +1694,20 @@ VSL(w, u32, 0x1F)
VSL(d, u64, 0x3F)
#undef VSL
+void helper_vslv(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
+{
+ int i;
+ unsigned int shift, bytes, size;
+
+ size = ARRAY_SIZE(r->u8);
+ for (i = 0; i < size; i++) {
+ shift = b->u8[i] & 0x7; /* extract shift value */
+ bytes = (a->u8[i] << 8) + /* extract adjacent bytes */
+ (((i + 1) < size) ? a->u8[i + 1] : 0);
+ r->u8[i] = (bytes << shift) >> 8; /* shift and store result */
+ }
+}
+
void helper_vsldoi(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t shift)
{
int sh = shift & 0xf;
diff --git a/target-ppc/translate/vmx-impl.c b/target-ppc/translate/vmx-impl.c
index da11632..5844a7e 100644
--- a/target-ppc/translate/vmx-impl.c
+++ b/target-ppc/translate/vmx-impl.c
@@ -367,6 +367,7 @@ GEN_VXFORM(vsrab, 2, 12);
GEN_VXFORM(vsrah, 2, 13);
GEN_VXFORM(vsraw, 2, 14);
GEN_VXFORM(vsrad, 2, 15);
+GEN_VXFORM(vslv, 2, 29);
GEN_VXFORM(vslo, 6, 16);
GEN_VXFORM(vsro, 6, 17);
GEN_VXFORM(vaddcuw, 0, 6);
diff --git a/target-ppc/translate/vmx-ops.c b/target-ppc/translate/vmx-ops.c
index 9f99118..372ede0 100644
--- a/target-ppc/translate/vmx-ops.c
+++ b/target-ppc/translate/vmx-ops.c
@@ -38,6 +38,9 @@ GEN_HANDLER(name, 0x04, opc2, opc3, 0x00000000, PPC_ALTIVEC)
#define GEN_VXFORM_207(name, opc2, opc3) \
GEN_HANDLER_E(name, 0x04, opc2, opc3, 0x00000000, PPC_NONE, PPC2_ALTIVEC_207)
+#define GEN_VXFORM_300(name, opc2, opc3) \
+GEN_HANDLER_E(name, 0x04, opc2, opc3, 0x00000000, PPC_NONE, PPC2_ISA300)
+
#define GEN_VXFORM_DUAL(name0, name1, opc2, opc3, type0, type1) \
GEN_HANDLER_E(name0##_##name1, 0x4, opc2, opc3, 0x00000000, type0, type1)
@@ -107,6 +110,7 @@ GEN_VXFORM(vsrab, 2, 12),
GEN_VXFORM(vsrah, 2, 13),
GEN_VXFORM(vsraw, 2, 14),
GEN_VXFORM_207(vsrad, 2, 15),
+GEN_VXFORM_300(vslv, 2, 29),
GEN_VXFORM(vslo, 6, 16),
GEN_VXFORM(vsro, 6, 17),
GEN_VXFORM(vaddcuw, 0, 6),
--
2.7.4
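
The per-byte step of helper_vslv can be modelled on a plain byte array, which makes the "adjacent byte" handling easier to see. This is only a sketch: model_vslv is a hypothetical name, and index 0 is assumed to be the most significant byte, which sidesteps the question of how ppc_avr_t elements are ordered on the host.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of vslv on a byte array (index 0 = leftmost byte).
 * Each result byte is the high byte of the 16-bit pair
 * (a[i], a[i + 1]) shifted left by the low 3 bits of b[i];
 * the byte to the right of the last element is taken as 0. */
static void model_vslv(uint8_t *r, const uint8_t *a,
                       const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned shift = b[i] & 0x7;
        unsigned next = (i + 1 < n) ? a[i + 1] : 0;
        unsigned bytes = ((unsigned)a[i] << 8) | next;
        r[i] = (uint8_t)((bytes << shift) >> 8);
    }
}
```

A forward traversal is safe here even when r aliases a, since step i reads a[i] and a[i + 1] before writing r[i], and a[i + 1] has not been overwritten yet.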
* [Qemu-devel] [PATCH v1 7/8] target-ppc: add vsrv instruction
2016-07-28 6:49 [Qemu-devel] [PATCH v1 0/8] POWER9 TCG enablements - part2 Nikunj A Dadhania
` (5 preceding siblings ...)
2016-07-28 6:49 ` [Qemu-devel] [PATCH v1 6/8] target-ppc: add vslv instruction Nikunj A Dadhania
@ 2016-07-28 6:49 ` Nikunj A Dadhania
2016-07-28 13:01 ` Richard Henderson
2016-07-28 6:49 ` [Qemu-devel] [PATCH v1 8/8] target-ppc: add extswsli[.] instruction Nikunj A Dadhania
7 siblings, 1 reply; 20+ messages in thread
From: Nikunj A Dadhania @ 2016-07-28 6:49 UTC (permalink / raw)
To: qemu-ppc, david, rth
Cc: qemu-devel, nikunj, bharata, aneesh.kumar, benh, Vivek Andrew Sha
From: Vivek Andrew Sha <vivekandrewsha@gmail.com>
Adds Vector Shift Right Variable instruction.
Signed-off-by: Vivek Andrew Sha <vivekandrewsha@gmail.com>
[ reverse the order of computation to avoid temporary array ]
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
---
target-ppc/helper.h | 1 +
target-ppc/int_helper.c | 17 +++++++++++++++++
target-ppc/translate/vmx-impl.c | 1 +
target-ppc/translate/vmx-ops.c | 1 +
4 files changed, 20 insertions(+)
diff --git a/target-ppc/helper.h b/target-ppc/helper.h
index d4c060b..93ac9e1 100644
--- a/target-ppc/helper.h
+++ b/target-ppc/helper.h
@@ -211,6 +211,7 @@ DEF_HELPER_3(vslw, void, avr, avr, avr)
DEF_HELPER_3(vsld, void, avr, avr, avr)
DEF_HELPER_3(vslo, void, avr, avr, avr)
DEF_HELPER_3(vsro, void, avr, avr, avr)
+DEF_HELPER_3(vsrv, void, avr, avr, avr)
DEF_HELPER_3(vslv, void, avr, avr, avr)
DEF_HELPER_3(vaddcuw, void, avr, avr, avr)
DEF_HELPER_3(vsubcuw, void, avr, avr, avr)
diff --git a/target-ppc/int_helper.c b/target-ppc/int_helper.c
index ad1a21f..5e031c1 100644
--- a/target-ppc/int_helper.c
+++ b/target-ppc/int_helper.c
@@ -1708,6 +1708,23 @@ void helper_vslv(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
}
}
+void helper_vsrv(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b)
+{
+ int i;
+ unsigned int shift, bytes;
+
+ /* Use reverse order, as the destination and source registers can be the
+ * same. The register is modified in place to save a temporary; reverse
+ * order guarantees that a computed result is not fed back.
+ */
+ for (i = ARRAY_SIZE(r->u8) - 1; i >= 0; i--) {
+ shift = b->u8[i] & 0x7; /* extract shift value */
+ bytes = ((i ? a->u8[i - 1] : 0) << 8) + a->u8[i];
+ /* extract adjacent bytes */
+ r->u8[i] = (bytes >> shift) & 0xFF; /* shift and store result */
+ }
+}
+
void helper_vsldoi(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t shift)
{
int sh = shift & 0xf;
diff --git a/target-ppc/translate/vmx-impl.c b/target-ppc/translate/vmx-impl.c
index 5844a7e..ac78caf 100644
--- a/target-ppc/translate/vmx-impl.c
+++ b/target-ppc/translate/vmx-impl.c
@@ -367,6 +367,7 @@ GEN_VXFORM(vsrab, 2, 12);
GEN_VXFORM(vsrah, 2, 13);
GEN_VXFORM(vsraw, 2, 14);
GEN_VXFORM(vsrad, 2, 15);
+GEN_VXFORM(vsrv, 2, 28);
GEN_VXFORM(vslv, 2, 29);
GEN_VXFORM(vslo, 6, 16);
GEN_VXFORM(vsro, 6, 17);
diff --git a/target-ppc/translate/vmx-ops.c b/target-ppc/translate/vmx-ops.c
index 372ede0..53e2806 100644
--- a/target-ppc/translate/vmx-ops.c
+++ b/target-ppc/translate/vmx-ops.c
@@ -110,6 +110,7 @@ GEN_VXFORM(vsrab, 2, 12),
GEN_VXFORM(vsrah, 2, 13),
GEN_VXFORM(vsraw, 2, 14),
GEN_VXFORM_207(vsrad, 2, 15),
+GEN_VXFORM_300(vsrv, 2, 28),
GEN_VXFORM_300(vslv, 2, 29),
GEN_VXFORM(vslo, 6, 16),
GEN_VXFORM(vsro, 6, 17),
--
2.7.4
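
The reason for the reverse traversal in helper_vsrv shows up clearly in a small array model. As before this is a sketch under assumptions: model_vsrv is a hypothetical name and index 0 is taken to be the most significant byte. Each result byte depends on the byte to its left, so when the destination aliases the source the loop has to consume a[i - 1] before that position is overwritten.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of vsrv on a byte array (index 0 = leftmost byte).
 * Each result byte is the low byte of the 16-bit pair
 * (a[i - 1], a[i]) shifted right by the low 3 bits of b[i];
 * the byte to the left of element 0 is taken as 0.
 * Iterating from the last index down keeps a[i - 1] intact
 * when r and a point at the same storage. */
static void model_vsrv(uint8_t *r, const uint8_t *a,
                       const uint8_t *b, size_t n)
{
    for (size_t i = n; i-- > 0; ) {
        unsigned shift = b[i] & 0x7;
        unsigned prev = i ? a[i - 1] : 0;
        unsigned bytes = (prev << 8) | a[i];
        r[i] = (uint8_t)(bytes >> shift);
    }
}
```

With the in-place input {0x01, 0x00} and a shift of 1 in both bytes, the low bit of byte 0 moves into the top bit of byte 1. A forward loop would have cleared byte 0 first and lost that bit.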
* [Qemu-devel] [PATCH v1 8/8] target-ppc: add extswsli[.] instruction
2016-07-28 6:49 [Qemu-devel] [PATCH v1 0/8] POWER9 TCG enablements - part2 Nikunj A Dadhania
` (6 preceding siblings ...)
2016-07-28 6:49 ` [Qemu-devel] [PATCH v1 7/8] target-ppc: add vsrv instruction Nikunj A Dadhania
@ 2016-07-28 6:49 ` Nikunj A Dadhania
2016-07-28 13:04 ` Richard Henderson
7 siblings, 1 reply; 20+ messages in thread
From: Nikunj A Dadhania @ 2016-07-28 6:49 UTC (permalink / raw)
To: qemu-ppc, david, rth; +Cc: qemu-devel, nikunj, bharata, aneesh.kumar, benh
extswsli : Extend Sign Word & Shift Left Immediate
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
target-ppc/translate.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 82349ed..8d25121 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -2328,6 +2328,32 @@ static void gen_sradi1(DisasContext *ctx)
gen_sradi(ctx, 1);
}
+/* extswsli & extswsli. */
+static inline void gen_extswsli(DisasContext *ctx, int n)
+{
+ int sh = SH(ctx->opcode) + (n << 5);
+ TCGv dst = cpu_gpr[rA(ctx->opcode)];
+ TCGv src = cpu_gpr[rS(ctx->opcode)];
+
+ tcg_gen_ext32s_tl(dst, src);
+ if (sh != 0) {
+ tcg_gen_shli_tl(dst, dst, sh);
+ }
+ if (unlikely(Rc(ctx->opcode) != 0)) {
+ gen_set_Rc0(ctx, dst);
+ }
+}
+
+static void gen_extswsli0(DisasContext *ctx)
+{
+ gen_extswsli(ctx, 0);
+}
+
+static void gen_extswsli1(DisasContext *ctx)
+{
+ gen_extswsli(ctx, 1);
+}
+
/* srd & srd. */
static void gen_srd(DisasContext *ctx)
{
@@ -6227,6 +6253,10 @@ GEN_HANDLER(srad, 0x1F, 0x1A, 0x18, 0x00000000, PPC_64B),
GEN_HANDLER2(sradi0, "sradi", 0x1F, 0x1A, 0x19, 0x00000000, PPC_64B),
GEN_HANDLER2(sradi1, "sradi", 0x1F, 0x1B, 0x19, 0x00000000, PPC_64B),
GEN_HANDLER(srd, 0x1F, 0x1B, 0x10, 0x00000000, PPC_64B),
+GEN_HANDLER2_E(extswsli0, "extswsli", 0x1F, 0x1A, 0x1B, 0x00000000,
+ PPC_NONE, PPC2_ISA300),
+GEN_HANDLER2_E(extswsli1, "extswsli", 0x1F, 0x1B, 0x1B, 0x00000000,
+ PPC_NONE, PPC2_ISA300),
#endif
#if defined(TARGET_PPC64)
GEN_HANDLER(ld, 0x3A, 0xFF, 0xFF, 0x00000000, PPC_64B),
--
2.7.4
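As a cross-check for the TCG sequence above, the intended result can be written as a plain C reference model. This is only a sketch: extswsli_shift and extswsli_model are hypothetical names, and the narrowing cast assumes the usual two's-complement behaviour.

```c
#include <stdint.h>

/* Hypothetical model of the shift amount, mirroring
 * SH(ctx->opcode) + (n << 5) from the patch. */
static unsigned int extswsli_shift(unsigned int sh_lo, unsigned int sh_hi)
{
    return (sh_lo & 0x1f) + ((sh_hi & 1) << 5);
}

/* Hypothetical model of the result: sign-extend the low word of rs to
 * 64 bits, then shift left by sh (0..63). */
static uint64_t extswsli_model(uint64_t rs, unsigned int sh)
{
    /* The cast to int32_t assumes two's-complement conversion. */
    uint64_t ext = (uint64_t)(int64_t)(int32_t)(uint32_t)rs;
    return ext << (sh & 63);
}
```

For example, extswsli_model(0x80000000, 0) gives 0xFFFFFFFF80000000, and the upper word of rs is ignored, so extswsli_model(0x1FFFFFFFF, 4) gives 0xFFFFFFFFFFFFFFF0.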
* Re: [Qemu-devel] [PATCH v1 1/8] target-ppc: implement branch-less divw[o][.]
2016-07-28 6:49 ` [Qemu-devel] [PATCH v1 1/8] target-ppc: implement branch-less divw[o][.] Nikunj A Dadhania
@ 2016-07-28 12:44 ` Richard Henderson
0 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2016-07-28 12:44 UTC (permalink / raw)
To: Nikunj A Dadhania, qemu-ppc, david
Cc: qemu-devel, bharata, aneesh.kumar, benh
On 07/28/2016 12:19 PM, Nikunj A Dadhania wrote:
> While implementing the modulo instructions, I noticed that the existing
> implementation uses many branches. Change the logic to produce
> branch-less code. For invalid input, the undefined result is set to the
> dividend.
>
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
> target-ppc/translate.c | 48 +++++++++++++++++++++++-------------------------
> 1 file changed, 23 insertions(+), 25 deletions(-)
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
* Re: [Qemu-devel] [PATCH v1 2/8] target-ppc: implement branch-less divd[o][.]
2016-07-28 6:49 ` [Qemu-devel] [PATCH v1 2/8] target-ppc: implement branch-less divd[o][.] Nikunj A Dadhania
@ 2016-07-28 12:45 ` Richard Henderson
0 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2016-07-28 12:45 UTC (permalink / raw)
To: Nikunj A Dadhania, qemu-ppc, david
Cc: qemu-devel, bharata, aneesh.kumar, benh
On 07/28/2016 12:19 PM, Nikunj A Dadhania wrote:
> Similar to divw, implement branch-less divd.
>
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
> target-ppc/translate.c | 48 ++++++++++++++++++++++++++----------------------
> 1 file changed, 26 insertions(+), 22 deletions(-)
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
* Re: [Qemu-devel] [PATCH v1 4/8] target-ppc: add vabsdu[b, h, w] instructions
2016-07-28 6:49 ` [Qemu-devel] [PATCH v1 4/8] target-ppc: add vabsdu[b, h, w] instructions Nikunj A Dadhania
@ 2016-07-28 12:52 ` Richard Henderson
2016-07-28 17:21 ` Nikunj A Dadhania
2016-07-29 3:46 ` David Gibson
0 siblings, 2 replies; 20+ messages in thread
From: Richard Henderson @ 2016-07-28 12:52 UTC (permalink / raw)
To: Nikunj A Dadhania, qemu-ppc, david
Cc: qemu-devel, bharata, aneesh.kumar, benh, Sandipan Das
On 07/28/2016 12:19 PM, Nikunj A Dadhania wrote:
> + r->element[i] = abs(a->element[i] - b->element[i]); \
> + } \
> +}
> +
> +/* VABSDU - Vector absolute difference unsigned
> + * name - instruction mnemonic suffix (b: byte, h: halfword, w: word)
> + * element - element type to access from vector
> + */
> +#define VABSDU(type, element) \
> + VABSDU_DO(absdu##type, element)
> +VABSDU(b, u8)
> +VABSDU(h, u16)
> +VABSDU(w, u32)
From whence are you receiving this abs definition, and how do you expect it to
work with an unsigned input?
I can only imagine you're getting abs(3), aka int abs(int), from stdlib.h.
Which technically does work post-arithmetic promotion for u8 and u16, but it
does not for u32.
I think we'd prefer an explicit (a > b ? a - b : b - a).
r~
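The u32 concern can be reproduced with a short standalone sketch. It assumes a 32-bit int and two's-complement conversion from unsigned to int (implementation-defined in C); both function names are made up for illustration and are not part of the patch.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: what abs(a - b) amounts to for 32-bit unsigned elements.
 * a - b wraps modulo 2^32 and is then reinterpreted as int for abs(). */
static uint32_t absdiff_via_abs(uint32_t a, uint32_t b)
{
    return (uint32_t)abs((int)(a - b));
}

/* The explicit form suggested in the review; never leaves unsigned math. */
static uint32_t absdiff_explicit(uint32_t a, uint32_t b)
{
    return a > b ? a - b : b - a;
}
```

With a = 0xFFFFFFFF and b = 0, the wrapped difference converts to -1, so the abs() form returns 1 while the explicit form returns 0xFFFFFFFF. For small differences such as a = 3, b = 5 both forms agree on 2, which is why the problem is easy to miss.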
* Re: [Qemu-devel] [PATCH v1 5/8] target-ppc: add vcmpnez[b, h, w][.] instructions
2016-07-28 6:49 ` [Qemu-devel] [PATCH v1 5/8] target-ppc: add vcmpnez[b, h, w][.] instructions Nikunj A Dadhania
@ 2016-07-28 12:55 ` Richard Henderson
2016-07-28 17:31 ` Nikunj A Dadhania
0 siblings, 1 reply; 20+ messages in thread
From: Richard Henderson @ 2016-07-28 12:55 UTC (permalink / raw)
To: Nikunj A Dadhania, qemu-ppc, david
Cc: qemu-devel, bharata, aneesh.kumar, benh, Swapnil Bokade
On 07/28/2016 12:19 PM, Nikunj A Dadhania wrote:
> +#define VCMPNEZ_DO(suffix, element, record) \
> +void helper_vcmpnez##suffix(CPUPPCState *env, ppc_avr_t *r, \
> + ppc_avr_t *a, ppc_avr_t *b) \
> +{ \
> + uint64_t ones = (uint64_t)-1; \
> + uint64_t all = ones; \
> + uint64_t none = 0; \
> + int i; \
> + \
> + for (i = 0; i < ARRAY_SIZE(r->element); i++) { \
> + uint64_t result = ((a->element[i] == 0) \
> + || (b->element[i] == 0) \
> + || (a->element[i] != b->element[i]) ? \
> + ones : 0x0); \
Don't you have the proper type to use, as opposed to widening everything to
uint64_t? I would guess element##_t would do the job.
r~
* Re: [Qemu-devel] [PATCH v1 6/8] target-ppc: add vslv instruction
2016-07-28 6:49 ` [Qemu-devel] [PATCH v1 6/8] target-ppc: add vslv instruction Nikunj A Dadhania
@ 2016-07-28 13:00 ` Richard Henderson
0 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2016-07-28 13:00 UTC (permalink / raw)
To: Nikunj A Dadhania, qemu-ppc, david
Cc: qemu-devel, bharata, aneesh.kumar, benh, Vivek Andrew Sha
On 07/28/2016 12:19 PM, Nikunj A Dadhania wrote:
> From: Vivek Andrew Sha <vivekandrewsha@gmail.com>
>
> vslv: Vector Shift Left Variable
>
> Signed-off-by: Vivek Andrew Sha <vivekandrewsha@gmail.com>
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
> target-ppc/helper.h | 1 +
> target-ppc/int_helper.c | 14 ++++++++++++++
> target-ppc/translate/vmx-impl.c | 1 +
> target-ppc/translate/vmx-ops.c | 4 ++++
> 4 files changed, 20 insertions(+)
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
* Re: [Qemu-devel] [PATCH v1 7/8] target-ppc: add vsrv instruction
2016-07-28 6:49 ` [Qemu-devel] [PATCH v1 7/8] target-ppc: add vsrv instruction Nikunj A Dadhania
@ 2016-07-28 13:01 ` Richard Henderson
0 siblings, 0 replies; 20+ messages in thread
From: Richard Henderson @ 2016-07-28 13:01 UTC (permalink / raw)
To: Nikunj A Dadhania, qemu-ppc, david
Cc: qemu-devel, bharata, aneesh.kumar, benh, Vivek Andrew Sha
On 07/28/2016 12:19 PM, Nikunj A Dadhania wrote:
> From: Vivek Andrew Sha <vivekandrewsha@gmail.com>
>
> Adds Vector Shift Right Variable instruction.
>
> Signed-off-by: Vivek Andrew Sha <vivekandrewsha@gmail.com>
> [ reverse the order of computation to avoid temporary array ]
> Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> ---
> target-ppc/helper.h | 1 +
> target-ppc/int_helper.c | 17 +++++++++++++++++
> target-ppc/translate/vmx-impl.c | 1 +
> target-ppc/translate/vmx-ops.c | 1 +
> 4 files changed, 20 insertions(+)
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
* Re: [Qemu-devel] [PATCH v1 8/8] target-ppc: add extswsli[.] instruction
2016-07-28 6:49 ` [Qemu-devel] [PATCH v1 8/8] target-ppc: add extswsli[.] instruction Nikunj A Dadhania
@ 2016-07-28 13:04 ` Richard Henderson
2016-07-28 17:33 ` Nikunj A Dadhania
0 siblings, 1 reply; 20+ messages in thread
From: Richard Henderson @ 2016-07-28 13:04 UTC (permalink / raw)
To: Nikunj A Dadhania, qemu-ppc, david
Cc: qemu-devel, bharata, aneesh.kumar, benh
On 07/28/2016 12:19 PM, Nikunj A Dadhania wrote:
> + tcg_gen_ext32s_tl(dst, src);
> + if (sh != 0) {
> + tcg_gen_shli_tl(dst, dst, sh);
> + }
You need not test for sh != 0, since that will be done in tcg_gen_shli_tl.
Otherwise,
Reviewed-by: Richard Henderson <rth@twiddle.net>
r~
* Re: [Qemu-devel] [PATCH v1 4/8] target-ppc: add vabsdu[b, h, w] instructions
2016-07-28 12:52 ` Richard Henderson
@ 2016-07-28 17:21 ` Nikunj A Dadhania
2016-07-29 3:46 ` David Gibson
1 sibling, 0 replies; 20+ messages in thread
From: Nikunj A Dadhania @ 2016-07-28 17:21 UTC (permalink / raw)
To: Richard Henderson, qemu-ppc, david
Cc: qemu-devel, bharata, aneesh.kumar, benh, Sandipan Das
Richard Henderson <rth@twiddle.net> writes:
> On 07/28/2016 12:19 PM, Nikunj A Dadhania wrote:
>> + r->element[i] = abs(a->element[i] - b->element[i]); \
>> + } \
>> +}
>> +
>> +/* VABSDU - Vector absolute difference unsigned
>> + * name - instruction mnemonic suffix (b: byte, h: halfword, w: word)
>> + * element - element type to access from vector
>> + */
>> +#define VABSDU(type, element) \
>> + VABSDU_DO(absdu##type, element)
>> +VABSDU(b, u8)
>> +VABSDU(h, u16)
>> +VABSDU(w, u32)
>
> From whence are you receiving this abs definition, and how do you expect it to
> work with an unsigned input?
>
> I can only imagine you're getting abs(3), aka int abs(int), from stdlib.h.
> Which technically does work post-arithmetic promotion for u8 and u16, but it
> does not for u32.
Thanks for pointing that out; I wasn't aware of that.
> I think we'd prefer an explicit (a > b ? a - b : b - a).
That is what the ISA says as well; I had thought of using the library function abs().
Regards
Nikunj
* Re: [Qemu-devel] [PATCH v1 5/8] target-ppc: add vcmpnez[b, h, w][.] instructions
2016-07-28 12:55 ` Richard Henderson
@ 2016-07-28 17:31 ` Nikunj A Dadhania
0 siblings, 0 replies; 20+ messages in thread
From: Nikunj A Dadhania @ 2016-07-28 17:31 UTC (permalink / raw)
To: Richard Henderson, qemu-ppc, david
Cc: qemu-devel, bharata, aneesh.kumar, benh, Swapnil Bokade
Richard Henderson <rth@twiddle.net> writes:
> On 07/28/2016 12:19 PM, Nikunj A Dadhania wrote:
>> +#define VCMPNEZ_DO(suffix, element, record) \
>> +void helper_vcmpnez##suffix(CPUPPCState *env, ppc_avr_t *r, \
>> + ppc_avr_t *a, ppc_avr_t *b) \
>> +{ \
>> + uint64_t ones = (uint64_t)-1; \
>> + uint64_t all = ones; \
>> + uint64_t none = 0; \
>> + int i; \
>> + \
>> + for (i = 0; i < ARRAY_SIZE(r->element); i++) { \
>> + uint64_t result = ((a->element[i] == 0) \
>> + || (b->element[i] == 0) \
>> + || (a->element[i] != b->element[i]) ? \
>> + ones : 0x0); \
>
> Don't you have the proper type to use, as opposed to widening everything to
> uint64_t?
And then truncating them as well. One option could be to pass the
element type into the macro:
#define VCMPNEZ_DO(suffix, element, etype, record) \
void helper_vcmpnez##suffix(CPUPPCState *env, ppc_avr_t *r, \
ppc_avr_t *a, ppc_avr_t *b) \
{ \
etype ones = (etype)-1; \
etype all = ones; \
etype none = 0; \
int i; \
\
for (i = 0; i < ARRAY_SIZE(r->element); i++) { \
etype result = ((a->element[i] == 0) \
|| (b->element[i] == 0) \
|| (a->element[i] != b->element[i]) ? \
ones : 0x0); \
r->element[i] = result; \
all &= result; \
none |= result; \
} \
if (record) { \
env->crf[6] = ((all != 0) << 3) | ((none == 0) << 1); \
} \
}
/* VCMPNEZ - Vector compare not equal or zero
* suffix - instruction mnemonic suffix (b: byte, h: halfword, w: word)
* element - union member used to access the vector elements (u8, u16, u32)
* etype - C type of a single element (uint8_t, uint16_t, uint32_t)
*/
#define VCMPNEZ(suffix, element, etype) \
VCMPNEZ_DO(suffix, element, etype, 0) \
VCMPNEZ_DO(suffix##_dot, element, etype, 1)
VCMPNEZ(b, u8, uint8_t)
VCMPNEZ(h, u16, uint16_t)
VCMPNEZ(w, u32, uint32_t)
#undef VCMPNEZ_DO
#undef VCMPNEZ
> I would guess element##_t would do the job.
That would expand to u32_t; we would need uint32_t.
Regards
Nikunj
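To see the element-typed variant behave as intended, the word case can be written out by hand as a standalone sketch. It assumes plain uint32_t arrays in place of ppc_avr_t and returns the CR6 value instead of storing it in env->crf[6]; vcmpnezw_model and N_WORDS are hypothetical names.

```c
#include <stdint.h>

#define N_WORDS 4  /* 32-bit elements in one 128-bit vector */

/* Hypothetical model of the proposed macro expanded for etype = uint32_t.
 * Returns the value the record form would place in CR6. */
static unsigned int vcmpnezw_model(uint32_t *r, const uint32_t *a,
                                   const uint32_t *b)
{
    uint32_t ones = (uint32_t)-1;
    uint32_t all = ones;
    uint32_t none = 0;

    for (int i = 0; i < N_WORDS; i++) {
        /* True if either element is zero or the elements differ. */
        uint32_t result = (a[i] == 0 || b[i] == 0 || a[i] != b[i]) ? ones : 0;
        r[i] = result;
        all &= result;
        none |= result;
    }
    return ((all != 0) << 3) | ((none == 0) << 1);
}
```

Comparing {1, 2, 3, 4} with itself yields all-zero elements and a CR6 value of 2 (no element matched); replacing one element with 0 on one side sets only that result element to all ones and yields a CR6 value of 0.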
* Re: [Qemu-devel] [PATCH v1 8/8] target-ppc: add extswsli[.] instruction
2016-07-28 13:04 ` Richard Henderson
@ 2016-07-28 17:33 ` Nikunj A Dadhania
0 siblings, 0 replies; 20+ messages in thread
From: Nikunj A Dadhania @ 2016-07-28 17:33 UTC (permalink / raw)
To: Richard Henderson, qemu-ppc, david
Cc: qemu-devel, bharata, aneesh.kumar, benh
Richard Henderson <rth@twiddle.net> writes:
> On 07/28/2016 12:19 PM, Nikunj A Dadhania wrote:
>> + tcg_gen_ext32s_tl(dst, src);
>> + if (sh != 0) {
>> + tcg_gen_shli_tl(dst, dst, sh);
>> + }
>
> You need not test for sh != 0, since that will be done in
> tcg_gen_shli_tl.
Sure, will update.
> Otherwise,
>
> Reviewed-by: Richard Henderson <rth@twiddle.net>
>
>
> r~
* Re: [Qemu-devel] [PATCH v1 4/8] target-ppc: add vabsdu[b, h, w] instructions
2016-07-28 12:52 ` Richard Henderson
2016-07-28 17:21 ` Nikunj A Dadhania
@ 2016-07-29 3:46 ` David Gibson
1 sibling, 0 replies; 20+ messages in thread
From: David Gibson @ 2016-07-29 3:46 UTC (permalink / raw)
To: Richard Henderson
Cc: Nikunj A Dadhania, qemu-ppc, qemu-devel, bharata, aneesh.kumar,
benh, Sandipan Das
On Thu, Jul 28, 2016 at 06:22:05PM +0530, Richard Henderson wrote:
> On 07/28/2016 12:19 PM, Nikunj A Dadhania wrote:
> > + r->element[i] = abs(a->element[i] - b->element[i]); \
> > + } \
> > +}
> > +
> > +/* VABSDU - Vector absolute difference unsigned
> > + * name - instruction mnemonic suffix (b: byte, h: halfword, w: word)
> > + * element - element type to access from vector
> > + */
> > +#define VABSDU(type, element) \
> > + VABSDU_DO(absdu##type, element)
> > +VABSDU(b, u8)
> > +VABSDU(h, u16)
> > +VABSDU(w, u32)
>
> From whence are you receiving this abs definition, and how do you expect it
> to work with an unsigned input?
>
> I can only imagine you're getting abs(3), aka int abs(int), from stdlib.h.
> Which technically does work post-arithmetic promotion for u8 and u16, but it
> does not for u32.
So, I noticed this and was also concerned, but I more or less
convinced myself that it would still work, by the magic of 2's
complement, as long as sizeof(int) >= 4.
Maybe I'm wrong, though.
> I think we'd prefer an explicit (a > b ? a - b : b - a).
That probably is easier to follow, though.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson