* [PATCH v1 00/46] Add LoongArch LASX instructions
@ 2023-06-20 9:37 Song Gao
2023-06-20 9:37 ` [PATCH v1 01/46] target/loongarch: Add LASX data type XReg Song Gao
` (45 more replies)
0 siblings, 46 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
Hi,
This series adds LoongArch LASX instructions.
About testing:
We use RISU to test the LoongArch LASX instructions.
QEMU:
https://github.com/loongson/qemu/tree/tcg-old-abi-support-lasx
RISU:
https://github.com/loongson/risu/tree/loongarch-suport-lasx
Please review. Thanks.
Song Gao (46):
target/loongarch: Add LASX data type XReg
target/loongarch: meson.build support building LASX
target/loongarch: Add CHECK_ASXE macro to check LASX enable
target/loongarch: Implement xvadd/xvsub
target/loongarch: Implement xvreplgr2vr
target/loongarch: Implement xvaddi/xvsubi
target/loongarch: Implement xvneg
target/loongarch: Implement xvsadd/xvssub
target/loongarch: Implement xvhaddw/xvhsubw
target/loongarch: Implement xvaddw/xvsubw
target/loongarch: Implement xvavg/xvavgr
target/loongarch: Implement xvabsd
target/loongarch: Implement xvadda
target/loongarch: Implement xvmax/xvmin
target/loongarch: Implement xvmul/xvmuh/xvmulw{ev/od}
target/loongarch: Implement xvmadd/xvmsub/xvmaddw{ev/od}
target/loongarch: Implement xvdiv/xvmod
target/loongarch: Implement xvsat
target/loongarch: Implement xvexth
target/loongarch: Implement vext2xv
target/loongarch: Implement xvsigncov
target/loongarch: Implement xvmskltz/xvmskgez/xvmsknz
target/loongarch: Implement xvldi
target/loongarch: Implement LASX logic instructions
target/loongarch: Implement xvsll xvsrl xvsra xvrotr
target/loongarch: Implement xvsllwil xvextl
target/loongarch: Implement xvsrlr xvsrar
target/loongarch: Implement xvsrln xvsran
target/loongarch: Implement xvsrlrn xvsrarn
target/loongarch: Implement xvssrln xvssran
target/loongarch: Implement xvssrlrn xvssrarn
target/loongarch: Implement xvclo xvclz
target/loongarch: Implement xvpcnt
target/loongarch: Implement xvbitclr xvbitset xvbitrev
target/loongarch: Implement xvfrstp
target/loongarch: Implement LASX fpu arith instructions
target/loongarch: Implement LASX fpu fcvt instructions
target/loongarch: Implement xvseq xvsle xvslt
target/loongarch: Implement xvfcmp
target/loongarch: Implement xvbitsel xvset
target/loongarch: Implement xvinsgr2vr xvpickve2gr
target/loongarch: Implement xvreplve xvinsve0 xvpickve xvb{sll/srl}v
target/loongarch: Implement xvpack xvpick xvilv{l/h}
target/loongarch: Implement xvshuf xvperm{i} xvshuf4i xvextrins
target/loongarch: Implement xvld xvst
target/loongarch: CPUCFG support LASX
linux-user/loongarch64/signal.c | 1 +
target/loongarch/cpu.c | 4 +
target/loongarch/cpu.h | 16 +
target/loongarch/disas.c | 924 +++++
target/loongarch/gdbstub.c | 1 +
target/loongarch/helper.h | 592 ++++
target/loongarch/insn_trans/trans_lasx.c.inc | 3203 +++++++++++++++++
target/loongarch/insns.decode | 828 +++++
target/loongarch/internals.h | 22 -
target/loongarch/lasx_helper.c | 3221 ++++++++++++++++++
target/loongarch/lsx_helper.c | 111 +-
target/loongarch/machine.c | 40 +-
target/loongarch/meson.build | 1 +
target/loongarch/translate.c | 18 +
target/loongarch/vec.h | 125 +
15 files changed, 9006 insertions(+), 101 deletions(-)
create mode 100644 target/loongarch/insn_trans/trans_lasx.c.inc
create mode 100644 target/loongarch/lasx_helper.c
create mode 100644 target/loongarch/vec.h
--
2.39.1
^ permalink raw reply [flat|nested] 54+ messages in thread
* [PATCH v1 01/46] target/loongarch: Add LASX data type XReg
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 12:09 ` Richard Henderson
2023-06-20 9:37 ` [PATCH v1 02/46] target/loongarch: meson.build support building LASX Song Gao
` (44 subsequent siblings)
45 siblings, 1 reply; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
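Not part of the original submission: a short note on the new XReg type.
XReg gives per-element views (XB/XH/XW/XD/XQ and their unsigned
counterparts) of a 256-bit LASX register, and the new vec.h supplies
index macros that remap element numbers on big-endian hosts so that
guest element 0 always lands in the least-significant bits. A minimal
usage sketch, assuming only the definitions added by this patch (the
helper name is hypothetical):

    /* Read the 32-bit element at guest index i of an LASX register.
     * XW(i) expands to XW[i] on little-endian hosts and to XW[7 - i]
     * on big-endian hosts, per the macros in vec.h. */
    static inline int32_t xreg_get_w(const XReg *x, int i)
    {
        return x->XW(i);
    }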
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
linux-user/loongarch64/signal.c | 1 +
target/loongarch/cpu.c | 1 +
target/loongarch/cpu.h | 14 +++++++++
target/loongarch/gdbstub.c | 1 +
target/loongarch/internals.h | 22 --------------
target/loongarch/lsx_helper.c | 1 +
target/loongarch/machine.c | 40 ++++++++++++++++++++++++--
target/loongarch/vec.h | 51 +++++++++++++++++++++++++++++++++
8 files changed, 106 insertions(+), 25 deletions(-)
create mode 100644 target/loongarch/vec.h
diff --git a/linux-user/loongarch64/signal.c b/linux-user/loongarch64/signal.c
index bb8efb1172..39572c1190 100644
--- a/linux-user/loongarch64/signal.c
+++ b/linux-user/loongarch64/signal.c
@@ -12,6 +12,7 @@
#include "linux-user/trace.h"
#include "target/loongarch/internals.h"
+#include "target/loongarch/vec.h"
/* FP context was used */
#define SC_USED_FP (1 << 0)
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index ad93ecac92..5037cfc02c 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -18,6 +18,7 @@
#include "cpu-csr.h"
#include "sysemu/reset.h"
#include "tcg/tcg.h"
+#include "vec.h"
const char * const regnames[32] = {
"r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index b23f38c3d5..347950b4d0 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -259,9 +259,23 @@ typedef union VReg {
Int128 Q[LSX_LEN / 128];
}VReg;
+#define LASX_LEN (256)
+typedef union XReg {
+ int8_t XB[LASX_LEN / 8];
+ int16_t XH[LASX_LEN / 16];
+ int32_t XW[LASX_LEN / 32];
+ int64_t XD[LASX_LEN / 64];
+ uint8_t UXB[LASX_LEN / 8];
+ uint16_t UXH[LASX_LEN / 16];
+ uint32_t UXW[LASX_LEN / 32];
+ uint64_t UXD[LASX_LEN / 64];
+ Int128 XQ[LASX_LEN / 128];
+} XReg;
+
typedef union fpr_t fpr_t;
union fpr_t {
VReg vreg;
+ XReg xreg;
};
struct LoongArchTLB {
diff --git a/target/loongarch/gdbstub.c b/target/loongarch/gdbstub.c
index 0752fff924..94c427f4da 100644
--- a/target/loongarch/gdbstub.c
+++ b/target/loongarch/gdbstub.c
@@ -11,6 +11,7 @@
#include "internals.h"
#include "exec/gdbstub.h"
#include "gdbstub/helpers.h"
+#include "vec.h"
uint64_t read_fcc(CPULoongArchState *env)
{
diff --git a/target/loongarch/internals.h b/target/loongarch/internals.h
index 7b0f29c942..c492863cc5 100644
--- a/target/loongarch/internals.h
+++ b/target/loongarch/internals.h
@@ -21,28 +21,6 @@
/* Global bit for huge page */
#define LOONGARCH_HGLOBAL_SHIFT 12
-#if HOST_BIG_ENDIAN
-#define B(x) B[15 - (x)]
-#define H(x) H[7 - (x)]
-#define W(x) W[3 - (x)]
-#define D(x) D[1 - (x)]
-#define UB(x) UB[15 - (x)]
-#define UH(x) UH[7 - (x)]
-#define UW(x) UW[3 - (x)]
-#define UD(x) UD[1 -(x)]
-#define Q(x) Q[x]
-#else
-#define B(x) B[x]
-#define H(x) H[x]
-#define W(x) W[x]
-#define D(x) D[x]
-#define UB(x) UB[x]
-#define UH(x) UH[x]
-#define UW(x) UW[x]
-#define UD(x) UD[x]
-#define Q(x) Q[x]
-#endif
-
void loongarch_translate_init(void);
void loongarch_cpu_dump_state(CPUState *cpu, FILE *f, int flags);
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 9571f0aef0..b231a2798b 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -12,6 +12,7 @@
#include "fpu/softfloat.h"
#include "internals.h"
#include "tcg/tcg.h"
+#include "vec.h"
#define DO_ADD(a, b) (a + b)
#define DO_SUB(a, b) (a - b)
diff --git a/target/loongarch/machine.c b/target/loongarch/machine.c
index d8ac99c9a4..3fbf68d7ff 100644
--- a/target/loongarch/machine.c
+++ b/target/loongarch/machine.c
@@ -8,7 +8,7 @@
#include "qemu/osdep.h"
#include "cpu.h"
#include "migration/cpu.h"
-#include "internals.h"
+#include "vec.h"
static const VMStateDescription vmstate_fpu_reg = {
.name = "fpu_reg",
@@ -76,6 +76,39 @@ static const VMStateDescription vmstate_lsx = {
},
};
+static const VMStateDescription vmstate_lasxh_reg = {
+ .name = "lasxh_reg",
+ .version_id = 1,
+ .minimum_version_id = 1,
+ .fields = (VMStateField[]) {
+ VMSTATE_UINT64(UXD(2), XReg),
+ VMSTATE_UINT64(UXD(3), XReg),
+ VMSTATE_END_OF_LIST()
+ }
+};
+
+#define VMSTATE_LASXH_REGS(_field, _state, _start) \
+ VMSTATE_STRUCT_SUB_ARRAY(_field, _state, _start, 32, 0, \
+ vmstate_lasxh_reg, fpr_t)
+
+static bool lasx_needed(void *opaque)
+{
+ LoongArchCPU *cpu = opaque;
+
+ return FIELD_EX64(cpu->env.cpucfg[2], CPUCFG2, LASX);
+}
+
+static const VMStateDescription vmstate_lasx = {
+ .name = "cpu/lasx",
+ .version_id = 1,
+ .minimum_version_id = 1,
+ .needed = lasx_needed,
+ .fields = (VMStateField[]) {
+ VMSTATE_LASXH_REGS(env.fpr, LoongArchCPU, 0),
+ VMSTATE_END_OF_LIST()
+ },
+};
+
/* TLB state */
const VMStateDescription vmstate_tlb = {
.name = "cpu/tlb",
@@ -92,8 +125,8 @@ const VMStateDescription vmstate_tlb = {
/* LoongArch CPU state */
const VMStateDescription vmstate_loongarch_cpu = {
.name = "cpu",
- .version_id = 1,
- .minimum_version_id = 1,
+ .version_id = 2,
+ .minimum_version_id = 2,
.fields = (VMStateField[]) {
VMSTATE_UINTTL_ARRAY(env.gpr, LoongArchCPU, 32),
VMSTATE_UINTTL(env.pc, LoongArchCPU),
@@ -163,6 +196,7 @@ const VMStateDescription vmstate_loongarch_cpu = {
.subsections = (const VMStateDescription*[]) {
&vmstate_fpu,
&vmstate_lsx,
+ &vmstate_lasx,
NULL
}
};
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
new file mode 100644
index 0000000000..a89cdb8d45
--- /dev/null
+++ b/target/loongarch/vec.h
@@ -0,0 +1,51 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * QEMU LoongArch vector utilities
+ *
+ * Copyright (c) 2023 Loongson Technology Corporation Limited
+ */
+
+#ifndef LOONGARCH_VEC_H
+#define LOONGARCH_VEC_H
+
+#if HOST_BIG_ENDIAN
+#define B(x) B[15 - (x)]
+#define H(x) H[7 - (x)]
+#define W(x) W[3 - (x)]
+#define D(x) D[1 - (x)]
+#define UB(x) UB[15 - (x)]
+#define UH(x) UH[7 - (x)]
+#define UW(x) UW[3 - (x)]
+#define UD(x) UD[1 - (x)]
+#define Q(x) Q[x]
+#define XB(x) XB[31 - (x)]
+#define XH(x) XH[15 - (x)]
+#define XW(x) XW[7 - (x)]
+#define XD(x) XD[3 - (x)]
+#define UXB(x) UXB[31 - (x)]
+#define UXH(x) UXH[15 - (x)]
+#define UXW(x) UXW[7 - (x)]
+#define UXD(x) UXD[3 - (x)]
+#define XQ(x) XQ[1 - (x)]
+#else
+#define B(x) B[x]
+#define H(x) H[x]
+#define W(x) W[x]
+#define D(x) D[x]
+#define UB(x) UB[x]
+#define UH(x) UH[x]
+#define UW(x) UW[x]
+#define UD(x) UD[x]
+#define Q(x) Q[x]
+#define XB(x) XB[x]
+#define XH(x) XH[x]
+#define XW(x) XW[x]
+#define XD(x) XD[x]
+#define UXB(x) UXB[x]
+#define UXH(x) UXH[x]
+#define UXW(x) UXW[x]
+#define UXD(x) UXD[x]
+#define XQ(x) XQ[x]
+#endif /* HOST_BIG_ENDIAN */
+
+#endif /* LOONGARCH_VEC_H */
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 02/46] target/loongarch: meson.build support building LASX
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
2023-06-20 9:37 ` [PATCH v1 01/46] target/loongarch: Add LASX data type XReg Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 03/46] target/loongarch: Add CHECK_ASXE macro to check LASX enable Song Gao
` (43 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/insn_trans/trans_lasx.c.inc | 6 ++++++
target/loongarch/lasx_helper.c | 6 ++++++
target/loongarch/meson.build | 1 +
target/loongarch/translate.c | 1 +
4 files changed, 14 insertions(+)
create mode 100644 target/loongarch/insn_trans/trans_lasx.c.inc
create mode 100644 target/loongarch/lasx_helper.c
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
new file mode 100644
index 0000000000..56a9839255
--- /dev/null
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -0,0 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * LASX translate functions
+ * Copyright (c) 2023 Loongson Technology Corporation Limited
+ */
+
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
new file mode 100644
index 0000000000..1754790a3a
--- /dev/null
+++ b/target/loongarch/lasx_helper.c
@@ -0,0 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * QEMU LoongArch LASX helper functions.
+ *
+ * Copyright (c) 2023 Loongson Technology Corporation Limited
+ */
diff --git a/target/loongarch/meson.build b/target/loongarch/meson.build
index 1117a51c52..90a5a21977 100644
--- a/target/loongarch/meson.build
+++ b/target/loongarch/meson.build
@@ -12,6 +12,7 @@ loongarch_tcg_ss.add(files(
'translate.c',
'gdbstub.c',
'lsx_helper.c',
+ 'lasx_helper.c',
))
loongarch_tcg_ss.add(zlib)
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index 3146a2d4ac..6bf2d726d6 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -220,6 +220,7 @@ static void set_fpr(int reg_num, TCGv val)
#include "insn_trans/trans_branch.c.inc"
#include "insn_trans/trans_privileged.c.inc"
#include "insn_trans/trans_lsx.c.inc"
+#include "insn_trans/trans_lasx.c.inc"
static void loongarch_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs)
{
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 03/46] target/loongarch: Add CHECK_ASXE macro to check LASX enable
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
2023-06-20 9:37 ` [PATCH v1 01/46] target/loongarch: Add LASX data type XReg Song Gao
2023-06-20 9:37 ` [PATCH v1 02/46] target/loongarch: meson.build support building LASX Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 12:10 ` Richard Henderson
2023-06-20 9:37 ` [PATCH v1 04/46] target/loongarch: Implement xvadd/xvsub Song Gao
` (42 subsequent siblings)
45 siblings, 1 reply; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
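Not part of the original submission: CHECK_ASXE mirrors the existing
CHECK_SXE used by the LSX translator. Each LASX trans_* routine is
expected to invoke it before emitting any code, so that a guest with
CSR.EUEN.ASXE clear takes the new EXCCODE_ASXD exception instead of
executing the instruction. A hedged sketch of the intended usage
(trans_xvfoo and its argument type are made-up names):

    static bool trans_xvfoo(DisasContext *ctx, arg_xvfoo *a)
    {
        CHECK_ASXE;   /* system mode: take ASXD if CSR.EUEN.ASXE is clear */
        /* ... emit TCG ops for the instruction ... */
        return true;
    }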
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/cpu.c | 2 ++
target/loongarch/cpu.h | 2 ++
target/loongarch/insn_trans/trans_lasx.c.inc | 10 ++++++++++
3 files changed, 14 insertions(+)
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 5037cfc02c..c9f9cbb19d 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -54,6 +54,7 @@ static const char * const excp_names[] = {
[EXCCODE_DBP] = "Debug breakpoint",
[EXCCODE_BCE] = "Bound Check Exception",
[EXCCODE_SXD] = "128 bit vector instructions Disable exception",
+ [EXCCODE_ASXD] = "256 bit vector instructions Disable exception",
};
const char *loongarch_exception_name(int32_t exception)
@@ -189,6 +190,7 @@ static void loongarch_cpu_do_interrupt(CPUState *cs)
case EXCCODE_FPD:
case EXCCODE_FPE:
case EXCCODE_SXD:
+ case EXCCODE_ASXD:
env->CSR_BADV = env->pc;
QEMU_FALLTHROUGH;
case EXCCODE_BCE:
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 347950b4d0..6e8d247ae0 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -440,6 +440,7 @@ static inline int cpu_mmu_index(CPULoongArchState *env, bool ifetch)
#define HW_FLAGS_CRMD_PG R_CSR_CRMD_PG_MASK /* 0x10 */
#define HW_FLAGS_EUEN_FPE 0x04
#define HW_FLAGS_EUEN_SXE 0x08
+#define HW_FLAGS_EUEN_ASXE 0x10
static inline void cpu_get_tb_cpu_state(CPULoongArchState *env,
target_ulong *pc,
@@ -451,6 +452,7 @@ static inline void cpu_get_tb_cpu_state(CPULoongArchState *env,
*flags = env->CSR_CRMD & (R_CSR_CRMD_PLV_MASK | R_CSR_CRMD_PG_MASK);
*flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, FPE) * HW_FLAGS_EUEN_FPE;
*flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, SXE) * HW_FLAGS_EUEN_SXE;
+ *flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, ASXE) * HW_FLAGS_EUEN_ASXE;
}
void loongarch_cpu_list(void);
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 56a9839255..75a77f5dce 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -4,3 +4,13 @@
* Copyright (c) 2023 Loongson Technology Corporation Limited
*/
+#ifndef CONFIG_USER_ONLY
+#define CHECK_ASXE do { \
+ if ((ctx->base.tb->flags & HW_FLAGS_EUEN_ASXE) == 0) { \
+ generate_exception(ctx, EXCCODE_ASXD); \
+ return true; \
+ } \
+} while (0)
+#else
+#define CHECK_ASXE
+#endif
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 04/46] target/loongarch: Implement xvadd/xvsub
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (2 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 03/46] target/loongarch: Add CHECK_ASXE macro to check LASX enable Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 12:25 ` Richard Henderson
2023-06-20 9:37 ` [PATCH v1 05/46] target/loongarch: Implement xvreplgr2vr Song Gao
` (41 subsequent siblings)
45 siblings, 1 reply; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVADD.{B/H/W/D/Q};
- XVSUB.{B/H/W/D/Q}.
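For reference (not part of the patch): the element-wise semantics that
tcg_gen_gvec_add/sub provide over the 256-bit register, sketched in
plain C for the 32-bit case. XVADD.Q/XVSUB.Q operate on two 128-bit
lanes instead and are built from paired 64-bit add2/sub2 ops below.

    /* Semantic sketch only: XVADD.W adds corresponding 32-bit lanes of
     * xj and xk and stores the eight results in xd (wrap-around
     * arithmetic, hence the unsigned view). */
    static void xvadd_w_sketch(XReg *xd, const XReg *xj, const XReg *xk)
    {
        for (int i = 0; i < 256 / 32; i++) {
            xd->UXW(i) = xj->UXW(i) + xk->UXW(i);
        }
    }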
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 23 ++++++++
target/loongarch/insn_trans/trans_lasx.c.inc | 59 ++++++++++++++++++++
target/loongarch/insns.decode | 23 ++++++++
target/loongarch/translate.c | 17 ++++++
4 files changed, 122 insertions(+)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 5c402d944d..696f78c491 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1695,3 +1695,26 @@ INSN_LSX(vstelm_d, vr_ii)
INSN_LSX(vstelm_w, vr_ii)
INSN_LSX(vstelm_h, vr_ii)
INSN_LSX(vstelm_b, vr_ii)
+
+#define INSN_LASX(insn, type) \
+static bool trans_##insn(DisasContext *ctx, arg_##type * a) \
+{ \
+ output_##type(ctx, a, #insn); \
+ return true; \
+}
+
+static void output_xxx(DisasContext *ctx, arg_xxx * a, const char *mnemonic)
+{
+ output(ctx, mnemonic, "x%d, x%d, x%d", a->xd, a->xj, a->xk);
+}
+
+INSN_LASX(xvadd_b, xxx)
+INSN_LASX(xvadd_h, xxx)
+INSN_LASX(xvadd_w, xxx)
+INSN_LASX(xvadd_d, xxx)
+INSN_LASX(xvadd_q, xxx)
+INSN_LASX(xvsub_b, xxx)
+INSN_LASX(xvsub_h, xxx)
+INSN_LASX(xvsub_w, xxx)
+INSN_LASX(xvsub_d, xxx)
+INSN_LASX(xvsub_q, xxx)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 75a77f5dce..c918522f96 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -14,3 +14,62 @@
#else
#define CHECK_ASXE
#endif
+
+static bool gvec_xxx(DisasContext *ctx, arg_xxx *a, MemOp mop,
+ void (*func)(unsigned, uint32_t, uint32_t,
+ uint32_t, uint32_t, uint32_t))
+{
+ uint32_t xd_ofs, xj_ofs, xk_ofs;
+
+ CHECK_ASXE;
+
+ xd_ofs = vec_full_offset(a->xd);
+ xj_ofs = vec_full_offset(a->xj);
+ xk_ofs = vec_full_offset(a->xk);
+
+ func(mop, xd_ofs, xj_ofs, xk_ofs, 32, ctx->vl / 8);
+ return true;
+}
+
+TRANS(xvadd_b, gvec_xxx, MO_8, tcg_gen_gvec_add)
+TRANS(xvadd_h, gvec_xxx, MO_16, tcg_gen_gvec_add)
+TRANS(xvadd_w, gvec_xxx, MO_32, tcg_gen_gvec_add)
+TRANS(xvadd_d, gvec_xxx, MO_64, tcg_gen_gvec_add)
+
+#define XVADDSUB_Q(NAME) \
+static bool trans_xv## NAME ##_q(DisasContext *ctx, arg_xxx *a) \
+{ \
+ TCGv_i64 rh, rl, ah, al, bh, bl; \
+ int i; \
+ \
+ CHECK_ASXE; \
+ \
+ rh = tcg_temp_new_i64(); \
+ rl = tcg_temp_new_i64(); \
+ ah = tcg_temp_new_i64(); \
+ al = tcg_temp_new_i64(); \
+ bh = tcg_temp_new_i64(); \
+ bl = tcg_temp_new_i64(); \
+ \
+ for (i = 0; i < 2; i++) { \
+ get_xreg64(ah, a->xj, 1 + i * 2); \
+ get_xreg64(al, a->xj, 0 + i * 2); \
+ get_xreg64(bh, a->xk, 1 + i * 2); \
+ get_xreg64(bl, a->xk, 0 + i * 2); \
+ \
+ tcg_gen_## NAME ##2_i64(rl, rh, al, ah, bl, bh); \
+ \
+ set_xreg64(rh, a->xd, 1 + i * 2); \
+ set_xreg64(rl, a->xd, 0 + i * 2); \
+ } \
+ \
+ return true; \
+}
+
+XVADDSUB_Q(add)
+XVADDSUB_Q(sub)
+
+TRANS(xvsub_b, gvec_xxx, MO_8, tcg_gen_gvec_sub)
+TRANS(xvsub_h, gvec_xxx, MO_16, tcg_gen_gvec_sub)
+TRANS(xvsub_w, gvec_xxx, MO_32, tcg_gen_gvec_sub)
+TRANS(xvsub_d, gvec_xxx, MO_64, tcg_gen_gvec_sub)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index c9c3bc2c73..bac1903975 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1296,3 +1296,26 @@ vstelm_d 0011 00010001 0 . ........ ..... ..... @vr_i8i1
vstelm_w 0011 00010010 .. ........ ..... ..... @vr_i8i2
vstelm_h 0011 0001010 ... ........ ..... ..... @vr_i8i3
vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
+
+#
+# LASX Argument sets
+#
+
+&xxx xd xj xk
+
+#
+# LASX Formats
+#
+
+@xxx .... ........ ..... xk:5 xj:5 xd:5 &xxx
+
+xvadd_b 0111 01000000 10100 ..... ..... ..... @xxx
+xvadd_h 0111 01000000 10101 ..... ..... ..... @xxx
+xvadd_w 0111 01000000 10110 ..... ..... ..... @xxx
+xvadd_d 0111 01000000 10111 ..... ..... ..... @xxx
+xvadd_q 0111 01010010 11010 ..... ..... ..... @xxx
+xvsub_b 0111 01000000 11000 ..... ..... ..... @xxx
+xvsub_h 0111 01000000 11001 ..... ..... ..... @xxx
+xvsub_w 0111 01000000 11010 ..... ..... ..... @xxx
+xvsub_d 0111 01000000 11011 ..... ..... ..... @xxx
+xvsub_q 0111 01010010 11011 ..... ..... ..... @xxx
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index 6bf2d726d6..5300e14815 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -18,6 +18,7 @@
#include "fpu/softfloat.h"
#include "translate.h"
#include "internals.h"
+#include "vec.h"
/* Global register indices */
TCGv cpu_gpr[32], cpu_pc;
@@ -48,6 +49,18 @@ static inline void set_vreg64(TCGv_i64 src, int regno, int index)
offsetof(CPULoongArchState, fpr[regno].vreg.D(index)));
}
+static inline void get_xreg64(TCGv_i64 dest, int regno, int index)
+{
+ tcg_gen_ld_i64(dest, cpu_env,
+ offsetof(CPULoongArchState, fpr[regno].xreg.XD(index)));
+}
+
+static inline void set_xreg64(TCGv_i64 src, int regno, int index)
+{
+ tcg_gen_st_i64(src, cpu_env,
+ offsetof(CPULoongArchState, fpr[regno].xreg.XD(index)));
+}
+
static inline int plus_1(DisasContext *ctx, int x)
{
return x + 1;
@@ -119,6 +132,10 @@ static void loongarch_tr_init_disas_context(DisasContextBase *dcbase,
ctx->vl = LSX_LEN;
}
+ if (FIELD_EX64(env->cpucfg[2], CPUCFG2, LASX)) {
+ ctx->vl = LASX_LEN;
+ }
+
ctx->zero = tcg_constant_tl(0);
}
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 05/46] target/loongarch: Implement xvreplgr2vr
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (3 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 04/46] target/loongarch: Implement xvadd/xvsub Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 06/46] target/loongarch: Implement xvaddi/xvsubi Song Gao
` (40 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVREPLGR2VR.{B/H/W/D}.
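For reference (not part of the patch), the broadcast semantics, sketched
in plain C for the 16-bit case; the translator expresses this with a
single tcg_gen_gvec_dup_i64 over the 32-byte register:

    /* Semantic sketch only: XVREPLGR2VR.H copies the low 16 bits of
     * general register rj into every 16-bit lane of xd. */
    static void xvreplgr2vr_h_sketch(XReg *xd, uint64_t rj)
    {
        for (int i = 0; i < 256 / 16; i++) {
            xd->UXH(i) = (uint16_t)rj;
        }
    }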
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 10 ++++++++++
target/loongarch/insn_trans/trans_lasx.c.inc | 16 ++++++++++++++++
target/loongarch/insns.decode | 8 ++++++++
3 files changed, 34 insertions(+)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 696f78c491..78e1fd19ac 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1708,6 +1708,11 @@ static void output_xxx(DisasContext *ctx, arg_xxx * a, const char *mnemonic)
output(ctx, mnemonic, "x%d, x%d, x%d", a->xd, a->xj, a->xk);
}
+static void output_xr(DisasContext *ctx, arg_xr *a, const char *mnemonic)
+{
+ output(ctx, mnemonic, "x%d, r%d", a->xd, a->rj);
+}
+
INSN_LASX(xvadd_b, xxx)
INSN_LASX(xvadd_h, xxx)
INSN_LASX(xvadd_w, xxx)
@@ -1718,3 +1723,8 @@ INSN_LASX(xvsub_h, xxx)
INSN_LASX(xvsub_w, xxx)
INSN_LASX(xvsub_d, xxx)
INSN_LASX(xvsub_q, xxx)
+
+INSN_LASX(xvreplgr2vr_b, xr)
+INSN_LASX(xvreplgr2vr_h, xr)
+INSN_LASX(xvreplgr2vr_w, xr)
+INSN_LASX(xvreplgr2vr_d, xr)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index c918522f96..d394a4f40a 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -73,3 +73,19 @@ TRANS(xvsub_b, gvec_xxx, MO_8, tcg_gen_gvec_sub)
TRANS(xvsub_h, gvec_xxx, MO_16, tcg_gen_gvec_sub)
TRANS(xvsub_w, gvec_xxx, MO_32, tcg_gen_gvec_sub)
TRANS(xvsub_d, gvec_xxx, MO_64, tcg_gen_gvec_sub)
+
+static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
+{
+ TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
+
+ CHECK_ASXE;
+
+ tcg_gen_gvec_dup_i64(mop, vec_full_offset(a->xd),
+ 32, ctx->vl / 8, src);
+ return true;
+}
+
+TRANS(xvreplgr2vr_b, gvec_dupx, MO_8)
+TRANS(xvreplgr2vr_h, gvec_dupx, MO_16)
+TRANS(xvreplgr2vr_w, gvec_dupx, MO_32)
+TRANS(xvreplgr2vr_d, gvec_dupx, MO_64)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index bac1903975..2eab7f6a98 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1302,12 +1302,15 @@ vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
#
&xxx xd xj xk
+&xr xd rj
+
#
# LASX Formats
#
@xxx .... ........ ..... xk:5 xj:5 xd:5 &xxx
+@xr .... ........ ..... ..... rj:5 xd:5 &xr
xvadd_b 0111 01000000 10100 ..... ..... ..... @xxx
xvadd_h 0111 01000000 10101 ..... ..... ..... @xxx
@@ -1319,3 +1322,8 @@ xvsub_h 0111 01000000 11001 ..... ..... ..... @xxx
xvsub_w 0111 01000000 11010 ..... ..... ..... @xxx
xvsub_d 0111 01000000 11011 ..... ..... ..... @xxx
xvsub_q 0111 01010010 11011 ..... ..... ..... @xxx
+
+xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
+xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
+xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
+xvreplgr2vr_d 0111 01101001 11110 00011 ..... ..... @xr
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 06/46] target/loongarch: Implement xvaddi/xvsubi
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (4 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 05/46] target/loongarch: Implement xvreplgr2vr Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 07/46] target/loongarch: Implement xvneg Song Gao
` (39 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVADDI.{B/H/W/D}U;
- XVSUBI.{B/H/W/D}U.
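For reference (not part of the patch): both groups take a 5-bit unsigned
immediate. XVADDI maps directly onto tcg_gen_gvec_addi, while XVSUBI is
emitted as an addi of the negated immediate (gvec_xsubi below), which is
equivalent under modular arithmetic. A plain-C sketch of the byte case:

    /* Semantic sketch only: XVSUBI.BU subtracts imm (0..31) from every
     * byte lane of xj, wrapping modulo 256. */
    static void xvsubi_bu_sketch(XReg *xd, const XReg *xj, uint32_t imm)
    {
        for (int i = 0; i < 256 / 8; i++) {
            xd->UXB(i) = xj->UXB(i) - imm;
        }
    }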
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 14 ++++++++
target/loongarch/insn_trans/trans_lasx.c.inc | 37 ++++++++++++++++++++
target/loongarch/insns.decode | 12 ++++++-
3 files changed, 62 insertions(+), 1 deletion(-)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 78e1fd19ac..7b84766fa8 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1708,6 +1708,11 @@ static void output_xxx(DisasContext *ctx, arg_xxx * a, const char *mnemonic)
output(ctx, mnemonic, "x%d, x%d, x%d", a->xd, a->xj, a->xk);
}
+static void output_xx_i(DisasContext *ctx, arg_xx_i *a, const char *mnemonic)
+{
+ output(ctx, mnemonic, "x%d, x%d, 0x%x", a->xd, a->xj, a->imm);
+}
+
static void output_xr(DisasContext *ctx, arg_xr *a, const char *mnemonic)
{
output(ctx, mnemonic, "x%d, r%d", a->xd, a->rj);
@@ -1724,6 +1729,15 @@ INSN_LASX(xvsub_w, xxx)
INSN_LASX(xvsub_d, xxx)
INSN_LASX(xvsub_q, xxx)
+INSN_LASX(xvaddi_bu, xx_i)
+INSN_LASX(xvaddi_hu, xx_i)
+INSN_LASX(xvaddi_wu, xx_i)
+INSN_LASX(xvaddi_du, xx_i)
+INSN_LASX(xvsubi_bu, xx_i)
+INSN_LASX(xvsubi_hu, xx_i)
+INSN_LASX(xvsubi_wu, xx_i)
+INSN_LASX(xvsubi_du, xx_i)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index d394a4f40a..a42e92f930 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -31,6 +31,34 @@ static bool gvec_xxx(DisasContext *ctx, arg_xxx *a, MemOp mop,
return true;
}
+static bool gvec_xx_i(DisasContext *ctx, arg_xx_i *a, MemOp mop,
+ void (*func)(unsigned, uint32_t, uint32_t,
+ int64_t, uint32_t, uint32_t))
+{
+ uint32_t xd_ofs, xj_ofs;
+
+ CHECK_ASXE;
+
+ xd_ofs = vec_full_offset(a->xd);
+ xj_ofs = vec_full_offset(a->xj);
+
+ func(mop, xd_ofs, xj_ofs, a->imm, 32, ctx->vl / 8);
+ return true;
+}
+
+static bool gvec_xsubi(DisasContext *ctx, arg_xx_i *a, MemOp mop)
+{
+ uint32_t xd_ofs, xj_ofs;
+
+ CHECK_ASXE;
+
+ xd_ofs = vec_full_offset(a->xd);
+ xj_ofs = vec_full_offset(a->xj);
+
+ tcg_gen_gvec_addi(mop, xd_ofs, xj_ofs, -a->imm, 32, ctx->vl / 8);
+ return true;
+}
+
TRANS(xvadd_b, gvec_xxx, MO_8, tcg_gen_gvec_add)
TRANS(xvadd_h, gvec_xxx, MO_16, tcg_gen_gvec_add)
TRANS(xvadd_w, gvec_xxx, MO_32, tcg_gen_gvec_add)
@@ -74,6 +102,15 @@ TRANS(xvsub_h, gvec_xxx, MO_16, tcg_gen_gvec_sub)
TRANS(xvsub_w, gvec_xxx, MO_32, tcg_gen_gvec_sub)
TRANS(xvsub_d, gvec_xxx, MO_64, tcg_gen_gvec_sub)
+TRANS(xvaddi_bu, gvec_xx_i, MO_8, tcg_gen_gvec_addi)
+TRANS(xvaddi_hu, gvec_xx_i, MO_16, tcg_gen_gvec_addi)
+TRANS(xvaddi_wu, gvec_xx_i, MO_32, tcg_gen_gvec_addi)
+TRANS(xvaddi_du, gvec_xx_i, MO_64, tcg_gen_gvec_addi)
+TRANS(xvsubi_bu, gvec_xsubi, MO_8)
+TRANS(xvsubi_hu, gvec_xsubi, MO_16)
+TRANS(xvsubi_wu, gvec_xsubi, MO_32)
+TRANS(xvsubi_du, gvec_xsubi, MO_64)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 2eab7f6a98..0bed748216 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1303,7 +1303,7 @@ vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
&xxx xd xj xk
&xr xd rj
-
+&xx_i xd xj imm
#
# LASX Formats
@@ -1311,6 +1311,7 @@ vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
@xxx .... ........ ..... xk:5 xj:5 xd:5 &xxx
@xr .... ........ ..... ..... rj:5 xd:5 &xr
+@xx_ui5 .... ........ ..... imm:5 xj:5 xd:5 &xx_i
xvadd_b 0111 01000000 10100 ..... ..... ..... @xxx
xvadd_h 0111 01000000 10101 ..... ..... ..... @xxx
@@ -1323,6 +1324,15 @@ xvsub_w 0111 01000000 11010 ..... ..... ..... @xxx
xvsub_d 0111 01000000 11011 ..... ..... ..... @xxx
xvsub_q 0111 01010010 11011 ..... ..... ..... @xxx
+xvaddi_bu 0111 01101000 10100 ..... ..... ..... @xx_ui5
+xvaddi_hu 0111 01101000 10101 ..... ..... ..... @xx_ui5
+xvaddi_wu 0111 01101000 10110 ..... ..... ..... @xx_ui5
+xvaddi_du 0111 01101000 10111 ..... ..... ..... @xx_ui5
+xvsubi_bu 0111 01101000 11000 ..... ..... ..... @xx_ui5
+xvsubi_hu 0111 01101000 11001 ..... ..... ..... @xx_ui5
+xvsubi_wu 0111 01101000 11010 ..... ..... ..... @xx_ui5
+xvsubi_du 0111 01101000 11011 ..... ..... ..... @xx_ui5
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 07/46] target/loongarch: Implement xvneg
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (5 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 06/46] target/loongarch: Implement xvaddi/xvsubi Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 08/46] target/loongarch: Implement xvsadd/xvssub Song Gao
` (38 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVNEG.{B/H/W/D}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 10 ++++++++++
target/loongarch/insn_trans/trans_lasx.c.inc | 20 ++++++++++++++++++++
target/loongarch/insns.decode | 7 +++++++
3 files changed, 37 insertions(+)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 7b84766fa8..eefd16e3f1 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1713,6 +1713,11 @@ static void output_xx_i(DisasContext *ctx, arg_xx_i *a, const char *mnemonic)
output(ctx, mnemonic, "x%d, x%d, 0x%x", a->xd, a->xj, a->imm);
}
+static void output_xx(DisasContext *ctx, arg_xx *a, const char *mnemonic)
+{
+ output(ctx, mnemonic, "x%d, x%d", a->xd, a->xj);
+}
+
static void output_xr(DisasContext *ctx, arg_xr *a, const char *mnemonic)
{
output(ctx, mnemonic, "x%d, r%d", a->xd, a->rj);
@@ -1738,6 +1743,11 @@ INSN_LASX(xvsubi_hu, xx_i)
INSN_LASX(xvsubi_wu, xx_i)
INSN_LASX(xvsubi_du, xx_i)
+INSN_LASX(xvneg_b, xx)
+INSN_LASX(xvneg_h, xx)
+INSN_LASX(xvneg_w, xx)
+INSN_LASX(xvneg_d, xx)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index a42e92f930..cea944c3ba 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -59,6 +59,21 @@ static bool gvec_xsubi(DisasContext *ctx, arg_xx_i *a, MemOp mop)
return true;
}
+static bool gvec_xx(DisasContext *ctx, arg_xx *a, MemOp mop,
+ void (*func)(unsigned, uint32_t, uint32_t,
+ uint32_t, uint32_t))
+{
+ uint32_t xd_ofs, xj_ofs;
+
+ CHECK_ASXE;
+
+ xd_ofs = vec_full_offset(a->xd);
+ xj_ofs = vec_full_offset(a->xj);
+
+ func(mop, xd_ofs, xj_ofs, 32, ctx->vl / 8);
+ return true;
+}
+
TRANS(xvadd_b, gvec_xxx, MO_8, tcg_gen_gvec_add)
TRANS(xvadd_h, gvec_xxx, MO_16, tcg_gen_gvec_add)
TRANS(xvadd_w, gvec_xxx, MO_32, tcg_gen_gvec_add)
@@ -111,6 +126,11 @@ TRANS(xvsubi_hu, gvec_xsubi, MO_16)
TRANS(xvsubi_wu, gvec_xsubi, MO_32)
TRANS(xvsubi_du, gvec_xsubi, MO_64)
+TRANS(xvneg_b, gvec_xx, MO_8, tcg_gen_gvec_neg)
+TRANS(xvneg_h, gvec_xx, MO_16, tcg_gen_gvec_neg)
+TRANS(xvneg_w, gvec_xx, MO_32, tcg_gen_gvec_neg)
+TRANS(xvneg_d, gvec_xx, MO_64, tcg_gen_gvec_neg)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 0bed748216..78452c622c 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1301,6 +1301,7 @@ vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
# LASX Argument sets
#
+&xx xd xj
&xxx xd xj xk
&xr xd rj
&xx_i xd xj imm
@@ -1309,6 +1310,7 @@ vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
# LASX Formats
#
+@xx .... ........ ..... ..... xj:5 xd:5 &xx
@xxx .... ........ ..... xk:5 xj:5 xd:5 &xxx
@xr .... ........ ..... ..... rj:5 xd:5 &xr
@xx_ui5 .... ........ ..... imm:5 xj:5 xd:5 &xx_i
@@ -1333,6 +1335,11 @@ xvsubi_hu 0111 01101000 11001 ..... ..... ..... @xx_ui5
xvsubi_wu 0111 01101000 11010 ..... ..... ..... @xx_ui5
xvsubi_du 0111 01101000 11011 ..... ..... ..... @xx_ui5
+xvneg_b 0111 01101001 11000 01100 ..... ..... @xx
+xvneg_h 0111 01101001 11000 01101 ..... ..... @xx
+xvneg_w 0111 01101001 11000 01110 ..... ..... @xx
+xvneg_d 0111 01101001 11000 01111 ..... ..... @xx
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 08/46] target/loongarch: Implement xvsadd/xvssub
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (6 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 07/46] target/loongarch: Implement xvneg Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 09/46] target/loongarch: Implement xvhaddw/xvhsubw Song Gao
` (37 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVSADD.{B/H/W/D}[U];
- XVSSUB.{B/H/W/D}[U].
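For reference (not part of the patch), the saturating semantics that
tcg_gen_gvec_ssadd/usadd/sssub/ussub provide, sketched in plain C for
the signed byte case:

    /* Semantic sketch only: XVSADD.B adds corresponding signed byte
     * lanes and clamps each result to [-128, 127] instead of wrapping. */
    static void xvsadd_b_sketch(XReg *xd, const XReg *xj, const XReg *xk)
    {
        for (int i = 0; i < 256 / 8; i++) {
            int sum = xj->XB(i) + xk->XB(i);
            xd->XB(i) = sum > INT8_MAX ? INT8_MAX
                      : sum < INT8_MIN ? INT8_MIN : sum;
        }
    }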
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 17 +++++++++++++++++
target/loongarch/insn_trans/trans_lasx.c.inc | 17 +++++++++++++++++
target/loongarch/insns.decode | 18 ++++++++++++++++++
3 files changed, 52 insertions(+)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index eefd16e3f1..2a2993cb95 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1748,6 +1748,23 @@ INSN_LASX(xvneg_h, xx)
INSN_LASX(xvneg_w, xx)
INSN_LASX(xvneg_d, xx)
+INSN_LASX(xvsadd_b, xxx)
+INSN_LASX(xvsadd_h, xxx)
+INSN_LASX(xvsadd_w, xxx)
+INSN_LASX(xvsadd_d, xxx)
+INSN_LASX(xvsadd_bu, xxx)
+INSN_LASX(xvsadd_hu, xxx)
+INSN_LASX(xvsadd_wu, xxx)
+INSN_LASX(xvsadd_du, xxx)
+INSN_LASX(xvssub_b, xxx)
+INSN_LASX(xvssub_h, xxx)
+INSN_LASX(xvssub_w, xxx)
+INSN_LASX(xvssub_d, xxx)
+INSN_LASX(xvssub_bu, xxx)
+INSN_LASX(xvssub_hu, xxx)
+INSN_LASX(xvssub_wu, xxx)
+INSN_LASX(xvssub_du, xxx)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index cea944c3ba..ec68193686 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -131,6 +131,23 @@ TRANS(xvneg_h, gvec_xx, MO_16, tcg_gen_gvec_neg)
TRANS(xvneg_w, gvec_xx, MO_32, tcg_gen_gvec_neg)
TRANS(xvneg_d, gvec_xx, MO_64, tcg_gen_gvec_neg)
+TRANS(xvsadd_b, gvec_xxx, MO_8, tcg_gen_gvec_ssadd)
+TRANS(xvsadd_h, gvec_xxx, MO_16, tcg_gen_gvec_ssadd)
+TRANS(xvsadd_w, gvec_xxx, MO_32, tcg_gen_gvec_ssadd)
+TRANS(xvsadd_d, gvec_xxx, MO_64, tcg_gen_gvec_ssadd)
+TRANS(xvsadd_bu, gvec_xxx, MO_8, tcg_gen_gvec_usadd)
+TRANS(xvsadd_hu, gvec_xxx, MO_16, tcg_gen_gvec_usadd)
+TRANS(xvsadd_wu, gvec_xxx, MO_32, tcg_gen_gvec_usadd)
+TRANS(xvsadd_du, gvec_xxx, MO_64, tcg_gen_gvec_usadd)
+TRANS(xvssub_b, gvec_xxx, MO_8, tcg_gen_gvec_sssub)
+TRANS(xvssub_h, gvec_xxx, MO_16, tcg_gen_gvec_sssub)
+TRANS(xvssub_w, gvec_xxx, MO_32, tcg_gen_gvec_sssub)
+TRANS(xvssub_d, gvec_xxx, MO_64, tcg_gen_gvec_sssub)
+TRANS(xvssub_bu, gvec_xxx, MO_8, tcg_gen_gvec_ussub)
+TRANS(xvssub_hu, gvec_xxx, MO_16, tcg_gen_gvec_ussub)
+TRANS(xvssub_wu, gvec_xxx, MO_32, tcg_gen_gvec_ussub)
+TRANS(xvssub_du, gvec_xxx, MO_64, tcg_gen_gvec_ussub)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 78452c622c..be706fe0f7 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1340,6 +1340,24 @@ xvneg_h 0111 01101001 11000 01101 ..... ..... @xx
xvneg_w 0111 01101001 11000 01110 ..... ..... @xx
xvneg_d 0111 01101001 11000 01111 ..... ..... @xx
+xvsadd_b 0111 01000100 01100 ..... ..... ..... @xxx
+xvsadd_h 0111 01000100 01101 ..... ..... ..... @xxx
+xvsadd_w 0111 01000100 01110 ..... ..... ..... @xxx
+xvsadd_d 0111 01000100 01111 ..... ..... ..... @xxx
+xvsadd_bu 0111 01000100 10100 ..... ..... ..... @xxx
+xvsadd_hu 0111 01000100 10101 ..... ..... ..... @xxx
+xvsadd_wu 0111 01000100 10110 ..... ..... ..... @xxx
+xvsadd_du 0111 01000100 10111 ..... ..... ..... @xxx
+
+xvssub_b 0111 01000100 10000 ..... ..... ..... @xxx
+xvssub_h 0111 01000100 10001 ..... ..... ..... @xxx
+xvssub_w 0111 01000100 10010 ..... ..... ..... @xxx
+xvssub_d 0111 01000100 10011 ..... ..... ..... @xxx
+xvssub_bu 0111 01000100 11000 ..... ..... ..... @xxx
+xvssub_hu 0111 01000100 11001 ..... ..... ..... @xxx
+xvssub_wu 0111 01000100 11010 ..... ..... ..... @xxx
+xvssub_du 0111 01000100 11011 ..... ..... ..... @xxx
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 09/46] target/loongarch: Implement xvhaddw/xvhsubw
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (7 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 08/46] target/loongarch: Implement xvsadd/xvssub Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 10/46] target/loongarch: Implement xvaddw/xvsubw Song Gao
` (36 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVHADDW.{H.B/W.H/D.W/Q.D/HU.BU/WU.HU/DU.WU/QU.DU};
- XVHSUBW.{H.B/W.H/D.W/Q.D/HU.BU/WU.HU/DU.WU/QU.DU}.
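For reference (not part of the patch), the horizontal widening semantics
implemented by the new XDO_ODD_EVEN helpers, sketched in plain C for the
signed 16-from-8 case:

    /* Semantic sketch only: for each byte pair i, XVHADDW.H.B adds the
     * odd-numbered signed byte of xj to the even-numbered signed byte
     * of xk, producing sixteen sign-extended 16-bit results. */
    static void xvhaddw_h_b_sketch(XReg *xd, const XReg *xj, const XReg *xk)
    {
        for (int i = 0; i < 256 / 16; i++) {
            xd->XH(i) = (int16_t)xj->XB(2 * i + 1) + (int16_t)xk->XB(2 * i);
        }
    }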
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 17 ++++
target/loongarch/helper.h | 18 ++++
target/loongarch/insn_trans/trans_lasx.c.inc | 30 +++++++
target/loongarch/insns.decode | 18 ++++
target/loongarch/lasx_helper.c | 90 ++++++++++++++++++++
target/loongarch/lsx_helper.c | 3 -
target/loongarch/vec.h | 3 +
7 files changed, 176 insertions(+), 3 deletions(-)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 2a2993cb95..770359524e 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1765,6 +1765,23 @@ INSN_LASX(xvssub_hu, xxx)
INSN_LASX(xvssub_wu, xxx)
INSN_LASX(xvssub_du, xxx)
+INSN_LASX(xvhaddw_h_b, xxx)
+INSN_LASX(xvhaddw_w_h, xxx)
+INSN_LASX(xvhaddw_d_w, xxx)
+INSN_LASX(xvhaddw_q_d, xxx)
+INSN_LASX(xvhaddw_hu_bu, xxx)
+INSN_LASX(xvhaddw_wu_hu, xxx)
+INSN_LASX(xvhaddw_du_wu, xxx)
+INSN_LASX(xvhaddw_qu_du, xxx)
+INSN_LASX(xvhsubw_h_b, xxx)
+INSN_LASX(xvhsubw_w_h, xxx)
+INSN_LASX(xvhsubw_d_w, xxx)
+INSN_LASX(xvhsubw_q_d, xxx)
+INSN_LASX(xvhsubw_hu_bu, xxx)
+INSN_LASX(xvhsubw_wu_hu, xxx)
+INSN_LASX(xvhsubw_du_wu, xxx)
+INSN_LASX(xvhsubw_qu_du, xxx)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index b9de77d926..db2deaff79 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -696,3 +696,21 @@ DEF_HELPER_4(vextrins_b, void, env, i32, i32, i32)
DEF_HELPER_4(vextrins_h, void, env, i32, i32, i32)
DEF_HELPER_4(vextrins_w, void, env, i32, i32, i32)
DEF_HELPER_4(vextrins_d, void, env, i32, i32, i32)
+
+/* LoongArch LASX */
+DEF_HELPER_4(xvhaddw_h_b, void, env, i32, i32, i32)
+DEF_HELPER_4(xvhaddw_w_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvhaddw_d_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvhaddw_q_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvhaddw_hu_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(xvhaddw_wu_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(xvhaddw_du_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(xvhaddw_qu_du, void, env, i32, i32, i32)
+DEF_HELPER_4(xvhsubw_h_b, void, env, i32, i32, i32)
+DEF_HELPER_4(xvhsubw_w_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvhsubw_d_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvhsubw_q_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvhsubw_hu_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(xvhsubw_wu_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(xvhsubw_du_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(xvhsubw_qu_du, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index ec68193686..aa0e35b228 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -15,6 +15,19 @@
#define CHECK_ASXE
#endif
+static bool gen_xxx(DisasContext *ctx, arg_xxx *a,
+ void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32))
+{
+ TCGv_i32 xd = tcg_constant_i32(a->xd);
+ TCGv_i32 xj = tcg_constant_i32(a->xj);
+ TCGv_i32 xk = tcg_constant_i32(a->xk);
+
+ CHECK_ASXE;
+
+ func(cpu_env, xd, xj, xk);
+ return true;
+}
+
static bool gvec_xxx(DisasContext *ctx, arg_xxx *a, MemOp mop,
void (*func)(unsigned, uint32_t, uint32_t,
uint32_t, uint32_t, uint32_t))
@@ -148,6 +161,23 @@ TRANS(xvssub_hu, gvec_xxx, MO_16, tcg_gen_gvec_ussub)
TRANS(xvssub_wu, gvec_xxx, MO_32, tcg_gen_gvec_ussub)
TRANS(xvssub_du, gvec_xxx, MO_64, tcg_gen_gvec_ussub)
+TRANS(xvhaddw_h_b, gen_xxx, gen_helper_xvhaddw_h_b)
+TRANS(xvhaddw_w_h, gen_xxx, gen_helper_xvhaddw_w_h)
+TRANS(xvhaddw_d_w, gen_xxx, gen_helper_xvhaddw_d_w)
+TRANS(xvhaddw_q_d, gen_xxx, gen_helper_xvhaddw_q_d)
+TRANS(xvhaddw_hu_bu, gen_xxx, gen_helper_xvhaddw_hu_bu)
+TRANS(xvhaddw_wu_hu, gen_xxx, gen_helper_xvhaddw_wu_hu)
+TRANS(xvhaddw_du_wu, gen_xxx, gen_helper_xvhaddw_du_wu)
+TRANS(xvhaddw_qu_du, gen_xxx, gen_helper_xvhaddw_qu_du)
+TRANS(xvhsubw_h_b, gen_xxx, gen_helper_xvhsubw_h_b)
+TRANS(xvhsubw_w_h, gen_xxx, gen_helper_xvhsubw_w_h)
+TRANS(xvhsubw_d_w, gen_xxx, gen_helper_xvhsubw_d_w)
+TRANS(xvhsubw_q_d, gen_xxx, gen_helper_xvhsubw_q_d)
+TRANS(xvhsubw_hu_bu, gen_xxx, gen_helper_xvhsubw_hu_bu)
+TRANS(xvhsubw_wu_hu, gen_xxx, gen_helper_xvhsubw_wu_hu)
+TRANS(xvhsubw_du_wu, gen_xxx, gen_helper_xvhsubw_du_wu)
+TRANS(xvhsubw_qu_du, gen_xxx, gen_helper_xvhsubw_qu_du)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index be706fe0f7..48556b2267 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1358,6 +1358,24 @@ xvssub_hu 0111 01000100 11001 ..... ..... ..... @xxx
xvssub_wu 0111 01000100 11010 ..... ..... ..... @xxx
xvssub_du 0111 01000100 11011 ..... ..... ..... @xxx
+xvhaddw_h_b 0111 01000101 01000 ..... ..... ..... @xxx
+xvhaddw_w_h 0111 01000101 01001 ..... ..... ..... @xxx
+xvhaddw_d_w 0111 01000101 01010 ..... ..... ..... @xxx
+xvhaddw_q_d 0111 01000101 01011 ..... ..... ..... @xxx
+xvhaddw_hu_bu 0111 01000101 10000 ..... ..... ..... @xxx
+xvhaddw_wu_hu 0111 01000101 10001 ..... ..... ..... @xxx
+xvhaddw_du_wu 0111 01000101 10010 ..... ..... ..... @xxx
+xvhaddw_qu_du 0111 01000101 10011 ..... ..... ..... @xxx
+
+xvhsubw_h_b 0111 01000101 01100 ..... ..... ..... @xxx
+xvhsubw_w_h 0111 01000101 01101 ..... ..... ..... @xxx
+xvhsubw_d_w 0111 01000101 01110 ..... ..... ..... @xxx
+xvhsubw_q_d 0111 01000101 01111 ..... ..... ..... @xxx
+xvhsubw_hu_bu 0111 01000101 10100 ..... ..... ..... @xxx
+xvhsubw_wu_hu 0111 01000101 10101 ..... ..... ..... @xxx
+xvhsubw_du_wu 0111 01000101 10110 ..... ..... ..... @xxx
+xvhsubw_qu_du 0111 01000101 10111 ..... ..... ..... @xxx
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index 1754790a3a..d86381ff8a 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -4,3 +4,93 @@
*
* Copyright (c) 2023 Loongson Technology Corporation Limited
*/
+
+#include "qemu/osdep.h"
+#include "cpu.h"
+#include "exec/exec-all.h"
+#include "exec/helper-proto.h"
+#include "internals.h"
+#include "vec.h"
+
+#define XDO_ODD_EVEN(NAME, BIT, E1, E2, DO_OP) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ typedef __typeof(Xd->E1(0)) TD; \
+ \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E1(i) = DO_OP((TD)Xj->E2(2 * i + 1), (TD)Xk->E2(2 * i)); \
+ } \
+}
+
+XDO_ODD_EVEN(xvhaddw_h_b, 16, XH, XB, DO_ADD)
+XDO_ODD_EVEN(xvhaddw_w_h, 32, XW, XH, DO_ADD)
+XDO_ODD_EVEN(xvhaddw_d_w, 64, XD, XW, DO_ADD)
+
+void HELPER(xvhaddw_q_d)(CPULoongArchState *env,
+ uint32_t xd, uint32_t xj, uint32_t xk)
+{
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+ XReg *Xk = &(env->fpr[xk].xreg);
+
+ Xd->XQ(0) = int128_add(int128_makes64(Xj->XD(1)),
+ int128_makes64(Xk->XD(0)));
+ Xd->XQ(1) = int128_add(int128_makes64(Xj->XD(3)),
+ int128_makes64(Xk->XD(2)));
+}
+
+XDO_ODD_EVEN(xvhsubw_h_b, 16, XH, XB, DO_SUB)
+XDO_ODD_EVEN(xvhsubw_w_h, 32, XW, XH, DO_SUB)
+XDO_ODD_EVEN(xvhsubw_d_w, 64, XD, XW, DO_SUB)
+
+void HELPER(xvhsubw_q_d)(CPULoongArchState *env,
+ uint32_t xd, uint32_t xj, uint32_t xk)
+{
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+ XReg *Xk = &(env->fpr[xk].xreg);
+
+ Xd->XQ(0) = int128_sub(int128_makes64(Xj->XD(1)),
+ int128_makes64(Xk->XD(0)));
+ Xd->XQ(1) = int128_sub(int128_makes64(Xj->XD(3)),
+ int128_makes64(Xk->XD(2)));
+}
+
+XDO_ODD_EVEN(xvhaddw_hu_bu, 16, UXH, UXB, DO_ADD)
+XDO_ODD_EVEN(xvhaddw_wu_hu, 32, UXW, UXH, DO_ADD)
+XDO_ODD_EVEN(xvhaddw_du_wu, 64, UXD, UXW, DO_ADD)
+
+void HELPER(xvhaddw_qu_du)(CPULoongArchState *env,
+ uint32_t xd, uint32_t xj, uint32_t xk)
+{
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+ XReg *Xk = &(env->fpr[xk].xreg);
+
+ Xd->XQ(0) = int128_add(int128_make64(Xj->UXD(1)),
+ int128_make64(Xk->UXD(0)));
+ Xd->XQ(1) = int128_add(int128_make64(Xj->UXD(3)),
+ int128_make64(Xk->UXD(2)));
+}
+
+XDO_ODD_EVEN(xvhsubw_hu_bu, 16, UXH, UXB, DO_SUB)
+XDO_ODD_EVEN(xvhsubw_wu_hu, 32, UXW, UXH, DO_SUB)
+XDO_ODD_EVEN(xvhsubw_du_wu, 64, UXD, UXW, DO_SUB)
+
+void HELPER(xvhsubw_qu_du)(CPULoongArchState *env,
+ uint32_t xd, uint32_t xj, uint32_t xk)
+{
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+ XReg *Xk = &(env->fpr[xk].xreg);
+
+ Xd->XQ(0) = int128_sub(int128_make64(Xj->UXD(1)),
+ int128_make64(Xk->UXD(0)));
+ Xd->XQ(1) = int128_sub(int128_make64(Xj->UXD(3)),
+ int128_make64(Xk->UXD(2)));
+}
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index b231a2798b..d79a65dfe2 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -14,9 +14,6 @@
#include "tcg/tcg.h"
#include "vec.h"
-#define DO_ADD(a, b) (a + b)
-#define DO_SUB(a, b) (a - b)
-
#define DO_ODD_EVEN(NAME, BIT, E1, E2, DO_OP) \
void HELPER(NAME)(CPULoongArchState *env, \
uint32_t vd, uint32_t vj, uint32_t vk) \
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index a89cdb8d45..7e71035e50 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -48,4 +48,7 @@
#define XQ(x) XQ[x]
#endif /* HOST_BIG_ENDIAN */
+#define DO_ADD(a, b) (a + b)
+#define DO_SUB(a, b) (a - b)
+
#endif /* LOONGARCH_VEC_H */
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 10/46] target/loongarch: Implement xvaddw/xvsubw
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (8 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 09/46] target/loongarch: Implement xvhaddw/xvhsubw Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 11/46] target/loongarch: Implement xvavg/xvavgr Song Gao
` (35 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVADDW{EV/OD}.{H.B/W.H/D.W/Q.D}[U];
- XVSUBW{EV/OD}.{H.B/W.H/D.W/Q.D}[U];
- XVADDW{EV/OD}.{H.BU.B/W.HU.H/D.WU.W/Q.DU.D}.
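For reference (not part of the patch), the even/odd widening semantics,
sketched in plain C for the signed 16-from-8 even case; the OD variants
use the odd-numbered source elements, the U variants zero-extend both
operands, and the BU.B-style variants mix a zero-extended first operand
with a sign-extended second operand:

    /* Semantic sketch only: XVADDWEV.H.B sign-extends the even-numbered
     * bytes of xj and xk to 16 bits and adds them pairwise. */
    static void xvaddwev_h_b_sketch(XReg *xd, const XReg *xj, const XReg *xk)
    {
        for (int i = 0; i < 256 / 16; i++) {
            xd->XH(i) = (int16_t)xj->XB(2 * i) + (int16_t)xk->XB(2 * i);
        }
    }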
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 43 ++
target/loongarch/helper.h | 45 ++
target/loongarch/insn_trans/trans_lasx.c.inc | 410 +++++++++++++++++++
target/loongarch/insns.decode | 45 ++
target/loongarch/lasx_helper.c | 214 ++++++++++
5 files changed, 757 insertions(+)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 770359524e..6e790f0959 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1782,6 +1782,49 @@ INSN_LASX(xvhsubw_wu_hu, xxx)
INSN_LASX(xvhsubw_du_wu, xxx)
INSN_LASX(xvhsubw_qu_du, xxx)
+INSN_LASX(xvaddwev_h_b, xxx)
+INSN_LASX(xvaddwev_w_h, xxx)
+INSN_LASX(xvaddwev_d_w, xxx)
+INSN_LASX(xvaddwev_q_d, xxx)
+INSN_LASX(xvaddwod_h_b, xxx)
+INSN_LASX(xvaddwod_w_h, xxx)
+INSN_LASX(xvaddwod_d_w, xxx)
+INSN_LASX(xvaddwod_q_d, xxx)
+INSN_LASX(xvsubwev_h_b, xxx)
+INSN_LASX(xvsubwev_w_h, xxx)
+INSN_LASX(xvsubwev_d_w, xxx)
+INSN_LASX(xvsubwev_q_d, xxx)
+INSN_LASX(xvsubwod_h_b, xxx)
+INSN_LASX(xvsubwod_w_h, xxx)
+INSN_LASX(xvsubwod_d_w, xxx)
+INSN_LASX(xvsubwod_q_d, xxx)
+
+INSN_LASX(xvaddwev_h_bu, xxx)
+INSN_LASX(xvaddwev_w_hu, xxx)
+INSN_LASX(xvaddwev_d_wu, xxx)
+INSN_LASX(xvaddwev_q_du, xxx)
+INSN_LASX(xvaddwod_h_bu, xxx)
+INSN_LASX(xvaddwod_w_hu, xxx)
+INSN_LASX(xvaddwod_d_wu, xxx)
+INSN_LASX(xvaddwod_q_du, xxx)
+INSN_LASX(xvsubwev_h_bu, xxx)
+INSN_LASX(xvsubwev_w_hu, xxx)
+INSN_LASX(xvsubwev_d_wu, xxx)
+INSN_LASX(xvsubwev_q_du, xxx)
+INSN_LASX(xvsubwod_h_bu, xxx)
+INSN_LASX(xvsubwod_w_hu, xxx)
+INSN_LASX(xvsubwod_d_wu, xxx)
+INSN_LASX(xvsubwod_q_du, xxx)
+
+INSN_LASX(xvaddwev_h_bu_b, xxx)
+INSN_LASX(xvaddwev_w_hu_h, xxx)
+INSN_LASX(xvaddwev_d_wu_w, xxx)
+INSN_LASX(xvaddwev_q_du_d, xxx)
+INSN_LASX(xvaddwod_h_bu_b, xxx)
+INSN_LASX(xvaddwod_w_hu_h, xxx)
+INSN_LASX(xvaddwod_d_wu_w, xxx)
+INSN_LASX(xvaddwod_q_du_d, xxx)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index db2deaff79..2034576d87 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -714,3 +714,48 @@ DEF_HELPER_4(xvhsubw_hu_bu, void, env, i32, i32, i32)
DEF_HELPER_4(xvhsubw_wu_hu, void, env, i32, i32, i32)
DEF_HELPER_4(xvhsubw_du_wu, void, env, i32, i32, i32)
DEF_HELPER_4(xvhsubw_qu_du, void, env, i32, i32, i32)
+
+DEF_HELPER_FLAGS_4(xvaddwev_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvaddwev_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvaddwev_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvaddwev_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvaddwod_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvaddwod_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvaddwod_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvaddwod_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(xvsubwev_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvsubwev_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvsubwev_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvsubwev_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvsubwod_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvsubwod_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvsubwod_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvsubwod_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(xvaddwev_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvaddwev_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvaddwev_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvaddwev_q_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvaddwod_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvaddwod_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvaddwod_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvaddwod_q_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(xvsubwev_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvsubwev_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvsubwev_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvsubwev_q_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvsubwod_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvsubwod_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvsubwod_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvsubwod_q_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(xvaddwev_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvaddwev_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvaddwev_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvaddwev_q_du_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvaddwod_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvaddwod_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvaddwod_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvaddwod_q_du_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index aa0e35b228..0a574182db 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -178,6 +178,416 @@ TRANS(xvhsubw_wu_hu, gen_xxx, gen_helper_xvhsubw_wu_hu)
TRANS(xvhsubw_du_wu, gen_xxx, gen_helper_xvhsubw_du_wu)
TRANS(xvhsubw_qu_du, gen_xxx, gen_helper_xvhsubw_qu_du)
+static void do_xvaddwev_s(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_shli_vec, INDEX_op_sari_vec, INDEX_op_add_vec, 0
+ };
+ static const GVecGen3 op[4] = {
+ {
+ .fniv = gen_vaddwev_s,
+ .fno = gen_helper_xvaddwev_h_b,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vaddwev_w_h,
+ .fniv = gen_vaddwev_s,
+ .fno = gen_helper_xvaddwev_w_h,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vaddwev_d_w,
+ .fniv = gen_vaddwev_s,
+ .fno = gen_helper_xvaddwev_d_w,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ {
+ .fno = gen_helper_xvaddwev_q_d,
+ .vece = MO_128
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvaddwev_h_b, gvec_xxx, MO_8, do_xvaddwev_s)
+TRANS(xvaddwev_w_h, gvec_xxx, MO_16, do_xvaddwev_s)
+TRANS(xvaddwev_d_w, gvec_xxx, MO_32, do_xvaddwev_s)
+TRANS(xvaddwev_q_d, gvec_xxx, MO_64, do_xvaddwev_s)
+
+static void do_xvaddwod_s(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_sari_vec, INDEX_op_add_vec, 0
+ };
+ static const GVecGen3 op[4] = {
+ {
+ .fniv = gen_vaddwod_s,
+ .fno = gen_helper_xvaddwod_h_b,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vaddwod_w_h,
+ .fniv = gen_vaddwod_s,
+ .fno = gen_helper_xvaddwod_w_h,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vaddwod_d_w,
+ .fniv = gen_vaddwod_s,
+ .fno = gen_helper_xvaddwod_d_w,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ {
+ .fno = gen_helper_xvaddwod_q_d,
+ .vece = MO_128
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvaddwod_h_b, gvec_xxx, MO_8, do_xvaddwod_s)
+TRANS(xvaddwod_w_h, gvec_xxx, MO_16, do_xvaddwod_s)
+TRANS(xvaddwod_d_w, gvec_xxx, MO_32, do_xvaddwod_s)
+TRANS(xvaddwod_q_d, gvec_xxx, MO_64, do_xvaddwod_s)
+
+static void do_xvsubwev_s(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_shli_vec, INDEX_op_sari_vec, INDEX_op_sub_vec, 0
+ };
+ static const GVecGen3 op[4] = {
+ {
+ .fniv = gen_vsubwev_s,
+ .fno = gen_helper_xvsubwev_h_b,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vsubwev_w_h,
+ .fniv = gen_vsubwev_s,
+ .fno = gen_helper_xvsubwev_w_h,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vsubwev_d_w,
+ .fniv = gen_vsubwev_s,
+ .fno = gen_helper_xvsubwev_d_w,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ {
+ .fno = gen_helper_xvsubwev_q_d,
+ .vece = MO_128
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvsubwev_h_b, gvec_xxx, MO_8, do_xvsubwev_s)
+TRANS(xvsubwev_w_h, gvec_xxx, MO_16, do_xvsubwev_s)
+TRANS(xvsubwev_d_w, gvec_xxx, MO_32, do_xvsubwev_s)
+TRANS(xvsubwev_q_d, gvec_xxx, MO_64, do_xvsubwev_s)
+
+static void do_xvsubwod_s(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_sari_vec, INDEX_op_sub_vec, 0
+ };
+ static const GVecGen3 op[4] = {
+ {
+ .fniv = gen_vsubwod_s,
+ .fno = gen_helper_xvsubwod_h_b,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vsubwod_w_h,
+ .fniv = gen_vsubwod_s,
+ .fno = gen_helper_xvsubwod_w_h,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vsubwod_d_w,
+ .fniv = gen_vsubwod_s,
+ .fno = gen_helper_xvsubwod_d_w,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ {
+ .fno = gen_helper_xvsubwod_q_d,
+ .vece = MO_128
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvsubwod_h_b, gvec_xxx, MO_8, do_xvsubwod_s)
+TRANS(xvsubwod_w_h, gvec_xxx, MO_16, do_xvsubwod_s)
+TRANS(xvsubwod_d_w, gvec_xxx, MO_32, do_xvsubwod_s)
+TRANS(xvsubwod_q_d, gvec_xxx, MO_64, do_xvsubwod_s)
+
+static void do_xvaddwev_u(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_add_vec, 0
+ };
+ static const GVecGen3 op[4] = {
+ {
+ .fniv = gen_vaddwev_u,
+ .fno = gen_helper_xvaddwev_h_bu,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vaddwev_w_hu,
+ .fniv = gen_vaddwev_u,
+ .fno = gen_helper_xvaddwev_w_hu,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vaddwev_d_wu,
+ .fniv = gen_vaddwev_u,
+ .fno = gen_helper_xvaddwev_d_wu,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ {
+ .fno = gen_helper_xvaddwev_q_du,
+ .vece = MO_128
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvaddwev_h_bu, gvec_xxx, MO_8, do_xvaddwev_u)
+TRANS(xvaddwev_w_hu, gvec_xxx, MO_16, do_xvaddwev_u)
+TRANS(xvaddwev_d_wu, gvec_xxx, MO_32, do_xvaddwev_u)
+TRANS(xvaddwev_q_du, gvec_xxx, MO_64, do_xvaddwev_u)
+
+static void do_xvaddwod_u(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_shri_vec, INDEX_op_add_vec, 0
+ };
+ static const GVecGen3 op[4] = {
+ {
+ .fniv = gen_vaddwod_u,
+ .fno = gen_helper_xvaddwod_h_bu,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vaddwod_w_hu,
+ .fniv = gen_vaddwod_u,
+ .fno = gen_helper_xvaddwod_w_hu,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vaddwod_d_wu,
+ .fniv = gen_vaddwod_u,
+ .fno = gen_helper_xvaddwod_d_wu,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ {
+ .fno = gen_helper_xvaddwod_q_du,
+ .vece = MO_128
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvaddwod_h_bu, gvec_xxx, MO_8, do_xvaddwod_u)
+TRANS(xvaddwod_w_hu, gvec_xxx, MO_16, do_xvaddwod_u)
+TRANS(xvaddwod_d_wu, gvec_xxx, MO_32, do_xvaddwod_u)
+TRANS(xvaddwod_q_du, gvec_xxx, MO_64, do_xvaddwod_u)
+
+static void do_xvsubwev_u(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_sub_vec, 0
+ };
+ static const GVecGen3 op[4] = {
+ {
+ .fniv = gen_vsubwev_u,
+ .fno = gen_helper_xvsubwev_h_bu,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vsubwev_w_hu,
+ .fniv = gen_vsubwev_u,
+ .fno = gen_helper_xvsubwev_w_hu,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vsubwev_d_wu,
+ .fniv = gen_vsubwev_u,
+ .fno = gen_helper_xvsubwev_d_wu,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ {
+ .fno = gen_helper_xvsubwev_q_du,
+ .vece = MO_128
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvsubwev_h_bu, gvec_xxx, MO_8, do_xvsubwev_u)
+TRANS(xvsubwev_w_hu, gvec_xxx, MO_16, do_xvsubwev_u)
+TRANS(xvsubwev_d_wu, gvec_xxx, MO_32, do_xvsubwev_u)
+TRANS(xvsubwev_q_du, gvec_xxx, MO_64, do_xvsubwev_u)
+
+static void do_xvsubwod_u(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_shri_vec, INDEX_op_sub_vec, 0
+ };
+ static const GVecGen3 op[4] = {
+ {
+ .fniv = gen_vsubwod_u,
+ .fno = gen_helper_xvsubwod_h_bu,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vsubwod_w_hu,
+ .fniv = gen_vsubwod_u,
+ .fno = gen_helper_xvsubwod_w_hu,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vsubwod_d_wu,
+ .fniv = gen_vsubwod_u,
+ .fno = gen_helper_xvsubwod_d_wu,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ {
+ .fno = gen_helper_xvsubwod_q_du,
+ .vece = MO_128
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvsubwod_h_bu, gvec_xxx, MO_8, do_xvsubwod_u)
+TRANS(xvsubwod_w_hu, gvec_xxx, MO_16, do_xvsubwod_u)
+TRANS(xvsubwod_d_wu, gvec_xxx, MO_32, do_xvsubwod_u)
+TRANS(xvsubwod_q_du, gvec_xxx, MO_64, do_xvsubwod_u)
+
+static void do_xvaddwev_u_s(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_shli_vec, INDEX_op_sari_vec, INDEX_op_add_vec, 0
+ };
+ static const GVecGen3 op[4] = {
+ {
+ .fniv = gen_vaddwev_u_s,
+ .fno = gen_helper_xvaddwev_h_bu_b,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vaddwev_w_hu_h,
+ .fniv = gen_vaddwev_u_s,
+ .fno = gen_helper_xvaddwev_w_hu_h,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vaddwev_d_wu_w,
+ .fniv = gen_vaddwev_u_s,
+ .fno = gen_helper_xvaddwev_d_wu_w,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ {
+ .fno = gen_helper_xvaddwev_q_du_d,
+ .vece = MO_128
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvaddwev_h_bu_b, gvec_xxx, MO_8, do_xvaddwev_u_s)
+TRANS(xvaddwev_w_hu_h, gvec_xxx, MO_16, do_xvaddwev_u_s)
+TRANS(xvaddwev_d_wu_w, gvec_xxx, MO_32, do_xvaddwev_u_s)
+TRANS(xvaddwev_q_du_d, gvec_xxx, MO_64, do_xvaddwev_u_s)
+
+static void do_xvaddwod_u_s(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_shri_vec, INDEX_op_sari_vec, INDEX_op_add_vec, 0
+ };
+ static const GVecGen3 op[4] = {
+ {
+ .fniv = gen_vaddwod_u_s,
+ .fno = gen_helper_xvaddwod_h_bu_b,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vaddwod_w_hu_h,
+ .fniv = gen_vaddwod_u_s,
+ .fno = gen_helper_xvaddwod_w_hu_h,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vaddwod_d_wu_w,
+ .fniv = gen_vaddwod_u_s,
+ .fno = gen_helper_xvaddwod_d_wu_w,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ {
+ .fno = gen_helper_xvaddwod_q_du_d,
+ .vece = MO_128
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvaddwod_h_bu_b, gvec_xxx, MO_8, do_xvaddwod_u_s)
+TRANS(xvaddwod_w_hu_h, gvec_xxx, MO_16, do_xvaddwod_u_s)
+TRANS(xvaddwod_d_wu_w, gvec_xxx, MO_32, do_xvaddwod_u_s)
+TRANS(xvaddwod_q_du_d, gvec_xxx, MO_64, do_xvaddwod_u_s)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 48556b2267..1d177f9676 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1376,6 +1376,51 @@ xvhsubw_wu_hu 0111 01000101 10101 ..... ..... ..... @xxx
xvhsubw_du_wu 0111 01000101 10110 ..... ..... ..... @xxx
xvhsubw_qu_du 0111 01000101 10111 ..... ..... ..... @xxx
+xvaddwev_h_b 0111 01000001 11100 ..... ..... ..... @xxx
+xvaddwev_w_h 0111 01000001 11101 ..... ..... ..... @xxx
+xvaddwev_d_w 0111 01000001 11110 ..... ..... ..... @xxx
+xvaddwev_q_d 0111 01000001 11111 ..... ..... ..... @xxx
+xvaddwod_h_b 0111 01000010 00100 ..... ..... ..... @xxx
+xvaddwod_w_h 0111 01000010 00101 ..... ..... ..... @xxx
+xvaddwod_d_w 0111 01000010 00110 ..... ..... ..... @xxx
+xvaddwod_q_d 0111 01000010 00111 ..... ..... ..... @xxx
+
+xvsubwev_h_b 0111 01000010 00000 ..... ..... ..... @xxx
+xvsubwev_w_h 0111 01000010 00001 ..... ..... ..... @xxx
+xvsubwev_d_w 0111 01000010 00010 ..... ..... ..... @xxx
+xvsubwev_q_d 0111 01000010 00011 ..... ..... ..... @xxx
+xvsubwod_h_b 0111 01000010 01000 ..... ..... ..... @xxx
+xvsubwod_w_h 0111 01000010 01001 ..... ..... ..... @xxx
+xvsubwod_d_w 0111 01000010 01010 ..... ..... ..... @xxx
+xvsubwod_q_d 0111 01000010 01011 ..... ..... ..... @xxx
+
+xvaddwev_h_bu 0111 01000010 11100 ..... ..... ..... @xxx
+xvaddwev_w_hu 0111 01000010 11101 ..... ..... ..... @xxx
+xvaddwev_d_wu 0111 01000010 11110 ..... ..... ..... @xxx
+xvaddwev_q_du 0111 01000010 11111 ..... ..... ..... @xxx
+xvaddwod_h_bu 0111 01000011 00100 ..... ..... ..... @xxx
+xvaddwod_w_hu 0111 01000011 00101 ..... ..... ..... @xxx
+xvaddwod_d_wu 0111 01000011 00110 ..... ..... ..... @xxx
+xvaddwod_q_du 0111 01000011 00111 ..... ..... ..... @xxx
+
+xvsubwev_h_bu 0111 01000011 00000 ..... ..... ..... @xxx
+xvsubwev_w_hu 0111 01000011 00001 ..... ..... ..... @xxx
+xvsubwev_d_wu 0111 01000011 00010 ..... ..... ..... @xxx
+xvsubwev_q_du 0111 01000011 00011 ..... ..... ..... @xxx
+xvsubwod_h_bu 0111 01000011 01000 ..... ..... ..... @xxx
+xvsubwod_w_hu 0111 01000011 01001 ..... ..... ..... @xxx
+xvsubwod_d_wu 0111 01000011 01010 ..... ..... ..... @xxx
+xvsubwod_q_du 0111 01000011 01011 ..... ..... ..... @xxx
+
+xvaddwev_h_bu_b 0111 01000011 11100 ..... ..... ..... @xxx
+xvaddwev_w_hu_h 0111 01000011 11101 ..... ..... ..... @xxx
+xvaddwev_d_wu_w 0111 01000011 11110 ..... ..... ..... @xxx
+xvaddwev_q_du_d 0111 01000011 11111 ..... ..... ..... @xxx
+xvaddwod_h_bu_b 0111 01000100 00000 ..... ..... ..... @xxx
+xvaddwod_w_hu_h 0111 01000100 00001 ..... ..... ..... @xxx
+xvaddwod_d_wu_w 0111 01000100 00010 ..... ..... ..... @xxx
+xvaddwod_q_du_d 0111 01000100 00011 ..... ..... ..... @xxx
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index d86381ff8a..8e830e1f3c 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -94,3 +94,217 @@ void HELPER(xvhsubw_qu_du)(CPULoongArchState *env,
Xd->XQ(1) = int128_sub(int128_make64(Xj->UXD(3)),
int128_make64(Xk->UXD(2)));
}
+
+#define XDO_EVEN(NAME, BIT, E1, E2, DO_OP) \
+void HELPER(NAME)(void *xd, void *xj, void *xk, uint32_t v) \
+{ \
+ int i; \
+ XReg *Xd = (XReg *)xd; \
+ XReg *Xj = (XReg *)xj; \
+ XReg *Xk = (XReg *)xk; \
+ typedef __typeof(Xd->E1(0)) TD; \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E1(i) = DO_OP((TD)Xj->E2(2 * i), (TD)Xk->E2(2 * i)); \
+ } \
+}
+
+#define XDO_ODD(NAME, BIT, E1, E2, DO_OP) \
+void HELPER(NAME)(void *xd, void *xj, void *xk, uint32_t v) \
+{ \
+ int i; \
+ XReg *Xd = (XReg *)xd; \
+ XReg *Xj = (XReg *)xj; \
+ XReg *Xk = (XReg *)xk; \
+ typedef __typeof(Xd->E1(0)) TD; \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E1(i) = DO_OP((TD)Xj->E2(2 * i + 1), (TD)Xk->E2(2 * i + 1)); \
+ } \
+}
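+
+/*
+ * XDO_EVEN/XDO_ODD generate the widening even/odd element helpers: each
+ * destination element Xd->E1(i) is DO_OP applied to the even (2 * i) or
+ * odd (2 * i + 1) source elements of Xj and Xk, widened to the
+ * destination element type TD before the operation.
+ */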
+
+void HELPER(xvaddwev_q_d)(void *xd, void *xj, void *xk, uint32_t v)
+{
+ XReg *Xd = (XReg *)xd;
+ XReg *Xj = (XReg *)xj;
+ XReg *Xk = (XReg *)xk;
+
+ Xd->XQ(0) = int128_add(int128_makes64(Xj->XD(0)),
+ int128_makes64(Xk->XD(0)));
+ Xd->XQ(1) = int128_add(int128_makes64(Xj->XD(2)),
+ int128_makes64(Xk->XD(2)));
+}
+
+XDO_EVEN(xvaddwev_h_b, 16, XH, XB, DO_ADD)
+XDO_EVEN(xvaddwev_w_h, 32, XW, XH, DO_ADD)
+XDO_EVEN(xvaddwev_d_w, 64, XD, XW, DO_ADD)
+
+void HELPER(xvaddwod_q_d)(void *xd, void *xj, void *xk, uint32_t v)
+{
+ XReg *Xd = (XReg *)xd;
+ XReg *Xj = (XReg *)xj;
+ XReg *Xk = (XReg *)xk;
+
+ Xd->XQ(0) = int128_add(int128_makes64(Xj->XD(1)),
+ int128_makes64(Xk->XD(1)));
+ Xd->XQ(1) = int128_add(int128_makes64(Xj->XD(3)),
+ int128_makes64(Xk->XD(3)));
+}
+
+XDO_ODD(xvaddwod_h_b, 16, XH, XB, DO_ADD)
+XDO_ODD(xvaddwod_w_h, 32, XW, XH, DO_ADD)
+XDO_ODD(xvaddwod_d_w, 64, XD, XW, DO_ADD)
+
+void HELPER(xvsubwev_q_d)(void *xd, void *xj, void *xk, uint32_t v)
+{
+ XReg *Xd = (XReg *)xd;
+ XReg *Xj = (XReg *)xj;
+ XReg *Xk = (XReg *)xk;
+
+ Xd->XQ(0) = int128_sub(int128_makes64(Xj->XD(0)),
+ int128_makes64(Xk->XD(0)));
+ Xd->XQ(1) = int128_sub(int128_makes64(Xj->XD(2)),
+ int128_makes64(Xk->XD(2)));
+}
+
+XDO_EVEN(xvsubwev_h_b, 16, XH, XB, DO_SUB)
+XDO_EVEN(xvsubwev_w_h, 32, XW, XH, DO_SUB)
+XDO_EVEN(xvsubwev_d_w, 64, XD, XW, DO_SUB)
+
+void HELPER(xvsubwod_q_d)(void *xd, void *xj, void *xk, uint32_t v)
+{
+ XReg *Xd = (XReg *)xd;
+ XReg *Xj = (XReg *)xj;
+ XReg *Xk = (XReg *)xk;
+
+ Xd->XQ(0) = int128_sub(int128_makes64(Xj->XD(1)),
+ int128_makes64(Xk->XD(1)));
+ Xd->XQ(1) = int128_sub(int128_makes64(Xj->XD(3)),
+ int128_makes64(Xk->XD(3)));
+}
+
+XDO_ODD(xvsubwod_h_b, 16, XH, XB, DO_SUB)
+XDO_ODD(xvsubwod_w_h, 32, XW, XH, DO_SUB)
+XDO_ODD(xvsubwod_d_w, 64, XD, XW, DO_SUB)
+
+void HELPER(xvaddwev_q_du)(void *xd, void *xj, void *xk, uint32_t v)
+{
+ XReg *Xd = (XReg *)xd;
+ XReg *Xj = (XReg *)xj;
+ XReg *Xk = (XReg *)xk;
+
+ Xd->XQ(0) = int128_add(int128_make64(Xj->UXD(0)),
+ int128_make64(Xk->UXD(0)));
+ Xd->XQ(1) = int128_add(int128_make64(Xj->UXD(2)),
+ int128_make64(Xk->UXD(2)));
+}
+
+XDO_EVEN(xvaddwev_h_bu, 16, UXH, UXB, DO_ADD)
+XDO_EVEN(xvaddwev_w_hu, 32, UXW, UXH, DO_ADD)
+XDO_EVEN(xvaddwev_d_wu, 64, UXD, UXW, DO_ADD)
+
+void HELPER(xvaddwod_q_du)(void *xd, void *xj, void *xk, uint32_t v)
+{
+ XReg *Xd = (XReg *)xd;
+ XReg *Xj = (XReg *)xj;
+ XReg *Xk = (XReg *)xk;
+
+ Xd->XQ(0) = int128_add(int128_make64(Xj->UXD(1)),
+ int128_make64(Xk->UXD(1)));
+ Xd->XQ(1) = int128_add(int128_make64(Xj->UXD(3)),
+ int128_make64(Xk->UXD(3)));
+}
+
+XDO_ODD(xvaddwod_h_bu, 16, UXH, UXB, DO_ADD)
+XDO_ODD(xvaddwod_w_hu, 32, UXW, UXH, DO_ADD)
+XDO_ODD(xvaddwod_d_wu, 64, UXD, UXW, DO_ADD)
+
+void HELPER(xvsubwev_q_du)(void *xd, void *xj, void *xk, uint32_t v)
+{
+ XReg *Xd = (XReg *)xd;
+ XReg *Xj = (XReg *)xj;
+ XReg *Xk = (XReg *)xk;
+
+ Xd->XQ(0) = int128_sub(int128_make64(Xj->UXD(0)),
+ int128_make64(Xk->UXD(0)));
+ Xd->XQ(1) = int128_sub(int128_make64(Xj->UXD(2)),
+ int128_make64(Xk->UXD(2)));
+}
+
+XDO_EVEN(xvsubwev_h_bu, 16, UXH, UXB, DO_SUB)
+XDO_EVEN(xvsubwev_w_hu, 32, UXW, UXH, DO_SUB)
+XDO_EVEN(xvsubwev_d_wu, 64, UXD, UXW, DO_SUB)
+
+void HELPER(xvsubwod_q_du)(void *xd, void *xj, void *xk, uint32_t v)
+{
+ XReg *Xd = (XReg *)xd;
+ XReg *Xj = (XReg *)xj;
+ XReg *Xk = (XReg *)xk;
+
+ Xd->XQ(0) = int128_sub(int128_make64(Xj->UXD(1)),
+ int128_make64(Xk->UXD(1)));
+ Xd->XQ(1) = int128_sub(int128_make64(Xj->UXD(3)),
+ int128_make64(Xk->UXD(3)));
+}
+
+XDO_ODD(xvsubwod_h_bu, 16, UXH, UXB, DO_SUB)
+XDO_ODD(xvsubwod_w_hu, 32, UXW, UXH, DO_SUB)
+XDO_ODD(xvsubwod_d_wu, 64, UXD, UXW, DO_SUB)
+
+#define XDO_EVEN_U_S(NAME, BIT, ES1, EU1, ES2, EU2, DO_OP) \
+void HELPER(NAME)(void *xd, void *xj, void *xk, uint32_t v) \
+{ \
+ int i; \
+ XReg *Xd = (XReg *)xd; \
+ XReg *Xj = (XReg *)xj; \
+ XReg *Xk = (XReg *)xk; \
+ typedef __typeof(Xd->ES1(0)) TDS; \
+ typedef __typeof(Xd->EU1(0)) TDU; \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->ES1(i) = DO_OP((TDU)Xj->EU2(2 * i), (TDS)Xk->ES2(2 * i)); \
+ } \
+}
+
+#define XDO_ODD_U_S(NAME, BIT, ES1, EU1, ES2, EU2, DO_OP) \
+void HELPER(NAME)(void *xd, void *xj, void *xk, uint32_t v) \
+{ \
+ int i; \
+ XReg *Xd = (XReg *)xd; \
+ XReg *Xj = (XReg *)xj; \
+ XReg *Xk = (XReg *)xk; \
+ typedef __typeof(Xd->ES1(0)) TDS; \
+ typedef __typeof(Xd->EU1(0)) TDU; \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->ES1(i) = DO_OP((TDU)Xj->EU2(2 * i + 1), (TDS)Xk->ES2(2 * i + 1)); \
+ } \
+}
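+
+/*
+ * The _U_S variants mix signedness: the Xj element is read as unsigned
+ * (EU2, widened to TDU) and the Xk element as signed (ES2, widened to
+ * TDS), matching the xvaddw{ev/od}.{h.bu.b/w.hu.h/d.wu.w} forms.
+ */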
+
+void HELPER(xvaddwev_q_du_d)(void *xd, void *xj, void *xk, uint32_t v)
+{
+ XReg *Xd = (XReg *)xd;
+ XReg *Xj = (XReg *)xj;
+ XReg *Xk = (XReg *)xk;
+
+ Xd->XQ(0) = int128_add(int128_make64(Xj->UXD(0)),
+ int128_makes64(Xk->XD(0)));
+ Xd->XQ(1) = int128_add(int128_make64(Xj->UXD(2)),
+ int128_makes64(Xk->XD(2)));
+}
+
+XDO_EVEN_U_S(xvaddwev_h_bu_b, 16, XH, UXH, XB, UXB, DO_ADD)
+XDO_EVEN_U_S(xvaddwev_w_hu_h, 32, XW, UXW, XH, UXH, DO_ADD)
+XDO_EVEN_U_S(xvaddwev_d_wu_w, 64, XD, UXD, XW, UXW, DO_ADD)
+
+void HELPER(xvaddwod_q_du_d)(void *xd, void *xj, void *xk, uint32_t v)
+{
+ XReg *Xd = (XReg *)xd;
+ XReg *Xj = (XReg *)xj;
+ XReg *Xk = (XReg *)xk;
+
+ Xd->XQ(0) = int128_add(int128_make64(Xj->UXD(1)),
+ int128_makes64(Xk->XD(1)));
+ Xd->XQ(1) = int128_add(int128_make64(Xj->UXD(3)),
+ int128_makes64(Xk->XD(3)));
+}
+
+XDO_ODD_U_S(xvaddwod_h_bu_b, 16, XH, UXH, XB, UXB, DO_ADD)
+XDO_ODD_U_S(xvaddwod_w_hu_h, 32, XW, UXW, XH, UXH, DO_ADD)
+XDO_ODD_U_S(xvaddwod_d_wu_w, 64, XD, UXD, XW, UXW, DO_ADD)
--
2.39.1
* [PATCH v1 11/46] target/loongarch: Implement xvavg/xvavgr
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (9 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 10/46] target/loongarch: Implement xvaddw/xvsubw Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 12/46] target/loongarch: Implement xvabsd Song Gao
` (34 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVAVG.{B/H/W/D}[U];
- XVAVGR.{B/H/W/D}[U].
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
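A quick note on the semantics (illustration only, not part of the patch): the
new helpers use the DO_VAVG/DO_VAVGR macros added to vec.h, which compute the
per-element average without widening the operands. A stand-alone sketch of the
same trick, assuming 32-bit signed elements and hypothetical function names:

    /* floor((a + b) / 2) without intermediate overflow */
    static int32_t vavg_example(int32_t a, int32_t b)
    {
        return (a >> 1) + (b >> 1) + (a & b & 1);
    }

    /* rounded average, equivalent to (a + b + 1) >> 1 */
    static int32_t vavgr_example(int32_t a, int32_t b)
    {
        return (a >> 1) + (b >> 1) + ((a | b) & 1);
    }

The unsigned variants use the same expressions on unsigned element types.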
target/loongarch/disas.c | 17 ++
target/loongarch/helper.h | 18 +++
target/loongarch/insn_trans/trans_lasx.c.inc | 162 +++++++++++++++++++
target/loongarch/insns.decode | 17 ++
target/loongarch/lasx_helper.c | 29 ++++
target/loongarch/vec.h | 3 +
6 files changed, 246 insertions(+)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 6e790f0959..d804caaee0 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1825,6 +1825,23 @@ INSN_LASX(xvaddwod_w_hu_h, xxx)
INSN_LASX(xvaddwod_d_wu_w, xxx)
INSN_LASX(xvaddwod_q_du_d, xxx)
+INSN_LASX(xvavg_b, xxx)
+INSN_LASX(xvavg_h, xxx)
+INSN_LASX(xvavg_w, xxx)
+INSN_LASX(xvavg_d, xxx)
+INSN_LASX(xvavg_bu, xxx)
+INSN_LASX(xvavg_hu, xxx)
+INSN_LASX(xvavg_wu, xxx)
+INSN_LASX(xvavg_du, xxx)
+INSN_LASX(xvavgr_b, xxx)
+INSN_LASX(xvavgr_h, xxx)
+INSN_LASX(xvavgr_w, xxx)
+INSN_LASX(xvavgr_d, xxx)
+INSN_LASX(xvavgr_bu, xxx)
+INSN_LASX(xvavgr_hu, xxx)
+INSN_LASX(xvavgr_wu, xxx)
+INSN_LASX(xvavgr_du, xxx)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 2034576d87..feeaa92447 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -759,3 +759,21 @@ DEF_HELPER_FLAGS_4(xvaddwod_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(xvaddwod_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(xvaddwod_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(xvaddwod_q_du_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(xvavg_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvavg_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvavg_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvavg_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvavg_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvavg_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvavg_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvavg_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(xvavgr_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvavgr_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvavgr_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvavgr_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvavgr_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvavgr_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvavgr_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvavgr_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 0a574182db..4a8bcf618f 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -588,6 +588,168 @@ TRANS(xvaddwod_w_hu_h, gvec_xxx, MO_16, do_xvaddwod_u_s)
TRANS(xvaddwod_d_wu_w, gvec_xxx, MO_32, do_xvaddwod_u_s)
TRANS(xvaddwod_q_du_d, gvec_xxx, MO_64, do_xvaddwod_u_s)
+static void do_xvavg_s(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_sari_vec, INDEX_op_add_vec, 0
+ };
+ static const GVecGen3 op[4] = {
+ {
+ .fniv = gen_vavg_s,
+ .fno = gen_helper_xvavg_b,
+ .opt_opc = vecop_list,
+ .vece = MO_8
+ },
+ {
+ .fniv = gen_vavg_s,
+ .fno = gen_helper_xvavg_h,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fniv = gen_vavg_s,
+ .fno = gen_helper_xvavg_w,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fniv = gen_vavg_s,
+ .fno = gen_helper_xvavg_d,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+static void do_xvavg_u(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_shri_vec, INDEX_op_add_vec, 0
+ };
+ static const GVecGen3 op[4] = {
+ {
+ .fniv = gen_vavg_u,
+ .fno = gen_helper_xvavg_bu,
+ .opt_opc = vecop_list,
+ .vece = MO_8
+ },
+ {
+ .fniv = gen_vavg_u,
+ .fno = gen_helper_xvavg_hu,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fniv = gen_vavg_u,
+ .fno = gen_helper_xvavg_wu,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fniv = gen_vavg_u,
+ .fno = gen_helper_xvavg_du,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvavg_b, gvec_xxx, MO_8, do_xvavg_s)
+TRANS(xvavg_h, gvec_xxx, MO_16, do_xvavg_s)
+TRANS(xvavg_w, gvec_xxx, MO_32, do_xvavg_s)
+TRANS(xvavg_d, gvec_xxx, MO_64, do_xvavg_s)
+TRANS(xvavg_bu, gvec_xxx, MO_8, do_xvavg_u)
+TRANS(xvavg_hu, gvec_xxx, MO_16, do_xvavg_u)
+TRANS(xvavg_wu, gvec_xxx, MO_32, do_xvavg_u)
+TRANS(xvavg_du, gvec_xxx, MO_64, do_xvavg_u)
+
+static void do_xvavgr_s(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_sari_vec, INDEX_op_add_vec, 0
+ };
+ static const GVecGen3 op[4] = {
+ {
+ .fniv = gen_vavgr_s,
+ .fno = gen_helper_xvavgr_b,
+ .opt_opc = vecop_list,
+ .vece = MO_8
+ },
+ {
+ .fniv = gen_vavgr_s,
+ .fno = gen_helper_xvavgr_h,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fniv = gen_vavgr_s,
+ .fno = gen_helper_xvavgr_w,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fniv = gen_vavgr_s,
+ .fno = gen_helper_xvavgr_d,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+static void do_xvavgr_u(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_shri_vec, INDEX_op_add_vec, 0
+ };
+ static const GVecGen3 op[4] = {
+ {
+ .fniv = gen_vavgr_u,
+ .fno = gen_helper_xvavgr_bu,
+ .opt_opc = vecop_list,
+ .vece = MO_8
+ },
+ {
+ .fniv = gen_vavgr_u,
+ .fno = gen_helper_xvavgr_hu,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fniv = gen_vavgr_u,
+ .fno = gen_helper_xvavgr_wu,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fniv = gen_vavgr_u,
+ .fno = gen_helper_xvavgr_du,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvavgr_b, gvec_xxx, MO_8, do_xvavgr_s)
+TRANS(xvavgr_h, gvec_xxx, MO_16, do_xvavgr_s)
+TRANS(xvavgr_w, gvec_xxx, MO_32, do_xvavgr_s)
+TRANS(xvavgr_d, gvec_xxx, MO_64, do_xvavgr_s)
+TRANS(xvavgr_bu, gvec_xxx, MO_8, do_xvavgr_u)
+TRANS(xvavgr_hu, gvec_xxx, MO_16, do_xvavgr_u)
+TRANS(xvavgr_wu, gvec_xxx, MO_32, do_xvavgr_u)
+TRANS(xvavgr_du, gvec_xxx, MO_64, do_xvavgr_u)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 1d177f9676..0057aaf1d4 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1421,6 +1421,23 @@ xvaddwod_w_hu_h 0111 01000100 00001 ..... ..... ..... @xxx
xvaddwod_d_wu_w 0111 01000100 00010 ..... ..... ..... @xxx
xvaddwod_q_du_d 0111 01000100 00011 ..... ..... ..... @xxx
+xvavg_b 0111 01000110 01000 ..... ..... ..... @xxx
+xvavg_h 0111 01000110 01001 ..... ..... ..... @xxx
+xvavg_w 0111 01000110 01010 ..... ..... ..... @xxx
+xvavg_d 0111 01000110 01011 ..... ..... ..... @xxx
+xvavg_bu 0111 01000110 01100 ..... ..... ..... @xxx
+xvavg_hu 0111 01000110 01101 ..... ..... ..... @xxx
+xvavg_wu 0111 01000110 01110 ..... ..... ..... @xxx
+xvavg_du 0111 01000110 01111 ..... ..... ..... @xxx
+xvavgr_b 0111 01000110 10000 ..... ..... ..... @xxx
+xvavgr_h 0111 01000110 10001 ..... ..... ..... @xxx
+xvavgr_w 0111 01000110 10010 ..... ..... ..... @xxx
+xvavgr_d 0111 01000110 10011 ..... ..... ..... @xxx
+xvavgr_bu 0111 01000110 10100 ..... ..... ..... @xxx
+xvavgr_hu 0111 01000110 10101 ..... ..... ..... @xxx
+xvavgr_wu 0111 01000110 10110 ..... ..... ..... @xxx
+xvavgr_du 0111 01000110 10111 ..... ..... ..... @xxx
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index 8e830e1f3c..8e1bcdb764 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -308,3 +308,32 @@ void HELPER(xvaddwod_q_du_d)(void *xd, void *xj, void *xk, uint32_t v)
XDO_ODD_U_S(xvaddwod_h_bu_b, 16, XH, UXH, XB, UXB, DO_ADD)
XDO_ODD_U_S(xvaddwod_w_hu_h, 32, XW, UXW, XH, UXH, DO_ADD)
XDO_ODD_U_S(xvaddwod_d_wu_w, 64, XD, UXD, XW, UXW, DO_ADD)
+
+#define XDO_3OP(NAME, BIT, E, DO_OP) \
+void HELPER(NAME)(void *xd, void *xj, void *xk, uint32_t v) \
+{ \
+ int i; \
+ XReg *Xd = (XReg *)xd; \
+ XReg *Xj = (XReg *)xj; \
+ XReg *Xk = (XReg *)xk; \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E(i) = DO_OP(Xj->E(i), Xk->E(i)); \
+ } \
+}
+
+XDO_3OP(xvavg_b, 8, XB, DO_VAVG)
+XDO_3OP(xvavg_h, 16, XH, DO_VAVG)
+XDO_3OP(xvavg_w, 32, XW, DO_VAVG)
+XDO_3OP(xvavg_d, 64, XD, DO_VAVG)
+XDO_3OP(xvavgr_b, 8, XB, DO_VAVGR)
+XDO_3OP(xvavgr_h, 16, XH, DO_VAVGR)
+XDO_3OP(xvavgr_w, 32, XW, DO_VAVGR)
+XDO_3OP(xvavgr_d, 64, XD, DO_VAVGR)
+XDO_3OP(xvavg_bu, 8, UXB, DO_VAVG)
+XDO_3OP(xvavg_hu, 16, UXH, DO_VAVG)
+XDO_3OP(xvavg_wu, 32, UXW, DO_VAVG)
+XDO_3OP(xvavg_du, 64, UXD, DO_VAVG)
+XDO_3OP(xvavgr_bu, 8, UXB, DO_VAVGR)
+XDO_3OP(xvavgr_hu, 16, UXH, DO_VAVGR)
+XDO_3OP(xvavgr_wu, 32, UXW, DO_VAVGR)
+XDO_3OP(xvavgr_du, 64, UXD, DO_VAVGR)
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index 7e71035e50..2a9c312e3d 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -51,4 +51,7 @@
#define DO_ADD(a, b) (a + b)
#define DO_SUB(a, b) (a - b)
+#define DO_VAVG(a, b) ((a >> 1) + (b >> 1) + (a & b & 1))
+#define DO_VAVGR(a, b) ((a >> 1) + (b >> 1) + ((a | b) & 1))
+
#endif /* LOONGARCH_VEC_H */
--
2.39.1
* [PATCH v1 12/46] target/loongarch: Implement xvabsd
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (10 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 11/46] target/loongarch: Implement xvavg/xvavgr Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 13/46] target/loongarch: Implement xvadda Song Gao
` (33 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVABSD.{B/H/W/D}[U].
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
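For reference (illustration only, not part of the patch): XVABSD stores the
absolute difference of each element pair; the signed and unsigned forms differ
only in how the elements are compared. A per-element sketch for the 32-bit
case, with hypothetical names:

    /* signed absolute difference, truncated to the element width */
    static int32_t vabsd_w_example(int32_t a, int32_t b)
    {
        int64_t d = (int64_t)a - (int64_t)b;
        return (int32_t)(d < 0 ? -d : d);
    }

    /* unsigned form: compare and subtract as unsigned */
    static uint32_t vabsd_wu_example(uint32_t a, uint32_t b)
    {
        return (a > b) ? a - b : b - a;
    }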
target/loongarch/disas.c | 9 +++
target/loongarch/helper.h | 9 +++
target/loongarch/insn_trans/trans_lasx.c.inc | 81 ++++++++++++++++++++
target/loongarch/insns.decode | 9 +++
target/loongarch/lasx_helper.c | 9 +++
target/loongarch/lsx_helper.c | 2 -
target/loongarch/vec.h | 2 +
7 files changed, 119 insertions(+), 2 deletions(-)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index d804caaee0..d6b6b8ddd6 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1842,6 +1842,15 @@ INSN_LASX(xvavgr_hu, xxx)
INSN_LASX(xvavgr_wu, xxx)
INSN_LASX(xvavgr_du, xxx)
+INSN_LASX(xvabsd_b, xxx)
+INSN_LASX(xvabsd_h, xxx)
+INSN_LASX(xvabsd_w, xxx)
+INSN_LASX(xvabsd_d, xxx)
+INSN_LASX(xvabsd_bu, xxx)
+INSN_LASX(xvabsd_hu, xxx)
+INSN_LASX(xvabsd_wu, xxx)
+INSN_LASX(xvabsd_du, xxx)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index feeaa92447..3ec7717c88 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -777,3 +777,12 @@ DEF_HELPER_FLAGS_4(xvavgr_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(xvavgr_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(xvavgr_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(xvavgr_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(xvabsd_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvabsd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvabsd_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvabsd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvabsd_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvabsd_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvabsd_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvabsd_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 4a8bcf618f..8f7ff2cba6 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -750,6 +750,87 @@ TRANS(xvavgr_hu, gvec_xxx, MO_16, do_xvavgr_u)
TRANS(xvavgr_wu, gvec_xxx, MO_32, do_xvavgr_u)
TRANS(xvavgr_du, gvec_xxx, MO_64, do_xvavgr_u)
+static void do_xvabsd_s(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_smax_vec, INDEX_op_smin_vec, INDEX_op_sub_vec, 0
+ };
+ static const GVecGen3 op[4] = {
+ {
+ .fniv = gen_vabsd_s,
+ .fno = gen_helper_xvabsd_b,
+ .opt_opc = vecop_list,
+ .vece = MO_8
+ },
+ {
+ .fniv = gen_vabsd_s,
+ .fno = gen_helper_xvabsd_h,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fniv = gen_vabsd_s,
+ .fno = gen_helper_xvabsd_w,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fniv = gen_vabsd_s,
+ .fno = gen_helper_xvabsd_d,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+static void do_xvabsd_u(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_umax_vec, INDEX_op_umin_vec, INDEX_op_sub_vec, 0
+ };
+ static const GVecGen3 op[4] = {
+ {
+ .fniv = gen_vabsd_u,
+ .fno = gen_helper_xvabsd_bu,
+ .opt_opc = vecop_list,
+ .vece = MO_8
+ },
+ {
+ .fniv = gen_vabsd_u,
+ .fno = gen_helper_xvabsd_hu,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fniv = gen_vabsd_u,
+ .fno = gen_helper_xvabsd_wu,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fniv = gen_vabsd_u,
+ .fno = gen_helper_xvabsd_du,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvabsd_b, gvec_xxx, MO_8, do_xvabsd_s)
+TRANS(xvabsd_h, gvec_xxx, MO_16, do_xvabsd_s)
+TRANS(xvabsd_w, gvec_xxx, MO_32, do_xvabsd_s)
+TRANS(xvabsd_d, gvec_xxx, MO_64, do_xvabsd_s)
+TRANS(xvabsd_bu, gvec_xxx, MO_8, do_xvabsd_u)
+TRANS(xvabsd_hu, gvec_xxx, MO_16, do_xvabsd_u)
+TRANS(xvabsd_wu, gvec_xxx, MO_32, do_xvabsd_u)
+TRANS(xvabsd_du, gvec_xxx, MO_64, do_xvabsd_u)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 0057aaf1d4..8bd029a6e8 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1438,6 +1438,15 @@ xvavgr_hu 0111 01000110 10101 ..... ..... ..... @xxx
xvavgr_wu 0111 01000110 10110 ..... ..... ..... @xxx
xvavgr_du 0111 01000110 10111 ..... ..... ..... @xxx
+xvabsd_b 0111 01000110 00000 ..... ..... ..... @xxx
+xvabsd_h 0111 01000110 00001 ..... ..... ..... @xxx
+xvabsd_w 0111 01000110 00010 ..... ..... ..... @xxx
+xvabsd_d 0111 01000110 00011 ..... ..... ..... @xxx
+xvabsd_bu 0111 01000110 00100 ..... ..... ..... @xxx
+xvabsd_hu 0111 01000110 00101 ..... ..... ..... @xxx
+xvabsd_wu 0111 01000110 00110 ..... ..... ..... @xxx
+xvabsd_du 0111 01000110 00111 ..... ..... ..... @xxx
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index 8e1bcdb764..e9d38d83bc 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -337,3 +337,12 @@ XDO_3OP(xvavgr_bu, 8, UXB, DO_VAVGR)
XDO_3OP(xvavgr_hu, 16, UXH, DO_VAVGR)
XDO_3OP(xvavgr_wu, 32, UXW, DO_VAVGR)
XDO_3OP(xvavgr_du, 64, UXD, DO_VAVGR)
+
+XDO_3OP(xvabsd_b, 8, XB, DO_VABSD)
+XDO_3OP(xvabsd_h, 16, XH, DO_VABSD)
+XDO_3OP(xvabsd_w, 32, XW, DO_VABSD)
+XDO_3OP(xvabsd_d, 64, XD, DO_VABSD)
+XDO_3OP(xvabsd_bu, 8, UXB, DO_VABSD)
+XDO_3OP(xvabsd_hu, 16, UXH, DO_VABSD)
+XDO_3OP(xvabsd_wu, 32, UXW, DO_VABSD)
+XDO_3OP(xvabsd_du, 64, UXD, DO_VABSD)
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index d79a65dfe2..72e2306f0c 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -309,8 +309,6 @@ DO_3OP(vavgr_hu, 16, UH, DO_VAVGR)
DO_3OP(vavgr_wu, 32, UW, DO_VAVGR)
DO_3OP(vavgr_du, 64, UD, DO_VAVGR)
-#define DO_VABSD(a, b) ((a > b) ? (a -b) : (b-a))
-
DO_3OP(vabsd_b, 8, B, DO_VABSD)
DO_3OP(vabsd_h, 16, H, DO_VABSD)
DO_3OP(vabsd_w, 32, W, DO_VABSD)
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index 2a9c312e3d..652d46c157 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -54,4 +54,6 @@
#define DO_VAVG(a, b) ((a >> 1) + (b >> 1) + (a & b & 1))
#define DO_VAVGR(a, b) ((a >> 1) + (b >> 1) + ((a | b) & 1))
+#define DO_VABSD(a, b) ((a > b) ? (a - b) : (b - a))
+
#endif /* LOONGARCH_VEC_H */
--
2.39.1
* [PATCH v1 13/46] target/loongarch: Implement xvadda
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (11 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 12/46] target/loongarch: Implement xvabsd Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 14/46] target/loongarch: Implement xvmax/xvmin Song Gao
` (32 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVADDA.{B/H/W/D}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
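For reference (illustration only, not part of the patch): XVADDA adds the
absolute values of the corresponding elements, which is what the DO_VABS-based
helper below does. A per-element sketch for the 32-bit case, with a
hypothetical name and ignoring the INT32_MIN wrap-around corner case:

    /* |a| + |b| for one pair of 32-bit elements */
    static int32_t vadda_w_example(int32_t a, int32_t b)
    {
        int32_t abs_a = (a < 0) ? -a : a;
        int32_t abs_b = (b < 0) ? -b : b;
        return abs_a + abs_b;
    }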
target/loongarch/disas.c | 5 +++
target/loongarch/helper.h | 5 +++
target/loongarch/insn_trans/trans_lasx.c.inc | 41 ++++++++++++++++++++
target/loongarch/insns.decode | 5 +++
target/loongarch/lasx_helper.c | 17 ++++++++
target/loongarch/lsx_helper.c | 2 -
target/loongarch/vec.h | 2 +
7 files changed, 75 insertions(+), 2 deletions(-)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index d6b6b8ddd6..cc92f0e763 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1851,6 +1851,11 @@ INSN_LASX(xvabsd_hu, xxx)
INSN_LASX(xvabsd_wu, xxx)
INSN_LASX(xvabsd_du, xxx)
+INSN_LASX(xvadda_b, xxx)
+INSN_LASX(xvadda_h, xxx)
+INSN_LASX(xvadda_w, xxx)
+INSN_LASX(xvadda_d, xxx)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 3ec7717c88..67ef7491c4 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -786,3 +786,8 @@ DEF_HELPER_FLAGS_4(xvabsd_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(xvabsd_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(xvabsd_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(xvabsd_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(xvadda_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvadda_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvadda_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvadda_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 8f7ff2cba6..4b2e50de68 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -831,6 +831,47 @@ TRANS(xvabsd_hu, gvec_xxx, MO_16, do_xvabsd_u)
TRANS(xvabsd_wu, gvec_xxx, MO_32, do_xvabsd_u)
TRANS(xvabsd_du, gvec_xxx, MO_64, do_xvabsd_u)
+static void do_xvadda(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_abs_vec, INDEX_op_add_vec, 0
+ };
+ static const GVecGen3 op[4] = {
+ {
+ .fniv = gen_vadda,
+ .fno = gen_helper_xvadda_b,
+ .opt_opc = vecop_list,
+ .vece = MO_8
+ },
+ {
+ .fniv = gen_vadda,
+ .fno = gen_helper_xvadda_h,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fniv = gen_vadda,
+ .fno = gen_helper_xvadda_w,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fniv = gen_vadda,
+ .fno = gen_helper_xvadda_d,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvadda_b, gvec_xxx, MO_8, do_xvadda)
+TRANS(xvadda_h, gvec_xxx, MO_16, do_xvadda)
+TRANS(xvadda_w, gvec_xxx, MO_32, do_xvadda)
+TRANS(xvadda_d, gvec_xxx, MO_64, do_xvadda)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 8bd029a6e8..f8a17f262a 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1447,6 +1447,11 @@ xvabsd_hu 0111 01000110 00101 ..... ..... ..... @xxx
xvabsd_wu 0111 01000110 00110 ..... ..... ..... @xxx
xvabsd_du 0111 01000110 00111 ..... ..... ..... @xxx
+xvadda_b 0111 01000101 11000 ..... ..... ..... @xxx
+xvadda_h 0111 01000101 11001 ..... ..... ..... @xxx
+xvadda_w 0111 01000101 11010 ..... ..... ..... @xxx
+xvadda_d 0111 01000101 11011 ..... ..... ..... @xxx
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index e9d38d83bc..52c230a681 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -346,3 +346,20 @@ XDO_3OP(xvabsd_bu, 8, UXB, DO_VABSD)
XDO_3OP(xvabsd_hu, 16, UXH, DO_VABSD)
XDO_3OP(xvabsd_wu, 32, UXW, DO_VABSD)
XDO_3OP(xvabsd_du, 64, UXD, DO_VABSD)
+
+#define XDO_VADDA(NAME, BIT, E, DO_OP) \
+void HELPER(NAME)(void *xd, void *xj, void *xk, uint32_t v) \
+{ \
+ int i; \
+ XReg *Xd = (XReg *)xd; \
+ XReg *Xj = (XReg *)xj; \
+ XReg *Xk = (XReg *)xk; \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E(i) = DO_OP(Xj->E(i)) + DO_OP(Xk->E(i)); \
+ } \
+}
+
+XDO_VADDA(xvadda_b, 8, XB, DO_VABS)
+XDO_VADDA(xvadda_h, 16, XH, DO_VABS)
+XDO_VADDA(xvadda_w, 32, XW, DO_VABS)
+XDO_VADDA(xvadda_d, 64, XD, DO_VABS)
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 72e2306f0c..72120c04a4 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -318,8 +318,6 @@ DO_3OP(vabsd_hu, 16, UH, DO_VABSD)
DO_3OP(vabsd_wu, 32, UW, DO_VABSD)
DO_3OP(vabsd_du, 64, UD, DO_VABSD)
-#define DO_VABS(a) ((a < 0) ? (-a) : (a))
-
#define DO_VADDA(NAME, BIT, E, DO_OP) \
void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
{ \
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index 652d46c157..20b86c3119 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -56,4 +56,6 @@
#define DO_VABSD(a, b) ((a > b) ? (a - b) : (b - a))
+#define DO_VABS(a) ((a < 0) ? (-a) : (a))
+
#endif /* LOONGARCH_VEC_H */
--
2.39.1
* [PATCH v1 14/46] target/loongarch: Implement xvmax/xvmin
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (12 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 13/46] target/loongarch: Implement xvadda Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 15/46] target/loongarch: Implement xvmul/xvmuh/xvmulw{ev/od} Song Gao
` (31 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVMAX[I].{B/H/W/D}[U];
- XVMIN[I].{B/H/W/D}[U].
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
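A note on the immediate forms (not part of the patch): XVMAXI/XVMINI take a
5-bit immediate that the decoder sign-extends for the signed variants
(@xx_i5) and zero-extends for the unsigned variants (@xx_ui5); the helper then
casts it to the element type before comparing. A minimal sketch with
hypothetical names:

    /* signed per-element max against a sign-extended 5-bit immediate */
    static int16_t vmaxi_h_example(int16_t elem, int64_t imm)
    {
        int16_t v = (int16_t)imm;   /* imm already sign-extended by decode */
        return (elem > v) ? elem : v;
    }

    /* unsigned variant: the immediate is in the range 0..31 */
    static uint16_t vmaxi_hu_example(uint16_t elem, uint64_t imm)
    {
        uint16_t v = (uint16_t)imm;
        return (elem > v) ? elem : v;
    }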
target/loongarch/disas.c | 33 ++++
target/loongarch/helper.h | 18 ++
target/loongarch/insn_trans/trans_lasx.c.inc | 180 +++++++++++++++++++
target/loongarch/insns.decode | 37 ++++
target/loongarch/lasx_helper.c | 30 ++++
target/loongarch/lsx_helper.c | 3 -
target/loongarch/vec.h | 3 +
7 files changed, 301 insertions(+), 3 deletions(-)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index cc92f0e763..ff22fcb90e 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1856,6 +1856,39 @@ INSN_LASX(xvadda_h, xxx)
INSN_LASX(xvadda_w, xxx)
INSN_LASX(xvadda_d, xxx)
+INSN_LASX(xvmax_b, xxx)
+INSN_LASX(xvmax_h, xxx)
+INSN_LASX(xvmax_w, xxx)
+INSN_LASX(xvmax_d, xxx)
+INSN_LASX(xvmin_b, xxx)
+INSN_LASX(xvmin_h, xxx)
+INSN_LASX(xvmin_w, xxx)
+INSN_LASX(xvmin_d, xxx)
+INSN_LASX(xvmax_bu, xxx)
+INSN_LASX(xvmax_hu, xxx)
+INSN_LASX(xvmax_wu, xxx)
+INSN_LASX(xvmax_du, xxx)
+INSN_LASX(xvmin_bu, xxx)
+INSN_LASX(xvmin_hu, xxx)
+INSN_LASX(xvmin_wu, xxx)
+INSN_LASX(xvmin_du, xxx)
+INSN_LASX(xvmaxi_b, xx_i)
+INSN_LASX(xvmaxi_h, xx_i)
+INSN_LASX(xvmaxi_w, xx_i)
+INSN_LASX(xvmaxi_d, xx_i)
+INSN_LASX(xvmini_b, xx_i)
+INSN_LASX(xvmini_h, xx_i)
+INSN_LASX(xvmini_w, xx_i)
+INSN_LASX(xvmini_d, xx_i)
+INSN_LASX(xvmaxi_bu, xx_i)
+INSN_LASX(xvmaxi_hu, xx_i)
+INSN_LASX(xvmaxi_wu, xx_i)
+INSN_LASX(xvmaxi_du, xx_i)
+INSN_LASX(xvmini_bu, xx_i)
+INSN_LASX(xvmini_hu, xx_i)
+INSN_LASX(xvmini_wu, xx_i)
+INSN_LASX(xvmini_du, xx_i)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 67ef7491c4..d5ebc0b963 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -791,3 +791,21 @@ DEF_HELPER_FLAGS_4(xvadda_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(xvadda_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(xvadda_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(xvadda_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(xvmini_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvmini_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvmini_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvmini_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvmini_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvmini_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvmini_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvmini_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(xvmaxi_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvmaxi_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvmaxi_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvmaxi_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvmaxi_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvmaxi_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvmaxi_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvmaxi_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 4b2e50de68..cdf3dcc161 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -872,6 +872,186 @@ TRANS(xvadda_h, gvec_xxx, MO_16, do_xvadda)
TRANS(xvadda_w, gvec_xxx, MO_32, do_xvadda)
TRANS(xvadda_d, gvec_xxx, MO_64, do_xvadda)
+TRANS(xvmax_b, gvec_xxx, MO_8, tcg_gen_gvec_smax)
+TRANS(xvmax_h, gvec_xxx, MO_16, tcg_gen_gvec_smax)
+TRANS(xvmax_w, gvec_xxx, MO_32, tcg_gen_gvec_smax)
+TRANS(xvmax_d, gvec_xxx, MO_64, tcg_gen_gvec_smax)
+TRANS(xvmax_bu, gvec_xxx, MO_8, tcg_gen_gvec_umax)
+TRANS(xvmax_hu, gvec_xxx, MO_16, tcg_gen_gvec_umax)
+TRANS(xvmax_wu, gvec_xxx, MO_32, tcg_gen_gvec_umax)
+TRANS(xvmax_du, gvec_xxx, MO_64, tcg_gen_gvec_umax)
+
+TRANS(xvmin_b, gvec_xxx, MO_8, tcg_gen_gvec_smin)
+TRANS(xvmin_h, gvec_xxx, MO_16, tcg_gen_gvec_smin)
+TRANS(xvmin_w, gvec_xxx, MO_32, tcg_gen_gvec_smin)
+TRANS(xvmin_d, gvec_xxx, MO_64, tcg_gen_gvec_smin)
+TRANS(xvmin_bu, gvec_xxx, MO_8, tcg_gen_gvec_umin)
+TRANS(xvmin_hu, gvec_xxx, MO_16, tcg_gen_gvec_umin)
+TRANS(xvmin_wu, gvec_xxx, MO_32, tcg_gen_gvec_umin)
+TRANS(xvmin_du, gvec_xxx, MO_64, tcg_gen_gvec_umin)
+
+static void do_xvmini_s(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ int64_t imm, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_smin_vec, 0
+ };
+ static const GVecGen2i op[4] = {
+ {
+ .fniv = gen_vmini_s,
+ .fnoi = gen_helper_xvmini_b,
+ .opt_opc = vecop_list,
+ .vece = MO_8
+ },
+ {
+ .fniv = gen_vmini_s,
+ .fnoi = gen_helper_xvmini_h,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fniv = gen_vmini_s,
+ .fnoi = gen_helper_xvmini_w,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fniv = gen_vmini_s,
+ .fnoi = gen_helper_xvmini_d,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_2i(xd_ofs, xj_ofs, oprsz, maxsz, imm, &op[vece]);
+}
+
+static void do_xvmini_u(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ int64_t imm, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_umin_vec, 0
+ };
+ static const GVecGen2i op[4] = {
+ {
+ .fniv = gen_vmini_u,
+ .fnoi = gen_helper_xvmini_bu,
+ .opt_opc = vecop_list,
+ .vece = MO_8
+ },
+ {
+ .fniv = gen_vmini_u,
+ .fnoi = gen_helper_xvmini_hu,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fniv = gen_vmini_u,
+ .fnoi = gen_helper_xvmini_wu,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fniv = gen_vmini_u,
+ .fnoi = gen_helper_xvmini_du,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_2i(xd_ofs, xj_ofs, oprsz, maxsz, imm, &op[vece]);
+}
+
+TRANS(xvmini_b, gvec_xx_i, MO_8, do_xvmini_s)
+TRANS(xvmini_h, gvec_xx_i, MO_16, do_xvmini_s)
+TRANS(xvmini_w, gvec_xx_i, MO_32, do_xvmini_s)
+TRANS(xvmini_d, gvec_xx_i, MO_64, do_xvmini_s)
+TRANS(xvmini_bu, gvec_xx_i, MO_8, do_xvmini_u)
+TRANS(xvmini_hu, gvec_xx_i, MO_16, do_xvmini_u)
+TRANS(xvmini_wu, gvec_xx_i, MO_32, do_xvmini_u)
+TRANS(xvmini_du, gvec_xx_i, MO_64, do_xvmini_u)
+
+static void do_xvmaxi_s(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ int64_t imm, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_smax_vec, 0
+ };
+ static const GVecGen2i op[4] = {
+ {
+ .fniv = gen_vmaxi_s,
+ .fnoi = gen_helper_xvmaxi_b,
+ .opt_opc = vecop_list,
+ .vece = MO_8
+ },
+ {
+ .fniv = gen_vmaxi_s,
+ .fnoi = gen_helper_xvmaxi_h,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fniv = gen_vmaxi_s,
+ .fnoi = gen_helper_xvmaxi_w,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fniv = gen_vmaxi_s,
+ .fnoi = gen_helper_xvmaxi_d,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_2i(xd_ofs, xj_ofs, oprsz, maxsz, imm, &op[vece]);
+}
+
+static void do_xvmaxi_u(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ int64_t imm, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_umax_vec, 0
+ };
+ static const GVecGen2i op[4] = {
+ {
+ .fniv = gen_vmaxi_u,
+ .fnoi = gen_helper_xvmaxi_bu,
+ .opt_opc = vecop_list,
+ .vece = MO_8
+ },
+ {
+ .fniv = gen_vmaxi_u,
+ .fnoi = gen_helper_xvmaxi_hu,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fniv = gen_vmaxi_u,
+ .fnoi = gen_helper_xvmaxi_wu,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fniv = gen_vmaxi_u,
+ .fnoi = gen_helper_xvmaxi_du,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_2i(xd_ofs, xj_ofs, oprsz, maxsz, imm, &op[vece]);
+}
+
+TRANS(xvmaxi_b, gvec_xx_i, MO_8, do_xvmaxi_s)
+TRANS(xvmaxi_h, gvec_xx_i, MO_16, do_xvmaxi_s)
+TRANS(xvmaxi_w, gvec_xx_i, MO_32, do_xvmaxi_s)
+TRANS(xvmaxi_d, gvec_xx_i, MO_64, do_xvmaxi_s)
+TRANS(xvmaxi_bu, gvec_xx_i, MO_8, do_xvmaxi_u)
+TRANS(xvmaxi_hu, gvec_xx_i, MO_16, do_xvmaxi_u)
+TRANS(xvmaxi_wu, gvec_xx_i, MO_32, do_xvmaxi_u)
+TRANS(xvmaxi_du, gvec_xx_i, MO_64, do_xvmaxi_u)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index f8a17f262a..29666f7925 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1313,6 +1313,7 @@ vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
@xx .... ........ ..... ..... xj:5 xd:5 &xx
@xxx .... ........ ..... xk:5 xj:5 xd:5 &xxx
@xr .... ........ ..... ..... rj:5 xd:5 &xr
+@xx_i5 .... ........ ..... imm:s5 xj:5 xd:5 &xx_i
@xx_ui5 .... ........ ..... imm:5 xj:5 xd:5 &xx_i
xvadd_b 0111 01000000 10100 ..... ..... ..... @xxx
@@ -1452,6 +1453,42 @@ xvadda_h 0111 01000101 11001 ..... ..... ..... @xxx
xvadda_w 0111 01000101 11010 ..... ..... ..... @xxx
xvadda_d 0111 01000101 11011 ..... ..... ..... @xxx
+xvmax_b 0111 01000111 00000 ..... ..... ..... @xxx
+xvmax_h 0111 01000111 00001 ..... ..... ..... @xxx
+xvmax_w 0111 01000111 00010 ..... ..... ..... @xxx
+xvmax_d 0111 01000111 00011 ..... ..... ..... @xxx
+xvmax_bu 0111 01000111 01000 ..... ..... ..... @xxx
+xvmax_hu 0111 01000111 01001 ..... ..... ..... @xxx
+xvmax_wu 0111 01000111 01010 ..... ..... ..... @xxx
+xvmax_du 0111 01000111 01011 ..... ..... ..... @xxx
+
+xvmaxi_b 0111 01101001 00000 ..... ..... ..... @xx_i5
+xvmaxi_h 0111 01101001 00001 ..... ..... ..... @xx_i5
+xvmaxi_w 0111 01101001 00010 ..... ..... ..... @xx_i5
+xvmaxi_d 0111 01101001 00011 ..... ..... ..... @xx_i5
+xvmaxi_bu 0111 01101001 01000 ..... ..... ..... @xx_ui5
+xvmaxi_hu 0111 01101001 01001 ..... ..... ..... @xx_ui5
+xvmaxi_wu 0111 01101001 01010 ..... ..... ..... @xx_ui5
+xvmaxi_du 0111 01101001 01011 ..... ..... ..... @xx_ui5
+
+xvmin_b 0111 01000111 00100 ..... ..... ..... @xxx
+xvmin_h 0111 01000111 00101 ..... ..... ..... @xxx
+xvmin_w 0111 01000111 00110 ..... ..... ..... @xxx
+xvmin_d 0111 01000111 00111 ..... ..... ..... @xxx
+xvmin_bu 0111 01000111 01100 ..... ..... ..... @xxx
+xvmin_hu 0111 01000111 01101 ..... ..... ..... @xxx
+xvmin_wu 0111 01000111 01110 ..... ..... ..... @xxx
+xvmin_du 0111 01000111 01111 ..... ..... ..... @xxx
+
+xvmini_b 0111 01101001 00100 ..... ..... ..... @xx_i5
+xvmini_h 0111 01101001 00101 ..... ..... ..... @xx_i5
+xvmini_w 0111 01101001 00110 ..... ..... ..... @xx_i5
+xvmini_d 0111 01101001 00111 ..... ..... ..... @xx_i5
+xvmini_bu 0111 01101001 01100 ..... ..... ..... @xx_ui5
+xvmini_hu 0111 01101001 01101 ..... ..... ..... @xx_ui5
+xvmini_wu 0111 01101001 01110 ..... ..... ..... @xx_ui5
+xvmini_du 0111 01101001 01111 ..... ..... ..... @xx_ui5
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index 52c230a681..486cf9f7f1 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -363,3 +363,33 @@ XDO_VADDA(xvadda_b, 8, XB, DO_VABS)
XDO_VADDA(xvadda_h, 16, XH, DO_VABS)
XDO_VADDA(xvadda_w, 32, XW, DO_VABS)
XDO_VADDA(xvadda_d, 64, XD, DO_VABS)
+
+#define XVMINMAXI(NAME, BIT, E, DO_OP) \
+void HELPER(NAME)(void *xd, void *xj, uint64_t imm, uint32_t v) \
+{ \
+ int i; \
+ XReg *Xd = (XReg *)xd; \
+ XReg *Xj = (XReg *)xj; \
+ typedef __typeof(Xd->E(0)) TD; \
+ \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E(i) = DO_OP(Xj->E(i), (TD)imm); \
+ } \
+}
+
+XVMINMAXI(xvmini_b, 8, XB, DO_MIN)
+XVMINMAXI(xvmini_h, 16, XH, DO_MIN)
+XVMINMAXI(xvmini_w, 32, XW, DO_MIN)
+XVMINMAXI(xvmini_d, 64, XD, DO_MIN)
+XVMINMAXI(xvmaxi_b, 8, XB, DO_MAX)
+XVMINMAXI(xvmaxi_h, 16, XH, DO_MAX)
+XVMINMAXI(xvmaxi_w, 32, XW, DO_MAX)
+XVMINMAXI(xvmaxi_d, 64, XD, DO_MAX)
+XVMINMAXI(xvmini_bu, 8, UXB, DO_MIN)
+XVMINMAXI(xvmini_hu, 16, UXH, DO_MIN)
+XVMINMAXI(xvmini_wu, 32, UXW, DO_MIN)
+XVMINMAXI(xvmini_du, 64, UXD, DO_MIN)
+XVMINMAXI(xvmaxi_bu, 8, UXB, DO_MAX)
+XVMINMAXI(xvmaxi_hu, 16, UXH, DO_MAX)
+XVMINMAXI(xvmaxi_wu, 32, UXW, DO_MAX)
+XVMINMAXI(xvmaxi_du, 64, UXD, DO_MAX)
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 72120c04a4..192cdb164c 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -335,9 +335,6 @@ DO_VADDA(vadda_h, 16, H, DO_VABS)
DO_VADDA(vadda_w, 32, W, DO_VABS)
DO_VADDA(vadda_d, 64, D, DO_VABS)
-#define DO_MIN(a, b) (a < b ? a : b)
-#define DO_MAX(a, b) (a > b ? a : b)
-
#define VMINMAXI(NAME, BIT, E, DO_OP) \
void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t v) \
{ \
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index 20b86c3119..96f216d569 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -58,4 +58,7 @@
#define DO_VABS(a) ((a < 0) ? (-a) : (a))
+#define DO_MIN(a, b) (a < b ? a : b)
+#define DO_MAX(a, b) (a > b ? a : b)
+
#endif /* LOONGARCH_VEC_H */
--
2.39.1
* [PATCH v1 15/46] target/loongarch: Implement xvmul/xvmuh/xvmulw{ev/od}
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (13 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 14/46] target/loongarch: Implement xvmax/xvmin Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 16/46] target/loongarch: Implement xvmadd/xvmsub/xvmaddw{ev/od} Song Gao
` (30 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVMUL.{B/H/W/D};
- XVMUH.{B/H/W/D}[U];
- XVMULW{EV/OD}.{H.B/W.H/D.W/Q.D}[U];
- XVMULW{EV/OD}.{H.BU.B/W.HU.H/D.WU.W/Q.DU.D}.
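A rough scalar sketch of the even/odd widening-multiply semantics these encodings follow may help review. The standalone C below is illustrative only (the function names are invented, not part of the patch), and the Q.D forms are shown with a compiler 128-bit type rather than the tcg_gen_muls2/mulu2/mulus2 pairs the translator actually uses:

    #include <stdint.h>

    /* Result lane i of a *WEV op is the widened product of the
     * even-indexed source elements 2*i; *WOD uses 2*i + 1. */
    static int16_t mulwev_h_b(const int8_t *j, const int8_t *k, int lane)
    {
        return (int16_t)j[2 * lane] * (int16_t)k[2 * lane];
    }

    static int16_t mulwod_h_b(const int8_t *j, const int8_t *k, int lane)
    {
        return (int16_t)j[2 * lane + 1] * (int16_t)k[2 * lane + 1];
    }

    /* The Q.D forms widen 64-bit elements into a 128-bit product. */
    static __int128 mulwev_q_d(const int64_t *j, const int64_t *k)
    {
        return (__int128)j[0] * (__int128)k[0];
    }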
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 38 +++
target/loongarch/helper.h | 30 ++
target/loongarch/insn_trans/trans_lasx.c.inc | 311 +++++++++++++++++++
target/loongarch/insns.decode | 38 +++
target/loongarch/lasx_helper.c | 74 +++++
target/loongarch/lsx_helper.c | 2 -
target/loongarch/vec.h | 2 +
7 files changed, 493 insertions(+), 2 deletions(-)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index ff22fcb90e..e7c46bc3a2 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1889,6 +1889,44 @@ INSN_LASX(xvmini_hu, xx_i)
INSN_LASX(xvmini_wu, xx_i)
INSN_LASX(xvmini_du, xx_i)
+INSN_LASX(xvmul_b, xxx)
+INSN_LASX(xvmul_h, xxx)
+INSN_LASX(xvmul_w, xxx)
+INSN_LASX(xvmul_d, xxx)
+INSN_LASX(xvmuh_b, xxx)
+INSN_LASX(xvmuh_h, xxx)
+INSN_LASX(xvmuh_w, xxx)
+INSN_LASX(xvmuh_d, xxx)
+INSN_LASX(xvmuh_bu, xxx)
+INSN_LASX(xvmuh_hu, xxx)
+INSN_LASX(xvmuh_wu, xxx)
+INSN_LASX(xvmuh_du, xxx)
+
+INSN_LASX(xvmulwev_h_b, xxx)
+INSN_LASX(xvmulwev_w_h, xxx)
+INSN_LASX(xvmulwev_d_w, xxx)
+INSN_LASX(xvmulwev_q_d, xxx)
+INSN_LASX(xvmulwod_h_b, xxx)
+INSN_LASX(xvmulwod_w_h, xxx)
+INSN_LASX(xvmulwod_d_w, xxx)
+INSN_LASX(xvmulwod_q_d, xxx)
+INSN_LASX(xvmulwev_h_bu, xxx)
+INSN_LASX(xvmulwev_w_hu, xxx)
+INSN_LASX(xvmulwev_d_wu, xxx)
+INSN_LASX(xvmulwev_q_du, xxx)
+INSN_LASX(xvmulwod_h_bu, xxx)
+INSN_LASX(xvmulwod_w_hu, xxx)
+INSN_LASX(xvmulwod_d_wu, xxx)
+INSN_LASX(xvmulwod_q_du, xxx)
+INSN_LASX(xvmulwev_h_bu_b, xxx)
+INSN_LASX(xvmulwev_w_hu_h, xxx)
+INSN_LASX(xvmulwev_d_wu_w, xxx)
+INSN_LASX(xvmulwev_q_du_d, xxx)
+INSN_LASX(xvmulwod_h_bu_b, xxx)
+INSN_LASX(xvmulwod_w_hu_h, xxx)
+INSN_LASX(xvmulwod_d_wu_w, xxx)
+INSN_LASX(xvmulwod_q_du_d, xxx)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index d5ebc0b963..88ae707027 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -809,3 +809,33 @@ DEF_HELPER_FLAGS_4(xvmaxi_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_4(xvmaxi_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_4(xvmaxi_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_4(xvmaxi_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(xvmuh_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmuh_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmuh_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmuh_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmuh_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmuh_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmuh_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmuh_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(xvmulwev_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmulwev_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmulwev_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmulwod_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmulwod_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmulwod_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(xvmulwev_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmulwev_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmulwev_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmulwod_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmulwod_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmulwod_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(xvmulwev_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmulwev_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmulwev_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmulwod_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmulwod_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmulwod_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index cdf3dcc161..d57d867f17 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -1052,6 +1052,317 @@ TRANS(xvmaxi_hu, gvec_xx_i, MO_16, do_xvmaxi_u)
TRANS(xvmaxi_wu, gvec_xx_i, MO_32, do_xvmaxi_u)
TRANS(xvmaxi_du, gvec_xx_i, MO_64, do_xvmaxi_u)
+TRANS(xvmul_b, gvec_xxx, MO_8, tcg_gen_gvec_mul)
+TRANS(xvmul_h, gvec_xxx, MO_16, tcg_gen_gvec_mul)
+TRANS(xvmul_w, gvec_xxx, MO_32, tcg_gen_gvec_mul)
+TRANS(xvmul_d, gvec_xxx, MO_64, tcg_gen_gvec_mul)
+
+static void do_xvmuh_s(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const GVecGen3 op[4] = {
+ {
+ .fno = gen_helper_xvmuh_b,
+ .vece = MO_8
+ },
+ {
+ .fno = gen_helper_xvmuh_h,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vmuh_w,
+ .fno = gen_helper_xvmuh_w,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vmuh_d,
+ .fno = gen_helper_xvmuh_d,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvmuh_b, gvec_xxx, MO_8, do_xvmuh_s)
+TRANS(xvmuh_h, gvec_xxx, MO_16, do_xvmuh_s)
+TRANS(xvmuh_w, gvec_xxx, MO_32, do_xvmuh_s)
+TRANS(xvmuh_d, gvec_xxx, MO_64, do_xvmuh_s)
+
+static void do_xvmuh_u(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const GVecGen3 op[4] = {
+ {
+ .fno = gen_helper_xvmuh_bu,
+ .vece = MO_8
+ },
+ {
+ .fno = gen_helper_xvmuh_hu,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vmuh_wu,
+ .fno = gen_helper_xvmuh_wu,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vmuh_du,
+ .fno = gen_helper_xvmuh_du,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvmuh_bu, gvec_xxx, MO_8, do_xvmuh_u)
+TRANS(xvmuh_hu, gvec_xxx, MO_16, do_xvmuh_u)
+TRANS(xvmuh_wu, gvec_xxx, MO_32, do_xvmuh_u)
+TRANS(xvmuh_du, gvec_xxx, MO_64, do_xvmuh_u)
+
+static void do_xvmulwev_s(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_shli_vec, INDEX_op_sari_vec, INDEX_op_mul_vec, 0
+ };
+ static const GVecGen3 op[3] = {
+ {
+ .fniv = gen_vmulwev_s,
+ .fno = gen_helper_xvmulwev_h_b,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vmulwev_w_h,
+ .fniv = gen_vmulwev_s,
+ .fno = gen_helper_xvmulwev_w_h,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vmulwev_d_w,
+ .fniv = gen_vmulwev_s,
+ .fno = gen_helper_xvmulwev_d_w,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvmulwev_h_b, gvec_xxx, MO_8, do_xvmulwev_s)
+TRANS(xvmulwev_w_h, gvec_xxx, MO_16, do_xvmulwev_s)
+TRANS(xvmulwev_d_w, gvec_xxx, MO_32, do_xvmulwev_s)
+
+#define XVMUL_Q(NAME, FN, idx1, idx2) \
+static bool trans_## NAME(DisasContext *ctx, arg_xxx * a) \
+{ \
+ TCGv_i64 rh, rl, arg1, arg2; \
+ int i; \
+ \
+ rh = tcg_temp_new_i64(); \
+ rl = tcg_temp_new_i64(); \
+ arg1 = tcg_temp_new_i64(); \
+ arg2 = tcg_temp_new_i64(); \
+ \
+ for (i = 0; i < 2; i++) { \
+ get_xreg64(arg1, a->xj, idx1 + i * 2); \
+ get_xreg64(arg2, a->xk, idx2 + i * 2); \
+ \
+ tcg_gen_## FN ##_i64(rl, rh, arg1, arg2); \
+ \
+ set_xreg64(rh, a->xd, 1 + i * 2); \
+ set_xreg64(rl, a->xd, 0 + i * 2); \
+ } \
+ \
+ return true; \
+}
+
+XVMUL_Q(xvmulwev_q_d, muls2, 0, 0)
+XVMUL_Q(xvmulwod_q_d, muls2, 1, 1)
+XVMUL_Q(xvmulwev_q_du, mulu2, 0, 0)
+XVMUL_Q(xvmulwod_q_du, mulu2, 1, 1)
+XVMUL_Q(xvmulwev_q_du_d, mulus2, 0, 0)
+XVMUL_Q(xvmulwod_q_du_d, mulus2, 1, 1)
+
+static void do_xvmulwod_s(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_sari_vec, INDEX_op_mul_vec, 0
+ };
+ static const GVecGen3 op[3] = {
+ {
+ .fniv = gen_vmulwod_s,
+ .fno = gen_helper_xvmulwod_h_b,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vmulwod_w_h,
+ .fniv = gen_vmulwod_s,
+ .fno = gen_helper_xvmulwod_w_h,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vmulwod_d_w,
+ .fniv = gen_vmulwod_s,
+ .fno = gen_helper_xvmulwod_d_w,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+TRANS(xvmulwod_h_b, gvec_xxx, MO_8, do_xvmulwod_s)
+TRANS(xvmulwod_w_h, gvec_xxx, MO_16, do_xvmulwod_s)
+TRANS(xvmulwod_d_w, gvec_xxx, MO_32, do_xvmulwod_s)
+
+static void do_xvmulwev_u(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_mul_vec, 0
+ };
+ static const GVecGen3 op[3] = {
+ {
+ .fniv = gen_vmulwev_u,
+ .fno = gen_helper_xvmulwev_h_bu,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vmulwev_w_hu,
+ .fniv = gen_vmulwev_u,
+ .fno = gen_helper_xvmulwev_w_hu,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vmulwev_d_wu,
+ .fniv = gen_vmulwev_u,
+ .fno = gen_helper_xvmulwev_d_wu,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+TRANS(xvmulwev_h_bu, gvec_xxx, MO_8, do_xvmulwev_u)
+TRANS(xvmulwev_w_hu, gvec_xxx, MO_16, do_xvmulwev_u)
+TRANS(xvmulwev_d_wu, gvec_xxx, MO_32, do_xvmulwev_u)
+
+static void do_xvmulwod_u(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_shri_vec, INDEX_op_mul_vec, 0
+ };
+ static const GVecGen3 op[3] = {
+ {
+ .fniv = gen_vmulwod_u,
+ .fno = gen_helper_xvmulwod_h_bu,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vmulwod_w_hu,
+ .fniv = gen_vmulwod_u,
+ .fno = gen_helper_xvmulwod_w_hu,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vmulwod_d_wu,
+ .fniv = gen_vmulwod_u,
+ .fno = gen_helper_xvmulwod_d_wu,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+TRANS(xvmulwod_h_bu, gvec_xxx, MO_8, do_xvmulwod_u)
+TRANS(xvmulwod_w_hu, gvec_xxx, MO_16, do_xvmulwod_u)
+TRANS(xvmulwod_d_wu, gvec_xxx, MO_32, do_xvmulwod_u)
+
+static void do_xvmulwev_u_s(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_shli_vec, INDEX_op_sari_vec, INDEX_op_mul_vec, 0
+ };
+ static const GVecGen3 op[3] = {
+ {
+ .fniv = gen_vmulwev_u_s,
+ .fno = gen_helper_xvmulwev_h_bu_b,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vmulwev_w_hu_h,
+ .fniv = gen_vmulwev_u_s,
+ .fno = gen_helper_xvmulwev_w_hu_h,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vmulwev_d_wu_w,
+ .fniv = gen_vmulwev_u_s,
+ .fno = gen_helper_xvmulwev_d_wu_w,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+TRANS(xvmulwev_h_bu_b, gvec_xxx, MO_8, do_xvmulwev_u_s)
+TRANS(xvmulwev_w_hu_h, gvec_xxx, MO_16, do_xvmulwev_u_s)
+TRANS(xvmulwev_d_wu_w, gvec_xxx, MO_32, do_xvmulwev_u_s)
+
+static void do_xvmulwod_u_s(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_shri_vec, INDEX_op_sari_vec, INDEX_op_mul_vec, 0
+ };
+ static const GVecGen3 op[3] = {
+ {
+ .fniv = gen_vmulwod_u_s,
+ .fno = gen_helper_xvmulwod_h_bu_b,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vmulwod_w_hu_h,
+ .fniv = gen_vmulwod_u_s,
+ .fno = gen_helper_xvmulwod_w_hu_h,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vmulwod_d_wu_w,
+ .fniv = gen_vmulwod_u_s,
+ .fno = gen_helper_xvmulwod_d_wu_w,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+TRANS(xvmulwod_h_bu_b, gvec_xxx, MO_8, do_xvmulwod_u_s)
+TRANS(xvmulwod_w_hu_h, gvec_xxx, MO_16, do_xvmulwod_u_s)
+TRANS(xvmulwod_d_wu_w, gvec_xxx, MO_32, do_xvmulwod_u_s)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 29666f7925..872eeed7a8 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1489,6 +1489,44 @@ xvmini_hu 0111 01101001 01101 ..... ..... ..... @xx_ui5
xvmini_wu 0111 01101001 01110 ..... ..... ..... @xx_ui5
xvmini_du 0111 01101001 01111 ..... ..... ..... @xx_ui5
+xvmul_b 0111 01001000 01000 ..... ..... ..... @xxx
+xvmul_h 0111 01001000 01001 ..... ..... ..... @xxx
+xvmul_w 0111 01001000 01010 ..... ..... ..... @xxx
+xvmul_d 0111 01001000 01011 ..... ..... ..... @xxx
+xvmuh_b 0111 01001000 01100 ..... ..... ..... @xxx
+xvmuh_h 0111 01001000 01101 ..... ..... ..... @xxx
+xvmuh_w 0111 01001000 01110 ..... ..... ..... @xxx
+xvmuh_d 0111 01001000 01111 ..... ..... ..... @xxx
+xvmuh_bu 0111 01001000 10000 ..... ..... ..... @xxx
+xvmuh_hu 0111 01001000 10001 ..... ..... ..... @xxx
+xvmuh_wu 0111 01001000 10010 ..... ..... ..... @xxx
+xvmuh_du 0111 01001000 10011 ..... ..... ..... @xxx
+
+xvmulwev_h_b 0111 01001001 00000 ..... ..... ..... @xxx
+xvmulwev_w_h 0111 01001001 00001 ..... ..... ..... @xxx
+xvmulwev_d_w 0111 01001001 00010 ..... ..... ..... @xxx
+xvmulwev_q_d 0111 01001001 00011 ..... ..... ..... @xxx
+xvmulwod_h_b 0111 01001001 00100 ..... ..... ..... @xxx
+xvmulwod_w_h 0111 01001001 00101 ..... ..... ..... @xxx
+xvmulwod_d_w 0111 01001001 00110 ..... ..... ..... @xxx
+xvmulwod_q_d 0111 01001001 00111 ..... ..... ..... @xxx
+xvmulwev_h_bu 0111 01001001 10000 ..... ..... ..... @xxx
+xvmulwev_w_hu 0111 01001001 10001 ..... ..... ..... @xxx
+xvmulwev_d_wu 0111 01001001 10010 ..... ..... ..... @xxx
+xvmulwev_q_du 0111 01001001 10011 ..... ..... ..... @xxx
+xvmulwod_h_bu 0111 01001001 10100 ..... ..... ..... @xxx
+xvmulwod_w_hu 0111 01001001 10101 ..... ..... ..... @xxx
+xvmulwod_d_wu 0111 01001001 10110 ..... ..... ..... @xxx
+xvmulwod_q_du 0111 01001001 10111 ..... ..... ..... @xxx
+xvmulwev_h_bu_b 0111 01001010 00000 ..... ..... ..... @xxx
+xvmulwev_w_hu_h 0111 01001010 00001 ..... ..... ..... @xxx
+xvmulwev_d_wu_w 0111 01001010 00010 ..... ..... ..... @xxx
+xvmulwev_q_du_d 0111 01001010 00011 ..... ..... ..... @xxx
+xvmulwod_h_bu_b 0111 01001010 00100 ..... ..... ..... @xxx
+xvmulwod_w_hu_h 0111 01001010 00101 ..... ..... ..... @xxx
+xvmulwod_d_wu_w 0111 01001010 00110 ..... ..... ..... @xxx
+xvmulwod_q_du_d 0111 01001010 00111 ..... ..... ..... @xxx
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index 486cf9f7f1..4c342b06e5 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -393,3 +393,77 @@ XVMINMAXI(xvmaxi_bu, 8, UXB, DO_MAX)
XVMINMAXI(xvmaxi_hu, 16, UXH, DO_MAX)
XVMINMAXI(xvmaxi_wu, 32, UXW, DO_MAX)
XVMINMAXI(xvmaxi_du, 64, UXD, DO_MAX)
+
+#define DO_XVMUH(NAME, BIT, E1, E2) \
+void HELPER(NAME)(void *xd, void *xj, void *xk, uint32_t v) \
+{ \
+ int i; \
+ XReg *Xd = (XReg *)xd; \
+ XReg *Xj = (XReg *)xj; \
+ XReg *Xk = (XReg *)xk; \
+ typedef __typeof(Xd->E1(0)) T; \
+ \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E2(i) = ((T)Xj->E2(i)) * ((T)Xk->E2(i)) >> BIT; \
+ } \
+}
+
+void HELPER(xvmuh_d)(void *xd, void *xj, void *xk, uint32_t v)
+{
+ uint64_t l, h;
+ XReg *Xd = (XReg *)xd;
+ XReg *Xj = (XReg *)xj;
+ XReg *Xk = (XReg *)xk;
+ int i;
+
+ for (i = 0; i < 4; i++) {
+ muls64(&l, &h, Xj->XD(i), Xk->XD(i));
+ Xd->XD(i) = h;
+ }
+}
+
+DO_XVMUH(xvmuh_b, 8, XH, XB)
+DO_XVMUH(xvmuh_h, 16, XW, XH)
+DO_XVMUH(xvmuh_w, 32, XD, XW)
+
+void HELPER(xvmuh_du)(void *xd, void *xj, void *xk, uint32_t v)
+{
+ uint64_t l, h;
+ XReg *Xd = (XReg *)xd;
+ XReg *Xj = (XReg *)xj;
+ XReg *Xk = (XReg *)xk;
+ int i;
+
+ for (i = 0; i < 4; i++) {
+ mulu64(&l, &h, Xj->XD(i), Xk->XD(i));
+ Xd->XD(i) = h;
+ }
+}
+
+DO_XVMUH(xvmuh_bu, 8, UXH, UXB)
+DO_XVMUH(xvmuh_hu, 16, UXW, UXH)
+DO_XVMUH(xvmuh_wu, 32, UXD, UXW)
+
+XDO_EVEN(xvmulwev_h_b, 16, XH, XB, DO_MUL)
+XDO_EVEN(xvmulwev_w_h, 32, XW, XH, DO_MUL)
+XDO_EVEN(xvmulwev_d_w, 64, XD, XW, DO_MUL)
+
+XDO_ODD(xvmulwod_h_b, 16, XH, XB, DO_MUL)
+XDO_ODD(xvmulwod_w_h, 32, XW, XH, DO_MUL)
+XDO_ODD(xvmulwod_d_w, 64, XD, XW, DO_MUL)
+
+XDO_EVEN(xvmulwev_h_bu, 16, UXH, UXB, DO_MUL)
+XDO_EVEN(xvmulwev_w_hu, 32, UXW, UXH, DO_MUL)
+XDO_EVEN(xvmulwev_d_wu, 64, UXD, UXW, DO_MUL)
+
+XDO_ODD(xvmulwod_h_bu, 16, UXH, UXB, DO_MUL)
+XDO_ODD(xvmulwod_w_hu, 32, UXW, UXH, DO_MUL)
+XDO_ODD(xvmulwod_d_wu, 64, UXD, UXW, DO_MUL)
+
+XDO_EVEN_U_S(xvmulwev_h_bu_b, 16, XH, UXH, XB, UXB, DO_MUL)
+XDO_EVEN_U_S(xvmulwev_w_hu_h, 32, XW, UXW, XH, UXH, DO_MUL)
+XDO_EVEN_U_S(xvmulwev_d_wu_w, 64, XD, UXD, XW, UXW, DO_MUL)
+
+XDO_ODD_U_S(xvmulwod_h_bu_b, 16, XH, UXH, XB, UXB, DO_MUL)
+XDO_ODD_U_S(xvmulwod_w_hu_h, 32, XW, UXW, XH, UXH, DO_MUL)
+XDO_ODD_U_S(xvmulwod_d_wu_w, 64, XD, UXD, XW, UXW, DO_MUL)
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 192cdb164c..d384fbef3a 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -415,8 +415,6 @@ DO_VMUH(vmuh_bu, 8, UH, UB, DO_MUH)
DO_VMUH(vmuh_hu, 16, UW, UH, DO_MUH)
DO_VMUH(vmuh_wu, 32, UD, UW, DO_MUH)
-#define DO_MUL(a, b) (a * b)
-
DO_EVEN(vmulwev_h_b, 16, H, B, DO_MUL)
DO_EVEN(vmulwev_w_h, 32, W, H, DO_MUL)
DO_EVEN(vmulwev_d_w, 64, D, W, DO_MUL)
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index 96f216d569..e3dbf0f893 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -61,4 +61,6 @@
#define DO_MIN(a, b) (a < b ? a : b)
#define DO_MAX(a, b) (a > b ? a : b)
+#define DO_MUL(a, b) (a * b)
+
#endif /* LOONGARCH_VEC_H */
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 16/46] target/loongarch: Implement xvmadd/xvmsub/xvmaddw{ev/od}
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (14 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 15/46] target/loongarch: Implement xvmul/xvmuh/xvmulw{ev/od} Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 17/46] target/loongarch: Implement xvdiv/xvmod Song Gao
` (29 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVMADD.{B/H/W/D};
- XVMSUB.{B/H/W/D};
- XVMADDW{EV/OD}.{H.B/W.H/D.W/Q.D}[U];
- XVMADDW{EV/OD}.{H.BU.B/W.HU.H/D.WU.W/Q.DU.D}.
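For review purposes, a minimal scalar sketch of the widening multiply-accumulate these instructions perform; the helper below is illustrative (the name is invented here), and the real translation goes through gvec/TCG as in the diff:

    #include <stdint.h>

    /* XVMADDWEV.H.B-style behaviour for one 16-bit accumulator lane:
     * widen the even-indexed byte product and add it to the existing
     * destination value; XVMSUB simply subtracts a same-width product. */
    static int16_t maddwev_h_b(int16_t acc, const int8_t *j,
                               const int8_t *k, int lane)
    {
        return acc + (int16_t)j[2 * lane] * (int16_t)k[2 * lane];
    }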
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 34 ++
target/loongarch/helper.h | 30 ++
target/loongarch/insn_trans/trans_lasx.c.inc | 367 +++++++++++++++++++
target/loongarch/insns.decode | 34 ++
target/loongarch/lasx_helper.c | 104 ++++++
target/loongarch/vec.h | 3 +
6 files changed, 572 insertions(+)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index e7c46bc3a2..ddfc4921b9 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1927,6 +1927,40 @@ INSN_LASX(xvmulwod_w_hu_h, xxx)
INSN_LASX(xvmulwod_d_wu_w, xxx)
INSN_LASX(xvmulwod_q_du_d, xxx)
+INSN_LASX(xvmadd_b, xxx)
+INSN_LASX(xvmadd_h, xxx)
+INSN_LASX(xvmadd_w, xxx)
+INSN_LASX(xvmadd_d, xxx)
+INSN_LASX(xvmsub_b, xxx)
+INSN_LASX(xvmsub_h, xxx)
+INSN_LASX(xvmsub_w, xxx)
+INSN_LASX(xvmsub_d, xxx)
+
+INSN_LASX(xvmaddwev_h_b, xxx)
+INSN_LASX(xvmaddwev_w_h, xxx)
+INSN_LASX(xvmaddwev_d_w, xxx)
+INSN_LASX(xvmaddwev_q_d, xxx)
+INSN_LASX(xvmaddwod_h_b, xxx)
+INSN_LASX(xvmaddwod_w_h, xxx)
+INSN_LASX(xvmaddwod_d_w, xxx)
+INSN_LASX(xvmaddwod_q_d, xxx)
+INSN_LASX(xvmaddwev_h_bu, xxx)
+INSN_LASX(xvmaddwev_w_hu, xxx)
+INSN_LASX(xvmaddwev_d_wu, xxx)
+INSN_LASX(xvmaddwev_q_du, xxx)
+INSN_LASX(xvmaddwod_h_bu, xxx)
+INSN_LASX(xvmaddwod_w_hu, xxx)
+INSN_LASX(xvmaddwod_d_wu, xxx)
+INSN_LASX(xvmaddwod_q_du, xxx)
+INSN_LASX(xvmaddwev_h_bu_b, xxx)
+INSN_LASX(xvmaddwev_w_hu_h, xxx)
+INSN_LASX(xvmaddwev_d_wu_w, xxx)
+INSN_LASX(xvmaddwev_q_du_d, xxx)
+INSN_LASX(xvmaddwod_h_bu_b, xxx)
+INSN_LASX(xvmaddwod_w_hu_h, xxx)
+INSN_LASX(xvmaddwod_d_wu_w, xxx)
+INSN_LASX(xvmaddwod_q_du_d, xxx)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 88ae707027..0dc4cc18da 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -839,3 +839,33 @@ DEF_HELPER_FLAGS_4(xvmulwev_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(xvmulwod_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(xvmulwod_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(xvmulwod_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(xvmadd_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmadd_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmadd_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmadd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmsub_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmsub_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmsub_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmsub_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(xvmaddwev_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmaddwev_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmaddwev_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmaddwod_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmaddwod_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmaddwod_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(xvmaddwev_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmaddwev_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmaddwev_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmaddwod_h_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmaddwod_w_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmaddwod_d_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(xvmaddwev_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmaddwev_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmaddwev_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmaddwod_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmaddwod_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvmaddwod_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index d57d867f17..78ba31b8c2 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -1363,6 +1363,373 @@ TRANS(xvmulwod_h_bu_b, gvec_xxx, MO_8, do_xvmulwod_u_s)
TRANS(xvmulwod_w_hu_h, gvec_xxx, MO_16, do_xvmulwod_u_s)
TRANS(xvmulwod_d_wu_w, gvec_xxx, MO_32, do_xvmulwod_u_s)
+static void do_xvmadd(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_mul_vec, INDEX_op_add_vec, 0
+ };
+ static const GVecGen3 op[4] = {
+ {
+ .fniv = gen_vmadd,
+ .fno = gen_helper_xvmadd_b,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_8
+ },
+ {
+ .fniv = gen_vmadd,
+ .fno = gen_helper_xvmadd_h,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vmadd_w,
+ .fniv = gen_vmadd,
+ .fno = gen_helper_xvmadd_w,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vmadd_d,
+ .fniv = gen_vmadd,
+ .fno = gen_helper_xvmadd_d,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvmadd_b, gvec_xxx, MO_8, do_xvmadd)
+TRANS(xvmadd_h, gvec_xxx, MO_16, do_xvmadd)
+TRANS(xvmadd_w, gvec_xxx, MO_32, do_xvmadd)
+TRANS(xvmadd_d, gvec_xxx, MO_64, do_xvmadd)
+
+static void do_xvmsub(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_mul_vec, INDEX_op_sub_vec, 0
+ };
+ static const GVecGen3 op[4] = {
+ {
+ .fniv = gen_vmsub,
+ .fno = gen_helper_xvmsub_b,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_8
+ },
+ {
+ .fniv = gen_vmsub,
+ .fno = gen_helper_xvmsub_h,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vmsub_w,
+ .fniv = gen_vmsub,
+ .fno = gen_helper_xvmsub_w,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vmsub_d,
+ .fniv = gen_vmsub,
+ .fno = gen_helper_xvmsub_d,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvmsub_b, gvec_xxx, MO_8, do_xvmsub)
+TRANS(xvmsub_h, gvec_xxx, MO_16, do_xvmsub)
+TRANS(xvmsub_w, gvec_xxx, MO_32, do_xvmsub)
+TRANS(xvmsub_d, gvec_xxx, MO_64, do_xvmsub)
+
+static void do_xvmaddwev_s(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_shli_vec, INDEX_op_sari_vec,
+ INDEX_op_mul_vec, INDEX_op_add_vec, 0
+ };
+ static const GVecGen3 op[3] = {
+ {
+ .fniv = gen_vmaddwev_s,
+ .fno = gen_helper_xvmaddwev_h_b,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vmaddwev_w_h,
+ .fniv = gen_vmaddwev_s,
+ .fno = gen_helper_xvmaddwev_w_h,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vmaddwev_d_w,
+ .fniv = gen_vmaddwev_s,
+ .fno = gen_helper_xvmaddwev_d_w,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvmaddwev_h_b, gvec_xxx, MO_8, do_xvmaddwev_s)
+TRANS(xvmaddwev_w_h, gvec_xxx, MO_16, do_xvmaddwev_s)
+TRANS(xvmaddwev_d_w, gvec_xxx, MO_32, do_xvmaddwev_s)
+
+#define XVMADD_Q(NAME, FN, idx1, idx2) \
+static bool trans_## NAME(DisasContext *ctx, arg_xxx * a) \
+{ \
+ TCGv_i64 rh, rl, arg1, arg2, th, tl; \
+ int i; \
+ \
+ rh = tcg_temp_new_i64(); \
+ rl = tcg_temp_new_i64(); \
+ arg1 = tcg_temp_new_i64(); \
+ arg2 = tcg_temp_new_i64(); \
+ th = tcg_temp_new_i64(); \
+ tl = tcg_temp_new_i64(); \
+ \
+ for (i = 0; i < 2; i++) { \
+ get_xreg64(arg1, a->xj, idx1 + i * 2); \
+ get_xreg64(arg2, a->xk, idx2 + i * 2); \
+ get_xreg64(rh, a->xd, 1 + i * 2); \
+ get_xreg64(rl, a->xd, 0 + i * 2); \
+ \
+ tcg_gen_## FN ##_i64(tl, th, arg1, arg2); \
+ tcg_gen_add2_i64(rl, rh, rl, rh, tl, th); \
+ \
+ set_xreg64(rh, a->xd, 1 + i * 2); \
+ set_xreg64(rl, a->xd, 0 + i * 2); \
+ } \
+ \
+ return true; \
+}
+
+XVMADD_Q(xvmaddwev_q_d, muls2, 0, 0)
+XVMADD_Q(xvmaddwod_q_d, muls2, 1, 1)
+XVMADD_Q(xvmaddwev_q_du, mulu2, 0, 0)
+XVMADD_Q(xvmaddwod_q_du, mulu2, 1, 1)
+XVMADD_Q(xvmaddwev_q_du_d, mulus2, 0, 0)
+XVMADD_Q(xvmaddwod_q_du_d, mulus2, 1, 1)
+
+static void do_xvmaddwod_s(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_sari_vec, INDEX_op_mul_vec, INDEX_op_add_vec, 0
+ };
+ static const GVecGen3 op[3] = {
+ {
+ .fniv = gen_vmaddwod_s,
+ .fno = gen_helper_xvmaddwod_h_b,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vmaddwod_w_h,
+ .fniv = gen_vmaddwod_s,
+ .fno = gen_helper_xvmaddwod_w_h,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vmaddwod_d_w,
+ .fniv = gen_vmaddwod_s,
+ .fno = gen_helper_xvmaddwod_d_w,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvmaddwod_h_b, gvec_xxx, MO_8, do_xvmaddwod_s)
+TRANS(xvmaddwod_w_h, gvec_xxx, MO_16, do_xvmaddwod_s)
+TRANS(xvmaddwod_d_w, gvec_xxx, MO_32, do_xvmaddwod_s)
+
+static void do_xvmaddwev_u(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_mul_vec, INDEX_op_add_vec, 0
+ };
+ static const GVecGen3 op[3] = {
+ {
+ .fniv = gen_vmaddwev_u,
+ .fno = gen_helper_xvmaddwev_h_bu,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vmaddwev_w_hu,
+ .fniv = gen_vmaddwev_u,
+ .fno = gen_helper_xvmaddwev_w_hu,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vmaddwev_d_wu,
+ .fniv = gen_vmaddwev_u,
+ .fno = gen_helper_xvmaddwev_d_wu,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvmaddwev_h_bu, gvec_xxx, MO_8, do_xvmaddwev_u)
+TRANS(xvmaddwev_w_hu, gvec_xxx, MO_16, do_xvmaddwev_u)
+TRANS(xvmaddwev_d_wu, gvec_xxx, MO_32, do_xvmaddwev_u)
+
+static void do_xvmaddwod_u(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_shri_vec, INDEX_op_mul_vec, INDEX_op_add_vec, 0
+ };
+ static const GVecGen3 op[3] = {
+ {
+ .fniv = gen_vmaddwod_u,
+ .fno = gen_helper_xvmaddwod_h_bu,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vmaddwod_w_hu,
+ .fniv = gen_vmaddwod_u,
+ .fno = gen_helper_xvmaddwod_w_hu,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vmaddwod_d_wu,
+ .fniv = gen_vmaddwod_u,
+ .fno = gen_helper_xvmaddwod_d_wu,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvmaddwod_h_bu, gvec_xxx, MO_8, do_xvmaddwod_u)
+TRANS(xvmaddwod_w_hu, gvec_xxx, MO_16, do_xvmaddwod_u)
+TRANS(xvmaddwod_d_wu, gvec_xxx, MO_32, do_xvmaddwod_u)
+
+static void do_xvmaddwev_u_s(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_shli_vec, INDEX_op_sari_vec,
+ INDEX_op_mul_vec, INDEX_op_add_vec, 0
+ };
+ static const GVecGen3 op[3] = {
+ {
+ .fniv = gen_vmaddwev_u_s,
+ .fno = gen_helper_xvmaddwev_h_bu_b,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vmaddwev_w_hu_h,
+ .fniv = gen_vmaddwev_u_s,
+ .fno = gen_helper_xvmaddwev_w_hu_h,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vmaddwev_d_wu_w,
+ .fniv = gen_vmaddwev_u_s,
+ .fno = gen_helper_xvmaddwev_d_wu_w,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvmaddwev_h_bu_b, gvec_xxx, MO_8, do_xvmaddwev_u_s)
+TRANS(xvmaddwev_w_hu_h, gvec_xxx, MO_16, do_xvmaddwev_u_s)
+TRANS(xvmaddwev_d_wu_w, gvec_xxx, MO_32, do_xvmaddwev_u_s)
+
+static void do_xvmaddwod_u_s(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_shri_vec, INDEX_op_sari_vec,
+ INDEX_op_mul_vec, INDEX_op_add_vec, 0
+ };
+ static const GVecGen3 op[3] = {
+ {
+ .fniv = gen_vmaddwod_u_s,
+ .fno = gen_helper_xvmaddwod_h_bu_b,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fni4 = gen_vmaddwod_w_hu_h,
+ .fniv = gen_vmaddwod_u_s,
+ .fno = gen_helper_xvmaddwod_w_hu_h,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fni8 = gen_vmaddwod_d_wu_w,
+ .fniv = gen_vmaddwod_u_s,
+ .fno = gen_helper_xvmaddwod_d_wu_w,
+ .load_dest = true,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvmaddwod_h_bu_b, gvec_xxx, MO_8, do_xvmaddwod_u_s)
+TRANS(xvmaddwod_w_hu_h, gvec_xxx, MO_16, do_xvmaddwod_u_s)
+TRANS(xvmaddwod_d_wu_w, gvec_xxx, MO_32, do_xvmaddwod_u_s)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 872eeed7a8..cc210314ff 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1527,6 +1527,40 @@ xvmulwod_w_hu_h 0111 01001010 00101 ..... ..... ..... @xxx
xvmulwod_d_wu_w 0111 01001010 00110 ..... ..... ..... @xxx
xvmulwod_q_du_d 0111 01001010 00111 ..... ..... ..... @xxx
+xvmadd_b 0111 01001010 10000 ..... ..... ..... @xxx
+xvmadd_h 0111 01001010 10001 ..... ..... ..... @xxx
+xvmadd_w 0111 01001010 10010 ..... ..... ..... @xxx
+xvmadd_d 0111 01001010 10011 ..... ..... ..... @xxx
+xvmsub_b 0111 01001010 10100 ..... ..... ..... @xxx
+xvmsub_h 0111 01001010 10101 ..... ..... ..... @xxx
+xvmsub_w 0111 01001010 10110 ..... ..... ..... @xxx
+xvmsub_d 0111 01001010 10111 ..... ..... ..... @xxx
+
+xvmaddwev_h_b 0111 01001010 11000 ..... ..... ..... @xxx
+xvmaddwev_w_h 0111 01001010 11001 ..... ..... ..... @xxx
+xvmaddwev_d_w 0111 01001010 11010 ..... ..... ..... @xxx
+xvmaddwev_q_d 0111 01001010 11011 ..... ..... ..... @xxx
+xvmaddwod_h_b 0111 01001010 11100 ..... ..... ..... @xxx
+xvmaddwod_w_h 0111 01001010 11101 ..... ..... ..... @xxx
+xvmaddwod_d_w 0111 01001010 11110 ..... ..... ..... @xxx
+xvmaddwod_q_d 0111 01001010 11111 ..... ..... ..... @xxx
+xvmaddwev_h_bu 0111 01001011 01000 ..... ..... ..... @xxx
+xvmaddwev_w_hu 0111 01001011 01001 ..... ..... ..... @xxx
+xvmaddwev_d_wu 0111 01001011 01010 ..... ..... ..... @xxx
+xvmaddwev_q_du 0111 01001011 01011 ..... ..... ..... @xxx
+xvmaddwod_h_bu 0111 01001011 01100 ..... ..... ..... @xxx
+xvmaddwod_w_hu 0111 01001011 01101 ..... ..... ..... @xxx
+xvmaddwod_d_wu 0111 01001011 01110 ..... ..... ..... @xxx
+xvmaddwod_q_du 0111 01001011 01111 ..... ..... ..... @xxx
+xvmaddwev_h_bu_b 0111 01001011 11000 ..... ..... ..... @xxx
+xvmaddwev_w_hu_h 0111 01001011 11001 ..... ..... ..... @xxx
+xvmaddwev_d_wu_w 0111 01001011 11010 ..... ..... ..... @xxx
+xvmaddwev_q_du_d 0111 01001011 11011 ..... ..... ..... @xxx
+xvmaddwod_h_bu_b 0111 01001011 11100 ..... ..... ..... @xxx
+xvmaddwod_w_hu_h 0111 01001011 11101 ..... ..... ..... @xxx
+xvmaddwod_d_wu_w 0111 01001011 11110 ..... ..... ..... @xxx
+xvmaddwod_q_du_d 0111 01001011 11111 ..... ..... ..... @xxx
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index 4c342b06e5..df85fa04f0 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -467,3 +467,107 @@ XDO_EVEN_U_S(xvmulwev_d_wu_w, 64, XD, UXD, XW, UXW, DO_MUL)
XDO_ODD_U_S(xvmulwod_h_bu_b, 16, XH, UXH, XB, UXB, DO_MUL)
XDO_ODD_U_S(xvmulwod_w_hu_h, 32, XW, UXW, XH, UXH, DO_MUL)
XDO_ODD_U_S(xvmulwod_d_wu_w, 64, XD, UXD, XW, UXW, DO_MUL)
+
+#define XVMADDSUB(NAME, BIT, E, DO_OP) \
+void HELPER(NAME)(void *xd, void *xj, void *xk, uint32_t v) \
+{ \
+ int i; \
+ XReg *Xd = (XReg *)xd; \
+ XReg *Xj = (XReg *)xj; \
+ XReg *Xk = (XReg *)xk; \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E(i) = DO_OP(Xd->E(i), Xj->E(i), Xk->E(i)); \
+ } \
+}
+
+XVMADDSUB(xvmadd_b, 8, XB, DO_MADD)
+XVMADDSUB(xvmadd_h, 16, XH, DO_MADD)
+XVMADDSUB(xvmadd_w, 32, XW, DO_MADD)
+XVMADDSUB(xvmadd_d, 64, XD, DO_MADD)
+XVMADDSUB(xvmsub_b, 8, XB, DO_MSUB)
+XVMADDSUB(xvmsub_h, 16, XH, DO_MSUB)
+XVMADDSUB(xvmsub_w, 32, XW, DO_MSUB)
+XVMADDSUB(xvmsub_d, 64, XD, DO_MSUB)
+
+#define XVMADDWEV(NAME, BIT, E1, E2, DO_OP) \
+void HELPER(NAME)(void *xd, void *xj, void *xk, uint32_t v) \
+{ \
+ int i; \
+ XReg *Xd = (XReg *)xd; \
+ XReg *Xj = (XReg *)xj; \
+ XReg *Xk = (XReg *)xk; \
+ typedef __typeof(Xd->E1(0)) TD; \
+ \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E1(i) += DO_OP((TD)Xj->E2(2 * i), (TD)Xk->E2(2 * i)); \
+ } \
+}
+
+XVMADDWEV(xvmaddwev_h_b, 16, XH, XB, DO_MUL)
+XVMADDWEV(xvmaddwev_w_h, 32, XW, XH, DO_MUL)
+XVMADDWEV(xvmaddwev_d_w, 64, XD, XW, DO_MUL)
+XVMADDWEV(xvmaddwev_h_bu, 16, UXH, UXB, DO_MUL)
+XVMADDWEV(xvmaddwev_w_hu, 32, UXW, UXH, DO_MUL)
+XVMADDWEV(xvmaddwev_d_wu, 64, UXD, UXW, DO_MUL)
+
+#define XVMADDWOD(NAME, BIT, E1, E2, DO_OP) \
+void HELPER(NAME)(void *xd, void *xj, void *xk, uint32_t v) \
+{ \
+ int i; \
+ XReg *Xd = (XReg *)xd; \
+ XReg *Xj = (XReg *)xj; \
+ XReg *Xk = (XReg *)xk; \
+ typedef __typeof(Xd->E1(0)) TD; \
+ \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E1(i) += DO_OP((TD)Xj->E2(2 * i + 1), \
+ (TD)Xk->E2(2 * i + 1)); \
+ } \
+}
+
+XVMADDWOD(xvmaddwod_h_b, 16, XH, XB, DO_MUL)
+XVMADDWOD(xvmaddwod_w_h, 32, XW, XH, DO_MUL)
+XVMADDWOD(xvmaddwod_d_w, 64, XD, XW, DO_MUL)
+XVMADDWOD(xvmaddwod_h_bu, 16, UXH, UXB, DO_MUL)
+XVMADDWOD(xvmaddwod_w_hu, 32, UXW, UXH, DO_MUL)
+XVMADDWOD(xvmaddwod_d_wu, 64, UXD, UXW, DO_MUL)
+
+#define XVMADDWEV_U_S(NAME, BIT, ES1, EU1, ES2, EU2, DO_OP) \
+void HELPER(NAME)(void *xd, void *xj, void *xk, uint32_t v) \
+{ \
+ int i; \
+ XReg *Xd = (XReg *)xd; \
+ XReg *Xj = (XReg *)xj; \
+ XReg *Xk = (XReg *)xk; \
+ typedef __typeof(Xd->ES1(0)) TS1; \
+ typedef __typeof(Xd->EU1(0)) TU1; \
+ \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->ES1(i) += DO_OP((TU1)Xj->EU2(2 * i), \
+ (TS1)Xk->ES2(2 * i)); \
+ } \
+}
+
+XVMADDWEV_U_S(xvmaddwev_h_bu_b, 16, XH, UXH, XB, UXB, DO_MUL)
+XVMADDWEV_U_S(xvmaddwev_w_hu_h, 32, XW, UXW, XH, UXH, DO_MUL)
+XVMADDWEV_U_S(xvmaddwev_d_wu_w, 64, XD, UXD, XW, UXW, DO_MUL)
+
+#define XVMADDWOD_U_S(NAME, BIT, ES1, EU1, ES2, EU2, DO_OP) \
+void HELPER(NAME)(void *xd, void *xj, void *xk, uint32_t v) \
+{ \
+ int i; \
+ XReg *Xd = (XReg *)xd; \
+ XReg *Xj = (XReg *)xj; \
+ XReg *Xk = (XReg *)xk; \
+ typedef __typeof(Xd->ES1(0)) TS1; \
+ typedef __typeof(Xd->EU1(0)) TU1; \
+ \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->ES1(i) += DO_OP((TU1)Xj->EU2(2 * i + 1), \
+ (TS1)Xk->ES2(2 * i + 1)); \
+ } \
+}
+
+XVMADDWOD_U_S(xvmaddwod_h_bu_b, 16, XH, UXH, XB, UXB, DO_MUL)
+XVMADDWOD_U_S(xvmaddwod_w_hu_h, 32, XW, UXW, XH, UXH, DO_MUL)
+XVMADDWOD_U_S(xvmaddwod_d_wu_w, 64, XD, UXD, XW, UXW, DO_MUL)
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index e3dbf0f893..06992410ad 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -63,4 +63,7 @@
#define DO_MUL(a, b) (a * b)
+#define DO_MADD(a, b, c) (a + b * c)
+#define DO_MSUB(a, b, c) (a - b * c)
+
#endif /* LOONGARCH_VEC_H */
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 17/46] target/loongarch: Implement xvdiv/xvmod
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (15 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 16/46] target/loongarch: Implement xvmadd/xvmsub/xvmaddw{ev/od} Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 18/46] target/loongarch: Implement xvsat Song Gao
` (28 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVDIV.{B/H/W/D}[U];
- XVMOD.{B/H/W/D}[U].
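These helpers follow the DO_DIV/DO_REM conventions moved into vec.h below: division by zero produces 0 instead of trapping, and the signed-overflow case returns the dividend. A minimal sketch of that reading (illustrative C only, not part of the diff):

    #include <stdint.h>

    static int32_t vdiv_w(int32_t n, int32_t m)
    {
        if (m == 0) {
            return 0;                   /* divide by zero -> 0 */
        }
        if (n == INT32_MIN && m == -1) {
            return n;                   /* overflow: result is the dividend */
        }
        return n / m;
    }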
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 17 +++++++++++
target/loongarch/helper.h | 17 +++++++++++
target/loongarch/insn_trans/trans_lasx.c.inc | 17 +++++++++++
target/loongarch/insns.decode | 17 +++++++++++
target/loongarch/lasx_helper.c | 30 ++++++++++++++++++++
target/loongarch/lsx_helper.c | 7 -----
target/loongarch/vec.h | 7 +++++
7 files changed, 105 insertions(+), 7 deletions(-)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index ddfc4921b9..83efde440f 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1961,6 +1961,23 @@ INSN_LASX(xvmaddwod_w_hu_h, xxx)
INSN_LASX(xvmaddwod_d_wu_w, xxx)
INSN_LASX(xvmaddwod_q_du_d, xxx)
+INSN_LASX(xvdiv_b, xxx)
+INSN_LASX(xvdiv_h, xxx)
+INSN_LASX(xvdiv_w, xxx)
+INSN_LASX(xvdiv_d, xxx)
+INSN_LASX(xvdiv_bu, xxx)
+INSN_LASX(xvdiv_hu, xxx)
+INSN_LASX(xvdiv_wu, xxx)
+INSN_LASX(xvdiv_du, xxx)
+INSN_LASX(xvmod_b, xxx)
+INSN_LASX(xvmod_h, xxx)
+INSN_LASX(xvmod_w, xxx)
+INSN_LASX(xvmod_d, xxx)
+INSN_LASX(xvmod_bu, xxx)
+INSN_LASX(xvmod_hu, xxx)
+INSN_LASX(xvmod_wu, xxx)
+INSN_LASX(xvmod_du, xxx)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 0dc4cc18da..95c7ecba3b 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -869,3 +869,20 @@ DEF_HELPER_FLAGS_4(xvmaddwev_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(xvmaddwod_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(xvmaddwod_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(xvmaddwod_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_4(xvdiv_b, void, env, i32, i32, i32)
+DEF_HELPER_4(xvdiv_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvdiv_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvdiv_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvdiv_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(xvdiv_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(xvdiv_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(xvdiv_du, void, env, i32, i32, i32)
+DEF_HELPER_4(xvmod_b, void, env, i32, i32, i32)
+DEF_HELPER_4(xvmod_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvmod_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvmod_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvmod_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(xvmod_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(xvmod_wu, void, env, i32, i32, i32)
+DEF_HELPER_4(xvmod_du, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 78ba31b8c2..930872c939 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -1730,6 +1730,23 @@ TRANS(xvmaddwod_h_bu_b, gvec_xxx, MO_8, do_xvmaddwod_u_s)
TRANS(xvmaddwod_w_hu_h, gvec_xxx, MO_16, do_xvmaddwod_u_s)
TRANS(xvmaddwod_d_wu_w, gvec_xxx, MO_32, do_xvmaddwod_u_s)
+TRANS(xvdiv_b, gen_xxx, gen_helper_xvdiv_b)
+TRANS(xvdiv_h, gen_xxx, gen_helper_xvdiv_h)
+TRANS(xvdiv_w, gen_xxx, gen_helper_xvdiv_w)
+TRANS(xvdiv_d, gen_xxx, gen_helper_xvdiv_d)
+TRANS(xvdiv_bu, gen_xxx, gen_helper_xvdiv_bu)
+TRANS(xvdiv_hu, gen_xxx, gen_helper_xvdiv_hu)
+TRANS(xvdiv_wu, gen_xxx, gen_helper_xvdiv_wu)
+TRANS(xvdiv_du, gen_xxx, gen_helper_xvdiv_du)
+TRANS(xvmod_b, gen_xxx, gen_helper_xvmod_b)
+TRANS(xvmod_h, gen_xxx, gen_helper_xvmod_h)
+TRANS(xvmod_w, gen_xxx, gen_helper_xvmod_w)
+TRANS(xvmod_d, gen_xxx, gen_helper_xvmod_d)
+TRANS(xvmod_bu, gen_xxx, gen_helper_xvmod_bu)
+TRANS(xvmod_hu, gen_xxx, gen_helper_xvmod_hu)
+TRANS(xvmod_wu, gen_xxx, gen_helper_xvmod_wu)
+TRANS(xvmod_du, gen_xxx, gen_helper_xvmod_du)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index cc210314ff..0bd4e7709a 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1561,6 +1561,23 @@ xvmaddwod_w_hu_h 0111 01001011 11101 ..... ..... ..... @xxx
xvmaddwod_d_wu_w 0111 01001011 11110 ..... ..... ..... @xxx
xvmaddwod_q_du_d 0111 01001011 11111 ..... ..... ..... @xxx
+xvdiv_b 0111 01001110 00000 ..... ..... ..... @xxx
+xvdiv_h 0111 01001110 00001 ..... ..... ..... @xxx
+xvdiv_w 0111 01001110 00010 ..... ..... ..... @xxx
+xvdiv_d 0111 01001110 00011 ..... ..... ..... @xxx
+xvmod_b 0111 01001110 00100 ..... ..... ..... @xxx
+xvmod_h 0111 01001110 00101 ..... ..... ..... @xxx
+xvmod_w 0111 01001110 00110 ..... ..... ..... @xxx
+xvmod_d 0111 01001110 00111 ..... ..... ..... @xxx
+xvdiv_bu 0111 01001110 01000 ..... ..... ..... @xxx
+xvdiv_hu 0111 01001110 01001 ..... ..... ..... @xxx
+xvdiv_wu 0111 01001110 01010 ..... ..... ..... @xxx
+xvdiv_du 0111 01001110 01011 ..... ..... ..... @xxx
+xvmod_bu 0111 01001110 01100 ..... ..... ..... @xxx
+xvmod_hu 0111 01001110 01101 ..... ..... ..... @xxx
+xvmod_wu 0111 01001110 01110 ..... ..... ..... @xxx
+xvmod_du 0111 01001110 01111 ..... ..... ..... @xxx
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index df85fa04f0..d4a4a7659a 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -571,3 +571,33 @@ void HELPER(NAME)(void *xd, void *xj, void *xk, uint32_t v) \
XVMADDWOD_U_S(xvmaddwod_h_bu_b, 16, XH, UXH, XB, UXB, DO_MUL)
XVMADDWOD_U_S(xvmaddwod_w_hu_h, 32, XW, UXW, XH, UXH, DO_MUL)
XVMADDWOD_U_S(xvmaddwod_d_wu_w, 64, XD, UXD, XW, UXW, DO_MUL)
+
+#define XVDIV(NAME, BIT, E, DO_OP) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E(i) = DO_OP(Xj->E(i), Xk->E(i)); \
+ } \
+}
+
+XVDIV(xvdiv_b, 8, XB, DO_DIV)
+XVDIV(xvdiv_h, 16, XH, DO_DIV)
+XVDIV(xvdiv_w, 32, XW, DO_DIV)
+XVDIV(xvdiv_d, 64, XD, DO_DIV)
+XVDIV(xvdiv_bu, 8, UXB, DO_DIVU)
+XVDIV(xvdiv_hu, 16, UXH, DO_DIVU)
+XVDIV(xvdiv_wu, 32, UXW, DO_DIVU)
+XVDIV(xvdiv_du, 64, UXD, DO_DIVU)
+XVDIV(xvmod_b, 8, XB, DO_REM)
+XVDIV(xvmod_h, 16, XH, DO_REM)
+XVDIV(xvmod_w, 32, XW, DO_REM)
+XVDIV(xvmod_d, 64, XD, DO_REM)
+XVDIV(xvmod_bu, 8, UXB, DO_REMU)
+XVDIV(xvmod_hu, 16, UXH, DO_REMU)
+XVDIV(xvmod_wu, 32, UXW, DO_REMU)
+XVDIV(xvmod_du, 64, UXD, DO_REMU)
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index d384fbef3a..5aac0c9ef5 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -546,13 +546,6 @@ VMADDWOD_U_S(vmaddwod_h_bu_b, 16, H, UH, B, UB, DO_MUL)
VMADDWOD_U_S(vmaddwod_w_hu_h, 32, W, UW, H, UH, DO_MUL)
VMADDWOD_U_S(vmaddwod_d_wu_w, 64, D, UD, W, UW, DO_MUL)
-#define DO_DIVU(N, M) (unlikely(M == 0) ? 0 : N / M)
-#define DO_REMU(N, M) (unlikely(M == 0) ? 0 : N % M)
-#define DO_DIV(N, M) (unlikely(M == 0) ? 0 :\
- unlikely((N == -N) && (M == (__typeof(N))(-1))) ? N : N / M)
-#define DO_REM(N, M) (unlikely(M == 0) ? 0 :\
- unlikely((N == -N) && (M == (__typeof(N))(-1))) ? 0 : N % M)
-
#define VDIV(NAME, BIT, E, DO_OP) \
void HELPER(NAME)(CPULoongArchState *env, \
uint32_t vd, uint32_t vj, uint32_t vk) \
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index 06992410ad..c748957158 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -66,4 +66,11 @@
#define DO_MADD(a, b, c) (a + b * c)
#define DO_MSUB(a, b, c) (a - b * c)
+#define DO_DIVU(N, M) (unlikely(M == 0) ? 0 : N / M)
+#define DO_REMU(N, M) (unlikely(M == 0) ? 0 : N % M)
+#define DO_DIV(N, M) (unlikely(M == 0) ? 0 :\
+ unlikely((N == -N) && (M == (__typeof(N))(-1))) ? N : N / M)
+#define DO_REM(N, M) (unlikely(M == 0) ? 0 :\
+ unlikely((N == -N) && (M == (__typeof(N))(-1))) ? 0 : N % M)
+
#endif /* LOONGARCH_VEC_H */
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 18/46] target/loongarch: Implement xvsat
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (16 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 17/46] target/loongarch: Implement xvdiv/xvmod Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 19/46] target/loongarch: Implement xvexth Song Gao
` (27 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVSAT.{B/H/W/D}[U].
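A quick sketch of the clamp the xvsat helpers apply, useful when checking the immediate handling in the translator below. The standalone functions are illustrative (imm is assumed small enough that the shifts stay in range; the real code widens to 64 bits for the boundary cases):

    #include <stdint.h>

    /* Signed saturation to [-(2^imm), 2^imm - 1]. */
    static int32_t sat_s_w(int32_t x, unsigned imm)     /* imm < 31 */
    {
        int32_t max = (1 << imm) - 1;
        int32_t min = ~max;
        return x > max ? max : (x < min ? min : x);
    }

    /* Unsigned saturation to [0, 2^(imm+1) - 1]. */
    static uint32_t sat_u_w(uint32_t x, unsigned imm)   /* imm < 31 */
    {
        uint32_t max = (1u << (imm + 1)) - 1;
        return x > max ? max : x;
    }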
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 9 ++
target/loongarch/helper.h | 9 ++
target/loongarch/insn_trans/trans_lasx.c.inc | 86 ++++++++++++++++++++
target/loongarch/insns.decode | 13 +++
target/loongarch/lasx_helper.c | 37 +++++++++
5 files changed, 154 insertions(+)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 83efde440f..18fa454be8 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1978,6 +1978,15 @@ INSN_LASX(xvmod_hu, xxx)
INSN_LASX(xvmod_wu, xxx)
INSN_LASX(xvmod_du, xxx)
+INSN_LASX(xvsat_b, xx_i)
+INSN_LASX(xvsat_h, xx_i)
+INSN_LASX(xvsat_w, xx_i)
+INSN_LASX(xvsat_d, xx_i)
+INSN_LASX(xvsat_bu, xx_i)
+INSN_LASX(xvsat_hu, xx_i)
+INSN_LASX(xvsat_wu, xx_i)
+INSN_LASX(xvsat_du, xx_i)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 95c7ecba3b..741872a24d 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -886,3 +886,12 @@ DEF_HELPER_4(xvmod_bu, void, env, i32, i32, i32)
DEF_HELPER_4(xvmod_hu, void, env, i32, i32, i32)
DEF_HELPER_4(xvmod_wu, void, env, i32, i32, i32)
DEF_HELPER_4(xvmod_du, void, env, i32, i32, i32)
+
+DEF_HELPER_FLAGS_4(xvsat_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvsat_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvsat_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvsat_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvsat_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvsat_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvsat_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvsat_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 930872c939..350d575a6a 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -1747,6 +1747,92 @@ TRANS(xvmod_hu, gen_xxx, gen_helper_xvmod_hu)
TRANS(xvmod_wu, gen_xxx, gen_helper_xvmod_wu)
TRANS(xvmod_du, gen_xxx, gen_helper_xvmod_du)
+static void do_xvsat_s(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ int64_t imm, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_smax_vec, INDEX_op_smin_vec, 0
+ };
+ static const GVecGen2s op[4] = {
+ {
+ .fniv = gen_vsat_s,
+ .fno = gen_helper_xvsat_b,
+ .opt_opc = vecop_list,
+ .vece = MO_8
+ },
+ {
+ .fniv = gen_vsat_s,
+ .fno = gen_helper_xvsat_h,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fniv = gen_vsat_s,
+ .fno = gen_helper_xvsat_w,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fniv = gen_vsat_s,
+ .fno = gen_helper_xvsat_d,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_2s(xd_ofs, xj_ofs, oprsz, maxsz,
+ tcg_constant_i64((1ll << imm) - 1), &op[vece]);
+}
+
+TRANS(xvsat_b, gvec_xx_i, MO_8, do_xvsat_s)
+TRANS(xvsat_h, gvec_xx_i, MO_16, do_xvsat_s)
+TRANS(xvsat_w, gvec_xx_i, MO_32, do_xvsat_s)
+TRANS(xvsat_d, gvec_xx_i, MO_64, do_xvsat_s)
+
+static void do_xvsat_u(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ int64_t imm, uint32_t oprsz, uint32_t maxsz)
+{
+ uint64_t max;
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_umin_vec, 0
+ };
+ static const GVecGen2s op[4] = {
+ {
+ .fniv = gen_vsat_u,
+ .fno = gen_helper_xvsat_bu,
+ .opt_opc = vecop_list,
+ .vece = MO_8
+ },
+ {
+ .fniv = gen_vsat_u,
+ .fno = gen_helper_xvsat_hu,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fniv = gen_vsat_u,
+ .fno = gen_helper_xvsat_wu,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fniv = gen_vsat_u,
+ .fno = gen_helper_xvsat_du,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ max = (imm == 0x3f) ? UINT64_MAX : (1ull << (imm + 1)) - 1;
+ tcg_gen_gvec_2s(xd_ofs, xj_ofs, oprsz, maxsz,
+ tcg_constant_i64(max), &op[vece]);
+}
+
+TRANS(xvsat_bu, gvec_xx_i, MO_8, do_xvsat_u)
+TRANS(xvsat_hu, gvec_xx_i, MO_16, do_xvsat_u)
+TRANS(xvsat_wu, gvec_xx_i, MO_32, do_xvsat_u)
+TRANS(xvsat_du, gvec_xx_i, MO_64, do_xvsat_u)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 0bd4e7709a..9efb5f2032 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1314,7 +1314,11 @@ vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
@xxx .... ........ ..... xk:5 xj:5 xd:5 &xxx
@xr .... ........ ..... ..... rj:5 xd:5 &xr
@xx_i5 .... ........ ..... imm:s5 xj:5 xd:5 &xx_i
+@xx_ui3 .... ........ ..... .. imm:3 xj:5 xd:5 &xx_i
+@xx_ui4 .... ........ ..... . imm:4 xj:5 xd:5 &xx_i
@xx_ui5 .... ........ ..... imm:5 xj:5 xd:5 &xx_i
+@xx_ui6 .... ........ .... imm:6 xj:5 xd:5 &xx_i
+
xvadd_b 0111 01000000 10100 ..... ..... ..... @xxx
xvadd_h 0111 01000000 10101 ..... ..... ..... @xxx
@@ -1578,6 +1582,15 @@ xvmod_hu 0111 01001110 01101 ..... ..... ..... @xxx
xvmod_wu 0111 01001110 01110 ..... ..... ..... @xxx
xvmod_du 0111 01001110 01111 ..... ..... ..... @xxx
+xvsat_b 0111 01110010 01000 01 ... ..... ..... @xx_ui3
+xvsat_h 0111 01110010 01000 1 .... ..... ..... @xx_ui4
+xvsat_w 0111 01110010 01001 ..... ..... ..... @xx_ui5
+xvsat_d 0111 01110010 0101 ...... ..... ..... @xx_ui6
+xvsat_bu 0111 01110010 10000 01 ... ..... ..... @xx_ui3
+xvsat_hu 0111 01110010 10000 1 .... ..... ..... @xx_ui4
+xvsat_wu 0111 01110010 10001 ..... ..... ..... @xx_ui5
+xvsat_du 0111 01110010 1001 ...... ..... ..... @xx_ui6
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index d4a4a7659a..33da60f2d8 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -601,3 +601,40 @@ XVDIV(xvmod_bu, 8, UXB, DO_REMU)
XVDIV(xvmod_hu, 16, UXH, DO_REMU)
XVDIV(xvmod_wu, 32, UXW, DO_REMU)
XVDIV(xvmod_du, 64, UXD, DO_REMU)
+
+#define XVSAT_S(NAME, BIT, E) \
+void HELPER(NAME)(void *xd, void *xj, uint64_t max, uint32_t v) \
+{ \
+ int i; \
+ XReg *Xd = (XReg *)xd; \
+ XReg *Xj = (XReg *)xj; \
+ typedef __typeof(Xd->E(0)) TD; \
+ \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E(i) = Xj->E(i) > (TD)max ? (TD)max : \
+ Xj->E(i) < (TD)~max ? (TD)~max : Xj->E(i); \
+ } \
+}
+
+XVSAT_S(xvsat_b, 8, XB)
+XVSAT_S(xvsat_h, 16, XH)
+XVSAT_S(xvsat_w, 32, XW)
+XVSAT_S(xvsat_d, 64, XD)
+
+#define XVSAT_U(NAME, BIT, E) \
+void HELPER(NAME)(void *xd, void *xj, uint64_t max, uint32_t v) \
+{ \
+ int i; \
+ XReg *Xd = (XReg *)xd; \
+ XReg *Xj = (XReg *)xj; \
+ typedef __typeof(Xd->E(0)) TD; \
+ \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E(i) = Xj->E(i) > (TD)max ? (TD)max : Xj->E(i); \
+ } \
+}
+
+XVSAT_U(xvsat_bu, 8, UXB)
+XVSAT_U(xvsat_hu, 16, UXH)
+XVSAT_U(xvsat_wu, 32, UXW)
+XVSAT_U(xvsat_du, 64, UXD)
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 19/46] target/loongarch: Implement xvexth
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (17 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 18/46] target/loongarch: Implement xvsat Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 20/46] target/loongarch: Implement vext2xv Song Gao
` (26 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVEXTH.{H.B/W.H/D.W/Q.D};
- XVEXTH.{HU.BU/WU.HU/DU.WU/QU.DU}.
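A hedged sketch of what the h.b variant does on one 128-bit group (illustrative C; how the 256-bit forms group their lanes is left to the helpers in the diff):

    #include <stdint.h>

    /* VEXTH.H.B-style behaviour: the high eight bytes of a 128-bit
     * group are sign-extended into eight halfwords; the *U.*U forms
     * zero-extend instead. */
    static void exth_h_b(int16_t dst[8], const int8_t src[16])
    {
        for (int i = 0; i < 8; i++) {
            dst[i] = src[i + 8];    /* implicit sign extension */
        }
    }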
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 9 +++++
target/loongarch/helper.h | 9 +++++
target/loongarch/insn_trans/trans_lasx.c.inc | 20 ++++++++++
target/loongarch/insns.decode | 9 +++++
target/loongarch/lasx_helper.c | 39 ++++++++++++++++++++
5 files changed, 86 insertions(+)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 18fa454be8..5ac374bc63 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1987,6 +1987,15 @@ INSN_LASX(xvsat_hu, xx_i)
INSN_LASX(xvsat_wu, xx_i)
INSN_LASX(xvsat_du, xx_i)
+INSN_LASX(xvexth_h_b, xx)
+INSN_LASX(xvexth_w_h, xx)
+INSN_LASX(xvexth_d_w, xx)
+INSN_LASX(xvexth_q_d, xx)
+INSN_LASX(xvexth_hu_bu, xx)
+INSN_LASX(xvexth_wu_hu, xx)
+INSN_LASX(xvexth_du_wu, xx)
+INSN_LASX(xvexth_qu_du, xx)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 741872a24d..17e54eb29a 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -895,3 +895,12 @@ DEF_HELPER_FLAGS_4(xvsat_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_4(xvsat_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_4(xvsat_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_4(xvsat_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_3(xvexth_h_b, void, env, i32, i32)
+DEF_HELPER_3(xvexth_w_h, void, env, i32, i32)
+DEF_HELPER_3(xvexth_d_w, void, env, i32, i32)
+DEF_HELPER_3(xvexth_q_d, void, env, i32, i32)
+DEF_HELPER_3(xvexth_hu_bu, void, env, i32, i32)
+DEF_HELPER_3(xvexth_wu_hu, void, env, i32, i32)
+DEF_HELPER_3(xvexth_du_wu, void, env, i32, i32)
+DEF_HELPER_3(xvexth_qu_du, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 350d575a6a..5110cf9a33 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -28,6 +28,17 @@ static bool gen_xxx(DisasContext *ctx, arg_xxx *a,
return true;
}
+static bool gen_xx(DisasContext *ctx, arg_xx *a,
+ void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32))
+{
+ TCGv_i32 xd = tcg_constant_i32(a->xd);
+ TCGv_i32 xj = tcg_constant_i32(a->xj);
+
+ CHECK_ASXE;
+ func(cpu_env, xd, xj);
+ return true;
+}
+
static bool gvec_xxx(DisasContext *ctx, arg_xxx *a, MemOp mop,
void (*func)(unsigned, uint32_t, uint32_t,
uint32_t, uint32_t, uint32_t))
@@ -1833,6 +1844,15 @@ TRANS(xvsat_hu, gvec_xx_i, MO_16, do_xvsat_u)
TRANS(xvsat_wu, gvec_xx_i, MO_32, do_xvsat_u)
TRANS(xvsat_du, gvec_xx_i, MO_64, do_xvsat_u)
+TRANS(xvexth_h_b, gen_xx, gen_helper_xvexth_h_b)
+TRANS(xvexth_w_h, gen_xx, gen_helper_xvexth_w_h)
+TRANS(xvexth_d_w, gen_xx, gen_helper_xvexth_d_w)
+TRANS(xvexth_q_d, gen_xx, gen_helper_xvexth_q_d)
+TRANS(xvexth_hu_bu, gen_xx, gen_helper_xvexth_hu_bu)
+TRANS(xvexth_wu_hu, gen_xx, gen_helper_xvexth_wu_hu)
+TRANS(xvexth_du_wu, gen_xx, gen_helper_xvexth_du_wu)
+TRANS(xvexth_qu_du, gen_xx, gen_helper_xvexth_qu_du)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 9efb5f2032..98de616846 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1591,6 +1591,15 @@ xvsat_hu 0111 01110010 10000 1 .... ..... ..... @xx_ui4
xvsat_wu 0111 01110010 10001 ..... ..... ..... @xx_ui5
xvsat_du 0111 01110010 1001 ...... ..... ..... @xx_ui6
+xvexth_h_b 0111 01101001 11101 11000 ..... ..... @xx
+xvexth_w_h 0111 01101001 11101 11001 ..... ..... @xx
+xvexth_d_w 0111 01101001 11101 11010 ..... ..... @xx
+xvexth_q_d 0111 01101001 11101 11011 ..... ..... @xx
+xvexth_hu_bu 0111 01101001 11101 11100 ..... ..... @xx
+xvexth_wu_hu 0111 01101001 11101 11101 ..... ..... @xx
+xvexth_du_wu 0111 01101001 11101 11110 ..... ..... @xx
+xvexth_qu_du 0111 01101001 11101 11111 ..... ..... @xx
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index 33da60f2d8..ca74263c6e 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -638,3 +638,42 @@ XVSAT_U(xvsat_bu, 8, UXB)
XVSAT_U(xvsat_hu, 16, UXH)
XVSAT_U(xvsat_wu, 32, UXW)
XVSAT_U(xvsat_du, 64, UXD)
+
+#define XVEXTH(NAME, BIT, E1, E2) \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t xd, uint32_t xj) \
+{ \
+ int i, max; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ Xd->E1(i) = Xj->E2(i + max); \
+ Xd->E1(i + max) = Xj->E2(i + max * 3); \
+ } \
+}
+
+void HELPER(xvexth_q_d)(CPULoongArchState *env, uint32_t xd, uint32_t xj)
+{
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ Xd->XQ(0) = int128_makes64(Xj->XD(1));
+ Xd->XQ(1) = int128_makes64(Xj->XD(3));
+}
+
+void HELPER(xvexth_qu_du)(CPULoongArchState *env, uint32_t xd, uint32_t xj)
+{
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ Xd->XQ(0) = int128_make64(Xj->UXD(1));
+ Xd->XQ(1) = int128_make64(Xj->UXD(3));
+}
+
+XVEXTH(xvexth_h_b, 16, XH, XB)
+XVEXTH(xvexth_w_h, 32, XW, XH)
+XVEXTH(xvexth_d_w, 64, XD, XW)
+XVEXTH(xvexth_hu_bu, 16, UXH, UXB)
+XVEXTH(xvexth_wu_hu, 32, UXW, UXH)
+XVEXTH(xvexth_du_wu, 64, UXD, UXW)
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
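The per-lane behaviour of XVEXTH is easier to see in scalar form: each 128-bit half of the 256-bit register is handled independently, and the high half of that lane's source elements is widened into the same lane of the destination. A rough sketch of xvexth.h.b, with flat arrays standing in for the XReg accessors (illustrative only, not the series' code):

#include <stdint.h>

/* xvexth.h.b: per 128-bit lane, sign-extend the high 8 bytes to halfwords. */
static void xvexth_h_b_sketch(int16_t dst[16], const int8_t src[32])
{
    for (int lane = 0; lane < 2; lane++) {
        for (int i = 0; i < 8; i++) {
            dst[lane * 8 + i] = src[lane * 16 + 8 + i];
        }
    }
}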
* [PATCH v1 20/46] target/loongarch: Implement vext2xv
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (18 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 19/46] target/loongarch: Implement xvexth Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 21/46] target/loongarch: Implement xvsigncov Song Gao
` (25 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- VEXT2XV.{H/W/D}.B, VEXT2XV.{HU/WU/DU}.BU;
- VEXT2XV.{W/D}.H, VEXT2XV.{WU/DU}.HU;
- VEXT2XV.D.W, VEXT2XV.DU.WU.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 13 ++++++++++
target/loongarch/helper.h | 13 ++++++++++
target/loongarch/insn_trans/trans_lasx.c.inc | 13 ++++++++++
target/loongarch/insns.decode | 13 ++++++++++
target/loongarch/lasx_helper.c | 27 ++++++++++++++++++++
5 files changed, 79 insertions(+)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 5ac374bc63..1897aa7ba1 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1996,6 +1996,19 @@ INSN_LASX(xvexth_wu_hu, xx)
INSN_LASX(xvexth_du_wu, xx)
INSN_LASX(xvexth_qu_du, xx)
+INSN_LASX(vext2xv_h_b, xx)
+INSN_LASX(vext2xv_w_b, xx)
+INSN_LASX(vext2xv_d_b, xx)
+INSN_LASX(vext2xv_w_h, xx)
+INSN_LASX(vext2xv_d_h, xx)
+INSN_LASX(vext2xv_d_w, xx)
+INSN_LASX(vext2xv_hu_bu, xx)
+INSN_LASX(vext2xv_wu_bu, xx)
+INSN_LASX(vext2xv_du_bu, xx)
+INSN_LASX(vext2xv_wu_hu, xx)
+INSN_LASX(vext2xv_du_hu, xx)
+INSN_LASX(vext2xv_du_wu, xx)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 17e54eb29a..7a303ee3f1 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -904,3 +904,16 @@ DEF_HELPER_3(xvexth_hu_bu, void, env, i32, i32)
DEF_HELPER_3(xvexth_wu_hu, void, env, i32, i32)
DEF_HELPER_3(xvexth_du_wu, void, env, i32, i32)
DEF_HELPER_3(xvexth_qu_du, void, env, i32, i32)
+
+DEF_HELPER_3(vext2xv_h_b, void, env, i32, i32)
+DEF_HELPER_3(vext2xv_w_b, void, env, i32, i32)
+DEF_HELPER_3(vext2xv_d_b, void, env, i32, i32)
+DEF_HELPER_3(vext2xv_w_h, void, env, i32, i32)
+DEF_HELPER_3(vext2xv_d_h, void, env, i32, i32)
+DEF_HELPER_3(vext2xv_d_w, void, env, i32, i32)
+DEF_HELPER_3(vext2xv_hu_bu, void, env, i32, i32)
+DEF_HELPER_3(vext2xv_wu_bu, void, env, i32, i32)
+DEF_HELPER_3(vext2xv_du_bu, void, env, i32, i32)
+DEF_HELPER_3(vext2xv_wu_hu, void, env, i32, i32)
+DEF_HELPER_3(vext2xv_du_hu, void, env, i32, i32)
+DEF_HELPER_3(vext2xv_du_wu, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 5110cf9a33..c04469af75 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -1853,6 +1853,19 @@ TRANS(xvexth_wu_hu, gen_xx, gen_helper_xvexth_wu_hu)
TRANS(xvexth_du_wu, gen_xx, gen_helper_xvexth_du_wu)
TRANS(xvexth_qu_du, gen_xx, gen_helper_xvexth_qu_du)
+TRANS(vext2xv_h_b, gen_xx, gen_helper_vext2xv_h_b)
+TRANS(vext2xv_w_b, gen_xx, gen_helper_vext2xv_w_b)
+TRANS(vext2xv_d_b, gen_xx, gen_helper_vext2xv_d_b)
+TRANS(vext2xv_w_h, gen_xx, gen_helper_vext2xv_w_h)
+TRANS(vext2xv_d_h, gen_xx, gen_helper_vext2xv_d_h)
+TRANS(vext2xv_d_w, gen_xx, gen_helper_vext2xv_d_w)
+TRANS(vext2xv_hu_bu, gen_xx, gen_helper_vext2xv_hu_bu)
+TRANS(vext2xv_wu_bu, gen_xx, gen_helper_vext2xv_wu_bu)
+TRANS(vext2xv_du_bu, gen_xx, gen_helper_vext2xv_du_bu)
+TRANS(vext2xv_wu_hu, gen_xx, gen_helper_vext2xv_wu_hu)
+TRANS(vext2xv_du_hu, gen_xx, gen_helper_vext2xv_du_hu)
+TRANS(vext2xv_du_wu, gen_xx, gen_helper_vext2xv_du_wu)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 98de616846..9f1cb04368 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1600,6 +1600,19 @@ xvexth_wu_hu 0111 01101001 11101 11101 ..... ..... @xx
xvexth_du_wu 0111 01101001 11101 11110 ..... ..... @xx
xvexth_qu_du 0111 01101001 11101 11111 ..... ..... @xx
+vext2xv_h_b 0111 01101001 11110 00100 ..... ..... @xx
+vext2xv_w_b 0111 01101001 11110 00101 ..... ..... @xx
+vext2xv_d_b 0111 01101001 11110 00110 ..... ..... @xx
+vext2xv_w_h 0111 01101001 11110 00111 ..... ..... @xx
+vext2xv_d_h 0111 01101001 11110 01000 ..... ..... @xx
+vext2xv_d_w 0111 01101001 11110 01001 ..... ..... @xx
+vext2xv_hu_bu 0111 01101001 11110 01010 ..... ..... @xx
+vext2xv_wu_bu 0111 01101001 11110 01011 ..... ..... @xx
+vext2xv_du_bu 0111 01101001 11110 01100 ..... ..... @xx
+vext2xv_wu_hu 0111 01101001 11110 01101 ..... ..... @xx
+vext2xv_du_hu 0111 01101001 11110 01110 ..... ..... @xx
+vext2xv_du_wu 0111 01101001 11110 01111 ..... ..... @xx
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index ca74263c6e..ca82d03ff4 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -677,3 +677,30 @@ XVEXTH(xvexth_d_w, 64, XD, XW)
XVEXTH(xvexth_hu_bu, 16, UXH, UXB)
XVEXTH(xvexth_wu_hu, 32, UXW, UXH)
XVEXTH(xvexth_du_wu, 64, UXD, UXW)
+
+#define VEXT2XV(NAME, BIT, E1, E2) \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t xd, uint32_t xj) \
+{ \
+ int i; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg temp; \
+ \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ temp.E1(i) = Xj->E2(i); \
+ } \
+ *Xd = temp; \
+}
+
+VEXT2XV(vext2xv_h_b, 16, XH, XB)
+VEXT2XV(vext2xv_w_b, 32, XW, XB)
+VEXT2XV(vext2xv_d_b, 64, XD, XB)
+VEXT2XV(vext2xv_w_h, 32, XW, XH)
+VEXT2XV(vext2xv_d_h, 64, XD, XH)
+VEXT2XV(vext2xv_d_w, 64, XD, XW)
+VEXT2XV(vext2xv_hu_bu, 16, UXH, UXB)
+VEXT2XV(vext2xv_wu_bu, 32, UXW, UXB)
+VEXT2XV(vext2xv_du_bu, 64, UXD, UXB)
+VEXT2XV(vext2xv_wu_hu, 32, UXW, UXH)
+VEXT2XV(vext2xv_du_hu, 64, UXD, UXH)
+VEXT2XV(vext2xv_du_wu, 64, UXD, UXW)
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
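Unlike XVEXTH above, VEXT2XV widens the lowest elements of the whole 256-bit source, so the result crosses the 128-bit lane boundary; that is why the helper builds the result in a temporary before storing it back. A sketch of vext2xv.h.b under the same flat-array convention as before (illustrative only):

#include <stdint.h>

/* vext2xv.h.b: sign-extend the 16 lowest bytes of the whole register. */
static void vext2xv_h_b_sketch(int16_t dst[16], const int8_t src[32])
{
    int16_t tmp[16];

    for (int i = 0; i < 16; i++) {
        tmp[i] = src[i];    /* dst halfwords 8..15 land in the high lane, fed from low-lane bytes */
    }
    for (int i = 0; i < 16; i++) {
        dst[i] = tmp[i];    /* copied back as a whole, mirroring how the helper handles xd == xj */
    }
}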
* [PATCH v1 21/46] target/loongarch: Implement xvsigncov
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (19 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 20/46] target/loongarch: Implement vext2xv Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 22/46] target/loongarch: Implement xvmskltz/xvmskgez/xvmsknz Song Gao
` (24 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVSIGNCOV.{B/H/W/D}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 5 +++
target/loongarch/helper.h | 5 +++
target/loongarch/insn_trans/trans_lasx.c.inc | 41 ++++++++++++++++++++
target/loongarch/insns.decode | 5 +++
target/loongarch/lasx_helper.c | 5 +++
target/loongarch/lsx_helper.c | 2 -
target/loongarch/vec.h | 2 +
7 files changed, 63 insertions(+), 2 deletions(-)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 1897aa7ba1..d0ccf3e86c 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2009,6 +2009,11 @@ INSN_LASX(vext2xv_wu_hu, xx)
INSN_LASX(vext2xv_du_hu, xx)
INSN_LASX(vext2xv_du_wu, xx)
+INSN_LASX(xvsigncov_b, xxx)
+INSN_LASX(xvsigncov_h, xxx)
+INSN_LASX(xvsigncov_w, xxx)
+INSN_LASX(xvsigncov_d, xxx)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 7a303ee3f1..53a33703b3 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -917,3 +917,8 @@ DEF_HELPER_3(vext2xv_du_bu, void, env, i32, i32)
DEF_HELPER_3(vext2xv_wu_hu, void, env, i32, i32)
DEF_HELPER_3(vext2xv_du_hu, void, env, i32, i32)
DEF_HELPER_3(vext2xv_du_wu, void, env, i32, i32)
+
+DEF_HELPER_FLAGS_4(xvsigncov_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvsigncov_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvsigncov_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvsigncov_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index c04469af75..9c24e82ac0 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -1866,6 +1866,47 @@ TRANS(vext2xv_wu_hu, gen_xx, gen_helper_vext2xv_wu_hu)
TRANS(vext2xv_du_hu, gen_xx, gen_helper_vext2xv_du_hu)
TRANS(vext2xv_du_wu, gen_xx, gen_helper_vext2xv_du_wu)
+static void do_xvsigncov(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_neg_vec, INDEX_op_cmpsel_vec, 0
+ };
+ static const GVecGen3 op[4] = {
+ {
+ .fniv = gen_vsigncov,
+ .fno = gen_helper_xvsigncov_b,
+ .opt_opc = vecop_list,
+ .vece = MO_8
+ },
+ {
+ .fniv = gen_vsigncov,
+ .fno = gen_helper_xvsigncov_h,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fniv = gen_vsigncov,
+ .fno = gen_helper_xvsigncov_w,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fniv = gen_vsigncov,
+ .fno = gen_helper_xvsigncov_d,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvsigncov_b, gvec_xxx, MO_8, do_xvsigncov)
+TRANS(xvsigncov_h, gvec_xxx, MO_16, do_xvsigncov)
+TRANS(xvsigncov_w, gvec_xxx, MO_32, do_xvsigncov)
+TRANS(xvsigncov_d, gvec_xxx, MO_64, do_xvsigncov)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 9f1cb04368..887d7f5a90 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1613,6 +1613,11 @@ vext2xv_wu_hu 0111 01101001 11110 01101 ..... ..... @xx
vext2xv_du_hu 0111 01101001 11110 01110 ..... ..... @xx
vext2xv_du_wu 0111 01101001 11110 01111 ..... ..... @xx
+xvsigncov_b 0111 01010010 11100 ..... ..... ..... @xxx
+xvsigncov_h 0111 01010010 11101 ..... ..... ..... @xxx
+xvsigncov_w 0111 01010010 11110 ..... ..... ..... @xxx
+xvsigncov_d 0111 01010010 11111 ..... ..... ..... @xxx
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index ca82d03ff4..db7905fa4d 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -704,3 +704,8 @@ VEXT2XV(vext2xv_du_bu, 64, UXD, UXB)
VEXT2XV(vext2xv_wu_hu, 32, UXW, UXH)
VEXT2XV(vext2xv_du_hu, 64, UXD, UXH)
VEXT2XV(vext2xv_du_wu, 64, UXD, UXW)
+
+XDO_3OP(xvsigncov_b, 8, XB, DO_SIGNCOV)
+XDO_3OP(xvsigncov_h, 16, XH, DO_SIGNCOV)
+XDO_3OP(xvsigncov_w, 32, XW, DO_SIGNCOV)
+XDO_3OP(xvsigncov_d, 64, XD, DO_SIGNCOV)
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 5aac0c9ef5..dadba47513 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -648,8 +648,6 @@ VEXTH(vexth_hu_bu, 16, UH, UB)
VEXTH(vexth_wu_hu, 32, UW, UH)
VEXTH(vexth_du_wu, 64, UD, UW)
-#define DO_SIGNCOV(a, b) (a == 0 ? 0 : a < 0 ? -b : b)
-
DO_3OP(vsigncov_b, 8, B, DO_SIGNCOV)
DO_3OP(vsigncov_h, 16, H, DO_SIGNCOV)
DO_3OP(vsigncov_w, 32, W, DO_SIGNCOV)
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index c748957158..f6ad3f78dd 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -73,4 +73,6 @@
#define DO_REM(N, M) (unlikely(M == 0) ? 0 :\
unlikely((N == -N) && (M == (__typeof(N))(-1))) ? 0 : N % M)
+#define DO_SIGNCOV(a, b) (a == 0 ? 0 : a < 0 ? -b : b)
+
#endif /* LOONGARCH_VEC_H */
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
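The signcov operation keeps, negates, or zeroes the second source according to the sign of the first, which is exactly what the DO_SIGNCOV macro now shared through vec.h expresses. Per element, in scalar form (a sketch, the function name is illustrative):

#include <stdint.h>

/* signcov(a, b): 0 if a == 0, -b if a < 0, b otherwise. */
static int32_t signcov_w(int32_t a, int32_t b)
{
    return a == 0 ? 0 : (a < 0 ? -b : b);
}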
* [PATCH v1 22/46] target/loongarch: Implement xvmskltz/xvmskgez/xvmsknz
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (20 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 21/46] target/loongarch: Implement xvsigncov Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 23/46] target/loongarch: Implement xvldi Song Gao
` (23 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVMSKLTZ.{B/H/W/D};
- XVMSKGEZ.B;
- XVMSKNZ.B.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 7 ++
target/loongarch/helper.h | 7 ++
target/loongarch/insn_trans/trans_lasx.c.inc | 7 ++
target/loongarch/insns.decode | 7 ++
target/loongarch/lasx_helper.c | 95 ++++++++++++++++++++
target/loongarch/lsx_helper.c | 10 +--
target/loongarch/vec.h | 6 ++
7 files changed, 134 insertions(+), 5 deletions(-)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index d0ccf3e86c..5a3c14f33d 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2014,6 +2014,13 @@ INSN_LASX(xvsigncov_h, xxx)
INSN_LASX(xvsigncov_w, xxx)
INSN_LASX(xvsigncov_d, xxx)
+INSN_LASX(xvmskltz_b, xx)
+INSN_LASX(xvmskltz_h, xx)
+INSN_LASX(xvmskltz_w, xx)
+INSN_LASX(xvmskltz_d, xx)
+INSN_LASX(xvmskgez_b, xx)
+INSN_LASX(xvmsknz_b, xx)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 53a33703b3..b7ba78ee06 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -922,3 +922,10 @@ DEF_HELPER_FLAGS_4(xvsigncov_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(xvsigncov_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(xvsigncov_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(xvsigncov_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_3(xvmskltz_b, void, env, i32, i32)
+DEF_HELPER_3(xvmskltz_h, void, env, i32, i32)
+DEF_HELPER_3(xvmskltz_w, void, env, i32, i32)
+DEF_HELPER_3(xvmskltz_d, void, env, i32, i32)
+DEF_HELPER_3(xvmskgez_b, void, env, i32, i32)
+DEF_HELPER_3(xvmsknz_b, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 9c24e82ac0..b0aad21a9d 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -1907,6 +1907,13 @@ TRANS(xvsigncov_h, gvec_xxx, MO_16, do_xvsigncov)
TRANS(xvsigncov_w, gvec_xxx, MO_32, do_xvsigncov)
TRANS(xvsigncov_d, gvec_xxx, MO_64, do_xvsigncov)
+TRANS(xvmskltz_b, gen_xx, gen_helper_xvmskltz_b)
+TRANS(xvmskltz_h, gen_xx, gen_helper_xvmskltz_h)
+TRANS(xvmskltz_w, gen_xx, gen_helper_xvmskltz_w)
+TRANS(xvmskltz_d, gen_xx, gen_helper_xvmskltz_d)
+TRANS(xvmskgez_b, gen_xx, gen_helper_xvmskgez_b)
+TRANS(xvmsknz_b, gen_xx, gen_helper_xvmsknz_b)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 887d7f5a90..b792a68fdf 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1618,6 +1618,13 @@ xvsigncov_h 0111 01010010 11101 ..... ..... ..... @xxx
xvsigncov_w 0111 01010010 11110 ..... ..... ..... @xxx
xvsigncov_d 0111 01010010 11111 ..... ..... ..... @xxx
+xvmskltz_b 0111 01101001 11000 10000 ..... ..... @xx
+xvmskltz_h 0111 01101001 11000 10001 ..... ..... @xx
+xvmskltz_w 0111 01101001 11000 10010 ..... ..... @xx
+xvmskltz_d 0111 01101001 11000 10011 ..... ..... @xx
+xvmskgez_b 0111 01101001 11000 10100 ..... ..... @xx
+xvmsknz_b 0111 01101001 11000 11000 ..... ..... @xx
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index db7905fa4d..6aec554645 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -709,3 +709,98 @@ XDO_3OP(xvsigncov_b, 8, XB, DO_SIGNCOV)
XDO_3OP(xvsigncov_h, 16, XH, DO_SIGNCOV)
XDO_3OP(xvsigncov_w, 32, XW, DO_SIGNCOV)
XDO_3OP(xvsigncov_d, 64, XD, DO_SIGNCOV)
+
+void HELPER(xvmskltz_b)(CPULoongArchState *env, uint32_t xd, uint32_t xj)
+{
+ uint16_t temp;
+ int i;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ for (i = 0; i < 2; i++) {
+ temp = 0;
+ temp = do_vmskltz_b(Xj->XD(2 * i));
+ temp |= (do_vmskltz_b(Xj->XD(2 * i + 1)) << 8);
+ Xd->XD(2 * i) = temp;
+ Xd->XD(2 * i + 1) = 0;
+ }
+}
+
+void HELPER(xvmskltz_h)(CPULoongArchState *env, uint32_t xd, uint32_t xj)
+{
+ uint16_t temp;
+ int i;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ for (i = 0; i < 2; i++) {
+ temp = 0;
+ temp = do_vmskltz_h(Xj->XD(2 * i));
+ temp |= (do_vmskltz_h(Xj->XD(2 * i + 1)) << 4);
+ Xd->XD(2 * i) = temp;
+ Xd->XD(2 * i + 1) = 0;
+ }
+}
+
+void HELPER(xvmskltz_w)(CPULoongArchState *env, uint32_t xd, uint32_t xj)
+{
+ uint16_t temp;
+ int i;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ for (i = 0; i < 2; i++) {
+ temp = do_vmskltz_w(Xj->XD(2 * i));
+ temp |= (do_vmskltz_w(Xj->XD(2 * i + 1)) << 2);
+ Xd->XD(2 * i) = temp;
+ Xd->XD(2 * i + 1) = 0;
+ }
+}
+
+void HELPER(xvmskltz_d)(CPULoongArchState *env, uint32_t xd, uint32_t xj)
+{
+ uint16_t temp;
+ int i;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ for (i = 0; i < 2; i++) {
+ temp = 0;
+ temp = do_vmskltz_d(Xj->XD(2 * i));
+ temp |= (do_vmskltz_d(Xj->XD(2 * i + 1)) << 1);
+ Xd->XD(2 * i) = temp;
+ Xd->XD(2 * i + 1) = 0;
+ }
+}
+
+void HELPER(xvmskgez_b)(CPULoongArchState *env, uint32_t xd, uint32_t xj)
+{
+ uint16_t temp;
+ int i;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ for (i = 0; i < 2; i++) {
+ temp = 0;
+ temp = do_vmskltz_b(Xj->XD(2 * i));
+ temp |= (do_vmskltz_b(Xj->XD(2 * i + 1)) << 8);
+ Xd->XD(2 * i) = (uint16_t)(~temp);
+ Xd->XD(2 * i + 1) = 0;
+ }
+}
+
+void HELPER(xvmsknz_b)(CPULoongArchState *env, uint32_t xd, uint32_t xj)
+{
+ uint16_t temp;
+ int i;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ for (i = 0; i < 2; i++) {
+ temp = 0;
+ temp = do_vmskez_b(Xj->XD(2 * i));
+ temp |= (do_vmskez_b(Xj->XD(2 * i + 1)) << 8);
+ Xd->XD(2 * i) = (uint16_t)(~temp);
+ Xd->XD(2 * i + 1) = 0;
+ }
+}
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index dadba47513..e64155f38c 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -653,7 +653,7 @@ DO_3OP(vsigncov_h, 16, H, DO_SIGNCOV)
DO_3OP(vsigncov_w, 32, W, DO_SIGNCOV)
DO_3OP(vsigncov_d, 64, D, DO_SIGNCOV)
-static uint64_t do_vmskltz_b(int64_t val)
+uint64_t do_vmskltz_b(int64_t val)
{
uint64_t m = 0x8080808080808080ULL;
uint64_t c = val & m;
@@ -675,7 +675,7 @@ void HELPER(vmskltz_b)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
Vd->D(1) = 0;
}
-static uint64_t do_vmskltz_h(int64_t val)
+uint64_t do_vmskltz_h(int64_t val)
{
uint64_t m = 0x8000800080008000ULL;
uint64_t c = val & m;
@@ -696,7 +696,7 @@ void HELPER(vmskltz_h)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
Vd->D(1) = 0;
}
-static uint64_t do_vmskltz_w(int64_t val)
+uint64_t do_vmskltz_w(int64_t val)
{
uint64_t m = 0x8000000080000000ULL;
uint64_t c = val & m;
@@ -716,7 +716,7 @@ void HELPER(vmskltz_w)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
Vd->D(1) = 0;
}
-static uint64_t do_vmskltz_d(int64_t val)
+uint64_t do_vmskltz_d(int64_t val)
{
return (uint64_t)val >> 63;
}
@@ -744,7 +744,7 @@ void HELPER(vmskgez_b)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
Vd->D(1) = 0;
}
-static uint64_t do_vmskez_b(uint64_t a)
+uint64_t do_vmskez_b(uint64_t a)
{
uint64_t m = 0x7f7f7f7f7f7f7f7fULL;
uint64_t c = ~(((a & m) + m) | a | m);
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index f6ad3f78dd..d5a880b3fd 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -75,4 +75,10 @@
#define DO_SIGNCOV(a, b) (a == 0 ? 0 : a < 0 ? -b : b)
+uint64_t do_vmskltz_b(int64_t val);
+uint64_t do_vmskltz_h(int64_t val);
+uint64_t do_vmskltz_w(int64_t val);
+uint64_t do_vmskltz_d(int64_t val);
+uint64_t do_vmskez_b(uint64_t val);
+
#endif /* LOONGARCH_VEC_H */
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
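do_vmskltz_b, now shared with the LSX helpers, gathers the sign bit of each byte of a 64-bit chunk into the low 8 bits of the result; XVMSKGEZ and XVMSKNZ then complement the gathered mask. The bit-gathering trick, sketched with comments (the function name is illustrative):

#include <stdint.h>

/* Collect the MSB of each of the 8 bytes of v into bits 0..7 of the result. */
static uint64_t mskltz_b_sketch(uint64_t v)
{
    uint64_t c = v & 0x8080808080808080ULL;  /* keep only the per-byte sign bits */
    c |= c << 7;                             /* fold pairs of bytes together */
    c |= c << 14;                            /* then groups of four */
    c |= c << 28;                            /* all eight bits now sit in bits 56..63 */
    return c >> 56;                          /* move the 8-bit mask down to bit 0 */
}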
* [PATCH v1 23/46] target/loongarch: Implement xvldi
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (21 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 22/46] target/loongarch: Implement xvmskltz/xvmskgez/xvmsknz Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 24/46] target/loongarch: Implement LASX logic instructions Song Gao
` (22 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVLDI.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 7 +++++++
target/loongarch/insn_trans/trans_lasx.c.inc | 21 ++++++++++++++++++++
target/loongarch/insns.decode | 5 ++++-
3 files changed, 32 insertions(+), 1 deletion(-)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 5a3c14f33d..82a9826eb7 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1703,6 +1703,11 @@ static bool trans_##insn(DisasContext *ctx, arg_##type * a) \
return true; \
}
+static void output_x_i(DisasContext *ctx, arg_x_i *a, const char *mnemonic)
+{
+ output(ctx, mnemonic, "x%d, 0x%x", a->xd, a->imm);
+}
+
static void output_xxx(DisasContext *ctx, arg_xxx * a, const char *mnemonic)
{
output(ctx, mnemonic, "x%d, x%d, x%d", a->xd, a->xj, a->xk);
@@ -2021,6 +2026,8 @@ INSN_LASX(xvmskltz_d, xx)
INSN_LASX(xvmskgez_b, xx)
INSN_LASX(xvmsknz_b, xx)
+INSN_LASX(xvldi, x_i)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index b0aad21a9d..bf277e1fd9 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -1914,6 +1914,27 @@ TRANS(xvmskltz_d, gen_xx, gen_helper_xvmskltz_d)
TRANS(xvmskgez_b, gen_xx, gen_helper_xvmskgez_b)
TRANS(xvmsknz_b, gen_xx, gen_helper_xvmsknz_b)
+static bool trans_xvldi(DisasContext *ctx, arg_xvldi * a)
+{
+ int sel, vece;
+ uint64_t value;
+ CHECK_ASXE;
+
+ sel = (a->imm >> 12) & 0x1;
+
+ if (sel) {
+ value = vldi_get_value(ctx, a->imm);
+ vece = MO_64;
+ } else {
+ value = ((int32_t)(a->imm << 22)) >> 22;
+ vece = (a->imm >> 10) & 0x3;
+ }
+
+ tcg_gen_gvec_dup_i64(vece, vec_full_offset(a->xd), 32, ctx->vl / 8,
+ tcg_constant_i64(value));
+ return true;
+}
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index b792a68fdf..fbd0dd229a 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1305,11 +1305,13 @@ vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
&xxx xd xj xk
&xr xd rj
&xx_i xd xj imm
+&x_i xd imm
#
# LASX Formats
#
+@x_i13 .... ........ .. imm:13 xd:5 &x_i
@xx .... ........ ..... ..... xj:5 xd:5 &xx
@xxx .... ........ ..... xk:5 xj:5 xd:5 &xxx
@xr .... ........ ..... ..... rj:5 xd:5 &xr
@@ -1319,7 +1321,6 @@ vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
@xx_ui5 .... ........ ..... imm:5 xj:5 xd:5 &xx_i
@xx_ui6 .... ........ .... imm:6 xj:5 xd:5 &xx_i
-
xvadd_b 0111 01000000 10100 ..... ..... ..... @xxx
xvadd_h 0111 01000000 10101 ..... ..... ..... @xxx
xvadd_w 0111 01000000 10110 ..... ..... ..... @xxx
@@ -1625,6 +1626,8 @@ xvmskltz_d 0111 01101001 11000 10011 ..... ..... @xx
xvmskgez_b 0111 01101001 11000 10100 ..... ..... @xx
xvmsknz_b 0111 01101001 11000 11000 ..... ..... @xx
+xvldi 0111 01111110 00 ............. ..... @x_i13
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
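trans_xvldi splits the 13-bit immediate on bit 12: when it is set, the extended encodings are decoded by vldi_get_value; otherwise bits 11:10 select the element size and bits 9:0 are sign-extended and replicated across the 256-bit register. The simple-form decode on its own (a sketch, the helper name is illustrative):

#include <stdint.h>

/* Simple-form XVLDI decode: element size log2 in *vece, replicated value in *value. */
static void xvldi_simple_decode(uint32_t imm, int *vece, int64_t *value)
{
    *vece = (imm >> 10) & 0x3;              /* 0 = B, 1 = H, 2 = W, 3 = D */
    *value = ((int32_t)(imm << 22)) >> 22;  /* sign-extend imm[9:0] */
}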
* [PATCH v1 24/46] target/loongarch: Implement LASX logic instructions
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (22 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 23/46] target/loongarch: Implement xvldi Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 25/46] target/loongarch: Implement xvsll xvsrl xvsra xvrotr Song Gao
` (21 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XV{AND/OR/XOR/NOR/ANDN/ORN}.V;
- XV{AND/OR/XOR/NOR}I.B.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 12 ++++++
target/loongarch/helper.h | 2 +
target/loongarch/insn_trans/trans_lasx.c.inc | 42 ++++++++++++++++++++
target/loongarch/insns.decode | 13 ++++++
target/loongarch/lasx_helper.c | 11 +++++
5 files changed, 80 insertions(+)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 82a9826eb7..2f1da9db80 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2028,6 +2028,18 @@ INSN_LASX(xvmsknz_b, xx)
INSN_LASX(xvldi, x_i)
+INSN_LASX(xvand_v, xxx)
+INSN_LASX(xvor_v, xxx)
+INSN_LASX(xvxor_v, xxx)
+INSN_LASX(xvnor_v, xxx)
+INSN_LASX(xvandn_v, xxx)
+INSN_LASX(xvorn_v, xxx)
+
+INSN_LASX(xvandi_b, xx_i)
+INSN_LASX(xvori_b, xx_i)
+INSN_LASX(xvxori_b, xx_i)
+INSN_LASX(xvnori_b, xx_i)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index b7ba78ee06..4e0a900318 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -929,3 +929,5 @@ DEF_HELPER_3(xvmskltz_w, void, env, i32, i32)
DEF_HELPER_3(xvmskltz_d, void, env, i32, i32)
DEF_HELPER_3(xvmskgez_b, void, env, i32, i32)
DEF_HELPER_3(xvmsknz_b, void, env, i32, i32)
+
+DEF_HELPER_FLAGS_4(xvnori_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index bf277e1fd9..d48f76f118 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -1935,6 +1935,48 @@ static bool trans_xvldi(DisasContext *ctx, arg_xvldi * a)
return true;
}
+TRANS(xvand_v, gvec_xxx, MO_64, tcg_gen_gvec_and)
+TRANS(xvor_v, gvec_xxx, MO_64, tcg_gen_gvec_or)
+TRANS(xvxor_v, gvec_xxx, MO_64, tcg_gen_gvec_xor)
+TRANS(xvnor_v, gvec_xxx, MO_64, tcg_gen_gvec_nor)
+
+static bool trans_xvandn_v(DisasContext *ctx, arg_xxx * a)
+{
+ uint32_t xd_ofs, xj_ofs, xk_ofs;
+
+ CHECK_ASXE;
+
+ xd_ofs = vec_full_offset(a->xd);
+ xj_ofs = vec_full_offset(a->xj);
+ xk_ofs = vec_full_offset(a->xk);
+
+ tcg_gen_gvec_andc(MO_64, xd_ofs, xk_ofs, xj_ofs, 32, ctx->vl / 8);
+ return true;
+}
+TRANS(xvorn_v, gvec_xxx, MO_64, tcg_gen_gvec_orc)
+TRANS(xvandi_b, gvec_xx_i, MO_8, tcg_gen_gvec_andi)
+TRANS(xvori_b, gvec_xx_i, MO_8, tcg_gen_gvec_ori)
+TRANS(xvxori_b, gvec_xx_i, MO_8, tcg_gen_gvec_xori)
+
+static void do_xvnori_b(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ int64_t imm, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_nor_vec, 0
+ };
+ static const GVecGen2i op = {
+ .fni8 = gen_vnori_b,
+ .fniv = gen_vnori,
+ .fnoi = gen_helper_xvnori_b,
+ .opt_opc = vecop_list,
+ .vece = MO_8
+ };
+
+ tcg_gen_gvec_2i(xd_ofs, xj_ofs, oprsz, maxsz, imm, &op);
+}
+
+TRANS(xvnori_b, gvec_xx_i, MO_8, do_xvnori_b)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index fbd0dd229a..ce2ad47b88 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1320,6 +1320,7 @@ vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
@xx_ui4 .... ........ ..... . imm:4 xj:5 xd:5 &xx_i
@xx_ui5 .... ........ ..... imm:5 xj:5 xd:5 &xx_i
@xx_ui6 .... ........ .... imm:6 xj:5 xd:5 &xx_i
+@xx_ui8 .... ........ .. imm:8 xj:5 xd:5 &xx_i
xvadd_b 0111 01000000 10100 ..... ..... ..... @xxx
xvadd_h 0111 01000000 10101 ..... ..... ..... @xxx
@@ -1628,6 +1629,18 @@ xvmsknz_b 0111 01101001 11000 11000 ..... ..... @xx
xvldi 0111 01111110 00 ............. ..... @x_i13
+xvand_v 0111 01010010 01100 ..... ..... ..... @xxx
+xvor_v 0111 01010010 01101 ..... ..... ..... @xxx
+xvxor_v 0111 01010010 01110 ..... ..... ..... @xxx
+xvnor_v 0111 01010010 01111 ..... ..... ..... @xxx
+xvandn_v 0111 01010010 10000 ..... ..... ..... @xxx
+xvorn_v 0111 01010010 10001 ..... ..... ..... @xxx
+
+xvandi_b 0111 01111101 00 ........ ..... ..... @xx_ui8
+xvori_b 0111 01111101 01 ........ ..... ..... @xx_ui8
+xvxori_b 0111 01111101 10 ........ ..... ..... @xx_ui8
+xvnori_b 0111 01111101 11 ........ ..... ..... @xx_ui8
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index 6aec554645..8e8860c1bb 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -804,3 +804,14 @@ void HELPER(xvmsknz_b)(CPULoongArchState *env, uint32_t xd, uint32_t xj)
Xd->XD(2 * i + 1) = 0;
}
}
+
+void HELPER(xvnori_b)(void *xd, void *xj, uint64_t imm, uint32_t v)
+{
+ int i;
+ XReg *Xd = (XReg *)xd;
+ XReg *Xj = (XReg *)xj;
+
+ for (i = 0; i < LASX_LEN / 8; i++) {
+ Xd->XB(i) = ~(Xj->XB(i) | (uint8_t)imm);
+ }
+}
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
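One detail worth calling out in trans_xvandn_v: the architectural xvandn xd, xj, xk computes xk AND NOT xj, so the translation passes xk as the first source of tcg_gen_gvec_andc, which computes a & ~b. Per 64-bit chunk that is simply (sketch, illustrative name):

#include <stdint.h>

/* xvandn xd, xj, xk, one 64-bit chunk: xj is the complemented operand. */
static uint64_t xvandn_chunk(uint64_t xj, uint64_t xk)
{
    return xk & ~xj;
}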
* [PATCH v1 25/46] target/loongarch: Implement xvsll xvsrl xvsra xvrotr
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (23 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 24/46] target/loongarch: Implement LASX logic instructions Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 26/46] target/loongarch: Implement xvsllwil xvextl Song Gao
` (20 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVSLL[I].{B/H/W/D};
- XVSRL[I].{B/H/W/D};
- XVSRA[I].{B/H/W/D};
- XVROTR[I].{B/H/W/D}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 36 ++++++++++++++++++++
target/loongarch/insn_trans/trans_lasx.c.inc | 36 ++++++++++++++++++++
target/loongarch/insns.decode | 33 ++++++++++++++++++
3 files changed, 105 insertions(+)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 2f1da9db80..0c1c7a7e6e 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2040,6 +2040,42 @@ INSN_LASX(xvori_b, xx_i)
INSN_LASX(xvxori_b, xx_i)
INSN_LASX(xvnori_b, xx_i)
+INSN_LASX(xvsll_b, xxx)
+INSN_LASX(xvsll_h, xxx)
+INSN_LASX(xvsll_w, xxx)
+INSN_LASX(xvsll_d, xxx)
+INSN_LASX(xvslli_b, xx_i)
+INSN_LASX(xvslli_h, xx_i)
+INSN_LASX(xvslli_w, xx_i)
+INSN_LASX(xvslli_d, xx_i)
+
+INSN_LASX(xvsrl_b, xxx)
+INSN_LASX(xvsrl_h, xxx)
+INSN_LASX(xvsrl_w, xxx)
+INSN_LASX(xvsrl_d, xxx)
+INSN_LASX(xvsrli_b, xx_i)
+INSN_LASX(xvsrli_h, xx_i)
+INSN_LASX(xvsrli_w, xx_i)
+INSN_LASX(xvsrli_d, xx_i)
+
+INSN_LASX(xvsra_b, xxx)
+INSN_LASX(xvsra_h, xxx)
+INSN_LASX(xvsra_w, xxx)
+INSN_LASX(xvsra_d, xxx)
+INSN_LASX(xvsrai_b, xx_i)
+INSN_LASX(xvsrai_h, xx_i)
+INSN_LASX(xvsrai_w, xx_i)
+INSN_LASX(xvsrai_d, xx_i)
+
+INSN_LASX(xvrotr_b, xxx)
+INSN_LASX(xvrotr_h, xxx)
+INSN_LASX(xvrotr_w, xxx)
+INSN_LASX(xvrotr_d, xxx)
+INSN_LASX(xvrotri_b, xx_i)
+INSN_LASX(xvrotri_h, xx_i)
+INSN_LASX(xvrotri_w, xx_i)
+INSN_LASX(xvrotri_d, xx_i)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index d48f76f118..5d7deb312e 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -1977,6 +1977,42 @@ static void do_xvnori_b(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
TRANS(xvnori_b, gvec_xx_i, MO_8, do_xvnori_b)
+TRANS(xvsll_b, gvec_xxx, MO_8, tcg_gen_gvec_shlv)
+TRANS(xvsll_h, gvec_xxx, MO_16, tcg_gen_gvec_shlv)
+TRANS(xvsll_w, gvec_xxx, MO_32, tcg_gen_gvec_shlv)
+TRANS(xvsll_d, gvec_xxx, MO_64, tcg_gen_gvec_shlv)
+TRANS(xvslli_b, gvec_xx_i, MO_8, tcg_gen_gvec_shli)
+TRANS(xvslli_h, gvec_xx_i, MO_16, tcg_gen_gvec_shli)
+TRANS(xvslli_w, gvec_xx_i, MO_32, tcg_gen_gvec_shli)
+TRANS(xvslli_d, gvec_xx_i, MO_64, tcg_gen_gvec_shli)
+
+TRANS(xvsrl_b, gvec_xxx, MO_8, tcg_gen_gvec_shrv)
+TRANS(xvsrl_h, gvec_xxx, MO_16, tcg_gen_gvec_shrv)
+TRANS(xvsrl_w, gvec_xxx, MO_32, tcg_gen_gvec_shrv)
+TRANS(xvsrl_d, gvec_xxx, MO_64, tcg_gen_gvec_shrv)
+TRANS(xvsrli_b, gvec_xx_i, MO_8, tcg_gen_gvec_shri)
+TRANS(xvsrli_h, gvec_xx_i, MO_16, tcg_gen_gvec_shri)
+TRANS(xvsrli_w, gvec_xx_i, MO_32, tcg_gen_gvec_shri)
+TRANS(xvsrli_d, gvec_xx_i, MO_64, tcg_gen_gvec_shri)
+
+TRANS(xvsra_b, gvec_xxx, MO_8, tcg_gen_gvec_sarv)
+TRANS(xvsra_h, gvec_xxx, MO_16, tcg_gen_gvec_sarv)
+TRANS(xvsra_w, gvec_xxx, MO_32, tcg_gen_gvec_sarv)
+TRANS(xvsra_d, gvec_xxx, MO_64, tcg_gen_gvec_sarv)
+TRANS(xvsrai_b, gvec_xx_i, MO_8, tcg_gen_gvec_sari)
+TRANS(xvsrai_h, gvec_xx_i, MO_16, tcg_gen_gvec_sari)
+TRANS(xvsrai_w, gvec_xx_i, MO_32, tcg_gen_gvec_sari)
+TRANS(xvsrai_d, gvec_xx_i, MO_64, tcg_gen_gvec_sari)
+
+TRANS(xvrotr_b, gvec_xxx, MO_8, tcg_gen_gvec_rotrv)
+TRANS(xvrotr_h, gvec_xxx, MO_16, tcg_gen_gvec_rotrv)
+TRANS(xvrotr_w, gvec_xxx, MO_32, tcg_gen_gvec_rotrv)
+TRANS(xvrotr_d, gvec_xxx, MO_64, tcg_gen_gvec_rotrv)
+TRANS(xvrotri_b, gvec_xx_i, MO_8, tcg_gen_gvec_rotri)
+TRANS(xvrotri_h, gvec_xx_i, MO_16, tcg_gen_gvec_rotri)
+TRANS(xvrotri_w, gvec_xx_i, MO_32, tcg_gen_gvec_rotri)
+TRANS(xvrotri_d, gvec_xx_i, MO_64, tcg_gen_gvec_rotri)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index ce2ad47b88..03c3aa0019 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1641,6 +1641,39 @@ xvori_b 0111 01111101 01 ........ ..... ..... @xx_ui8
xvxori_b 0111 01111101 10 ........ ..... ..... @xx_ui8
xvnori_b 0111 01111101 11 ........ ..... ..... @xx_ui8
+xvsll_b 0111 01001110 10000 ..... ..... ..... @xxx
+xvsll_h 0111 01001110 10001 ..... ..... ..... @xxx
+xvsll_w 0111 01001110 10010 ..... ..... ..... @xxx
+xvsll_d 0111 01001110 10011 ..... ..... ..... @xxx
+xvslli_b 0111 01110010 11000 01 ... ..... ..... @xx_ui3
+xvslli_h 0111 01110010 11000 1 .... ..... ..... @xx_ui4
+xvslli_w 0111 01110010 11001 ..... ..... ..... @xx_ui5
+xvslli_d 0111 01110010 1101 ...... ..... ..... @xx_ui6
+xvsrl_b 0111 01001110 10100 ..... ..... ..... @xxx
+xvsrl_h 0111 01001110 10101 ..... ..... ..... @xxx
+xvsrl_w 0111 01001110 10110 ..... ..... ..... @xxx
+xvsrl_d 0111 01001110 10111 ..... ..... ..... @xxx
+xvsrli_b 0111 01110011 00000 01 ... ..... ..... @xx_ui3
+xvsrli_h 0111 01110011 00000 1 .... ..... ..... @xx_ui4
+xvsrli_w 0111 01110011 00001 ..... ..... ..... @xx_ui5
+xvsrli_d 0111 01110011 0001 ...... ..... ..... @xx_ui6
+xvsra_b 0111 01001110 11000 ..... ..... ..... @xxx
+xvsra_h 0111 01001110 11001 ..... ..... ..... @xxx
+xvsra_w 0111 01001110 11010 ..... ..... ..... @xxx
+xvsra_d 0111 01001110 11011 ..... ..... ..... @xxx
+xvsrai_b 0111 01110011 01000 01 ... ..... ..... @xx_ui3
+xvsrai_h 0111 01110011 01000 1 .... ..... ..... @xx_ui4
+xvsrai_w 0111 01110011 01001 ..... ..... ..... @xx_ui5
+xvsrai_d 0111 01110011 0101 ...... ..... ..... @xx_ui6
+xvrotr_b 0111 01001110 11100 ..... ..... ..... @xxx
+xvrotr_h 0111 01001110 11101 ..... ..... ..... @xxx
+xvrotr_w 0111 01001110 11110 ..... ..... ..... @xxx
+xvrotr_d 0111 01001110 11111 ..... ..... ..... @xxx
+xvrotri_b 0111 01101010 00000 01 ... ..... ..... @xx_ui3
+xvrotri_h 0111 01101010 00000 1 .... ..... ..... @xx_ui4
+xvrotri_w 0111 01101010 00001 ..... ..... ..... @xx_ui5
+xvrotri_d 0111 01101010 0001 ...... ..... ..... @xx_ui6
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
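These map one-to-one onto the generic gvec expanders because the architecture uses only the low log2(element width) bits of each per-element shift or rotate count, which matches the modulo-element-size behaviour of the gvec shift-by-vector helpers. For a single 8-bit element that amounts to (sketch, illustrative name):

#include <stdint.h>

/* xvsll.b on one element: only the low 3 bits of the count take effect. */
static uint8_t xvsll_b_elem(uint8_t x, uint8_t cnt)
{
    return (uint8_t)(x << (cnt & 7));
}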
* [PATCH v1 26/46] target/loongarch: Implement xvsllwil xvextl
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (24 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 25/46] target/loongarch: Implement xvsll xvsrl xvsra xvrotr Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 27/46] target/loongarch: Implement xvsrlr xvsrar Song Gao
` (19 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVSLLWIL.{H.B/W.H/D.W};
- XVSLLWIL.{HU.BU/WU.HU/DU.WU};
- XVEXTL.Q.D, XVEXTL.QU.DU.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 9 ++++
target/loongarch/helper.h | 9 ++++
target/loongarch/insn_trans/trans_lasx.c.inc | 21 +++++++++
target/loongarch/insns.decode | 9 ++++
target/loongarch/lasx_helper.c | 45 ++++++++++++++++++++
5 files changed, 93 insertions(+)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 0c1c7a7e6e..b6940e6389 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2076,6 +2076,15 @@ INSN_LASX(xvrotri_h, xx_i)
INSN_LASX(xvrotri_w, xx_i)
INSN_LASX(xvrotri_d, xx_i)
+INSN_LASX(xvsllwil_h_b, xx_i)
+INSN_LASX(xvsllwil_w_h, xx_i)
+INSN_LASX(xvsllwil_d_w, xx_i)
+INSN_LASX(xvextl_q_d, xx)
+INSN_LASX(xvsllwil_hu_bu, xx_i)
+INSN_LASX(xvsllwil_wu_hu, xx_i)
+INSN_LASX(xvsllwil_du_wu, xx_i)
+INSN_LASX(xvextl_qu_du, xx)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 4e0a900318..672a5f8988 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -931,3 +931,12 @@ DEF_HELPER_3(xvmskgez_b, void, env, i32, i32)
DEF_HELPER_3(xvmsknz_b, void, env, i32, i32)
DEF_HELPER_FLAGS_4(xvnori_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_4(xvsllwil_h_b, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsllwil_w_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsllwil_d_w, void, env, i32, i32, i32)
+DEF_HELPER_3(xvextl_q_d, void, env, i32, i32)
+DEF_HELPER_4(xvsllwil_hu_bu, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsllwil_wu_hu, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsllwil_du_wu, void, env, i32, i32, i32)
+DEF_HELPER_3(xvextl_qu_du, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 5d7deb312e..53631cea63 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -39,6 +39,18 @@ static bool gen_xx(DisasContext *ctx, arg_xx *a,
return true;
}
+static bool gen_xx_i(DisasContext *ctx, arg_xx_i *a,
+ void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32))
+{
+ TCGv_i32 xd = tcg_constant_i32(a->xd);
+ TCGv_i32 xj = tcg_constant_i32(a->xj);
+ TCGv_i32 imm = tcg_constant_i32(a->imm);
+
+ CHECK_ASXE;
+ func(cpu_env, xd, xj, imm);
+ return true;
+}
+
static bool gvec_xxx(DisasContext *ctx, arg_xxx *a, MemOp mop,
void (*func)(unsigned, uint32_t, uint32_t,
uint32_t, uint32_t, uint32_t))
@@ -2013,6 +2025,15 @@ TRANS(xvrotri_h, gvec_xx_i, MO_16, tcg_gen_gvec_rotri)
TRANS(xvrotri_w, gvec_xx_i, MO_32, tcg_gen_gvec_rotri)
TRANS(xvrotri_d, gvec_xx_i, MO_64, tcg_gen_gvec_rotri)
+TRANS(xvsllwil_h_b, gen_xx_i, gen_helper_xvsllwil_h_b)
+TRANS(xvsllwil_w_h, gen_xx_i, gen_helper_xvsllwil_w_h)
+TRANS(xvsllwil_d_w, gen_xx_i, gen_helper_xvsllwil_d_w)
+TRANS(xvextl_q_d, gen_xx, gen_helper_xvextl_q_d)
+TRANS(xvsllwil_hu_bu, gen_xx_i, gen_helper_xvsllwil_hu_bu)
+TRANS(xvsllwil_wu_hu, gen_xx_i, gen_helper_xvsllwil_wu_hu)
+TRANS(xvsllwil_du_wu, gen_xx_i, gen_helper_xvsllwil_du_wu)
+TRANS(xvextl_qu_du, gen_xx, gen_helper_xvextl_qu_du)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 03c3aa0019..ebaddb94ea 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1674,6 +1674,15 @@ xvrotri_h 0111 01101010 00000 1 .... ..... ..... @xx_ui4
xvrotri_w 0111 01101010 00001 ..... ..... ..... @xx_ui5
xvrotri_d 0111 01101010 0001 ...... ..... ..... @xx_ui6
+xvsllwil_h_b 0111 01110000 10000 01 ... ..... ..... @xx_ui3
+xvsllwil_w_h 0111 01110000 10000 1 .... ..... ..... @xx_ui4
+xvsllwil_d_w 0111 01110000 10001 ..... ..... ..... @xx_ui5
+xvextl_q_d 0111 01110000 10010 00000 ..... ..... @xx
+xvsllwil_hu_bu 0111 01110000 11000 01 ... ..... ..... @xx_ui3
+xvsllwil_wu_hu 0111 01110000 11000 1 .... ..... ..... @xx_ui4
+xvsllwil_du_wu 0111 01110000 11001 ..... ..... ..... @xx_ui5
+xvextl_qu_du 0111 01110000 11010 00000 ..... ..... @xx
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index 8e8860c1bb..cd0e18ac3c 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -815,3 +815,48 @@ void HELPER(xvnori_b)(void *xd, void *xj, uint64_t imm, uint32_t v)
Xd->XB(i) = ~(Xj->XB(i) | (uint8_t)imm);
}
}
+
+#define XVSLLWIL(NAME, BIT, E1, E2) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t imm) \
+{ \
+ int i, max; \
+ XReg temp; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ typedef __typeof(temp.E1(0)) TD; \
+ \
+ temp.XQ(0) = int128_zero(); \
+ temp.XQ(1) = int128_zero(); \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ temp.E1(i) = (TD)Xj->E2(i) << (imm % BIT); \
+ temp.E1(i + max) = (TD)Xj->E2(i + max * 2) << (imm % BIT); \
+ } \
+ *Xd = temp; \
+}
+
+void HELPER(xvextl_q_d)(CPULoongArchState *env, uint32_t xd, uint32_t xj)
+{
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ Xd->XQ(0) = int128_makes64(Xj->XD(0));
+ Xd->XQ(1) = int128_makes64(Xj->XD(2));
+}
+
+void HELPER(xvextl_qu_du)(CPULoongArchState *env, uint32_t xd, uint32_t xj)
+{
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ Xd->XQ(0) = int128_make64(Xj->UXD(0));
+ Xd->XQ(1) = int128_make64(Xj->UXD(2));
+}
+
+XVSLLWIL(xvsllwil_h_b, 16, XH, XB)
+XVSLLWIL(xvsllwil_w_h, 32, XW, XH)
+XVSLLWIL(xvsllwil_d_w, 64, XD, XW)
+XVSLLWIL(xvsllwil_hu_bu, 16, UXH, UXB)
+XVSLLWIL(xvsllwil_wu_hu, 32, UXW, UXH)
+XVSLLWIL(xvsllwil_du_wu, 64, UXD, UXW)
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
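XVSLLWIL is again per 128-bit lane, but it takes the low half of each lane, widens the elements, and shifts them left by the immediate; XVEXTL similarly widens the low doubleword of each lane to a full 128 bits. A sketch of xvsllwil.h.b with the same flat-array convention as earlier (illustrative only):

#include <stdint.h>

/* xvsllwil.h.b: per lane, widen the low 8 bytes to halfwords, then shift left. */
static void xvsllwil_h_b_sketch(int16_t dst[16], const int8_t src[32], unsigned imm)
{
    for (int lane = 0; lane < 2; lane++) {
        for (int i = 0; i < 8; i++) {
            dst[lane * 8 + i] = (int16_t)((int16_t)src[lane * 16 + i] << (imm % 16));
        }
    }
}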
* [PATCH v1 27/46] target/loongarch: Implement xvsrlr xvsrar
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (25 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 26/46] target/loongarch: Implement xvsllwil xvextl Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 28/46] target/loongarch: Implement xvsrln xvsran Song Gao
` (18 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVSRLR[I].{B/H/W/D};
- XVSRAR[I].{B/H/W/D}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 18 ++++
target/loongarch/helper.h | 18 ++++
target/loongarch/insn_trans/trans_lasx.c.inc | 18 ++++
target/loongarch/insns.decode | 17 +++
target/loongarch/lasx_helper.c | 104 +++++++++++++++++++
5 files changed, 175 insertions(+)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index b6940e6389..a63ba6d6ee 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2085,6 +2085,24 @@ INSN_LASX(xvsllwil_wu_hu, xx_i)
INSN_LASX(xvsllwil_du_wu, xx_i)
INSN_LASX(xvextl_qu_du, xx)
+INSN_LASX(xvsrlr_b, xxx)
+INSN_LASX(xvsrlr_h, xxx)
+INSN_LASX(xvsrlr_w, xxx)
+INSN_LASX(xvsrlr_d, xxx)
+INSN_LASX(xvsrlri_b, xx_i)
+INSN_LASX(xvsrlri_h, xx_i)
+INSN_LASX(xvsrlri_w, xx_i)
+INSN_LASX(xvsrlri_d, xx_i)
+
+INSN_LASX(xvsrar_b, xxx)
+INSN_LASX(xvsrar_h, xxx)
+INSN_LASX(xvsrar_w, xxx)
+INSN_LASX(xvsrar_d, xxx)
+INSN_LASX(xvsrari_b, xx_i)
+INSN_LASX(xvsrari_h, xx_i)
+INSN_LASX(xvsrari_w, xx_i)
+INSN_LASX(xvsrari_d, xx_i)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 672a5f8988..6bb30ddd31 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -940,3 +940,21 @@ DEF_HELPER_4(xvsllwil_hu_bu, void, env, i32, i32, i32)
DEF_HELPER_4(xvsllwil_wu_hu, void, env, i32, i32, i32)
DEF_HELPER_4(xvsllwil_du_wu, void, env, i32, i32, i32)
DEF_HELPER_3(xvextl_qu_du, void, env, i32, i32)
+
+DEF_HELPER_4(xvsrlr_b, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrlr_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrlr_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrlr_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrlri_b, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrlri_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrlri_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrlri_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(xvsrar_b, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrar_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrar_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrar_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrari_b, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrari_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrari_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrari_d, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 53631cea63..602ba0c800 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -2034,6 +2034,24 @@ TRANS(xvsllwil_wu_hu, gen_xx_i, gen_helper_xvsllwil_wu_hu)
TRANS(xvsllwil_du_wu, gen_xx_i, gen_helper_xvsllwil_du_wu)
TRANS(xvextl_qu_du, gen_xx, gen_helper_xvextl_qu_du)
+TRANS(xvsrlr_b, gen_xxx, gen_helper_xvsrlr_b)
+TRANS(xvsrlr_h, gen_xxx, gen_helper_xvsrlr_h)
+TRANS(xvsrlr_w, gen_xxx, gen_helper_xvsrlr_w)
+TRANS(xvsrlr_d, gen_xxx, gen_helper_xvsrlr_d)
+TRANS(xvsrlri_b, gen_xx_i, gen_helper_xvsrlri_b)
+TRANS(xvsrlri_h, gen_xx_i, gen_helper_xvsrlri_h)
+TRANS(xvsrlri_w, gen_xx_i, gen_helper_xvsrlri_w)
+TRANS(xvsrlri_d, gen_xx_i, gen_helper_xvsrlri_d)
+
+TRANS(xvsrar_b, gen_xxx, gen_helper_xvsrar_b)
+TRANS(xvsrar_h, gen_xxx, gen_helper_xvsrar_h)
+TRANS(xvsrar_w, gen_xxx, gen_helper_xvsrar_w)
+TRANS(xvsrar_d, gen_xxx, gen_helper_xvsrar_d)
+TRANS(xvsrari_b, gen_xx_i, gen_helper_xvsrari_b)
+TRANS(xvsrari_h, gen_xx_i, gen_helper_xvsrari_h)
+TRANS(xvsrari_w, gen_xx_i, gen_helper_xvsrari_w)
+TRANS(xvsrari_d, gen_xx_i, gen_helper_xvsrari_d)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index ebaddb94ea..d901ddf063 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1683,6 +1683,23 @@ xvsllwil_wu_hu 0111 01110000 11000 1 .... ..... ..... @xx_ui4
xvsllwil_du_wu 0111 01110000 11001 ..... ..... ..... @xx_ui5
xvextl_qu_du 0111 01110000 11010 00000 ..... ..... @xx
+xvsrlr_b 0111 01001111 00000 ..... ..... ..... @xxx
+xvsrlr_h 0111 01001111 00001 ..... ..... ..... @xxx
+xvsrlr_w 0111 01001111 00010 ..... ..... ..... @xxx
+xvsrlr_d 0111 01001111 00011 ..... ..... ..... @xxx
+xvsrlri_b 0111 01101010 01000 01 ... ..... ..... @xx_ui3
+xvsrlri_h 0111 01101010 01000 1 .... ..... ..... @xx_ui4
+xvsrlri_w 0111 01101010 01001 ..... ..... ..... @xx_ui5
+xvsrlri_d 0111 01101010 0101 ...... ..... ..... @xx_ui6
+xvsrar_b 0111 01001111 00100 ..... ..... ..... @xxx
+xvsrar_h 0111 01001111 00101 ..... ..... ..... @xxx
+xvsrar_w 0111 01001111 00110 ..... ..... ..... @xxx
+xvsrar_d 0111 01001111 00111 ..... ..... ..... @xxx
+xvsrari_b 0111 01101010 10000 01 ... ..... ..... @xx_ui3
+xvsrari_h 0111 01101010 10000 1 .... ..... ..... @xx_ui4
+xvsrari_w 0111 01101010 10001 ..... ..... ..... @xx_ui5
+xvsrari_d 0111 01101010 1001 ...... ..... ..... @xx_ui6
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index cd0e18ac3c..ebbbf014f7 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -860,3 +860,107 @@ XVSLLWIL(xvsllwil_d_w, 64, XD, XW)
XVSLLWIL(xvsllwil_hu_bu, 16, UXH, UXB)
XVSLLWIL(xvsllwil_wu_hu, 32, UXW, UXH)
XVSLLWIL(xvsllwil_du_wu, 64, UXD, UXW)
+
+#define do_xvsrlr(E, T) \
+static T do_xvsrlr_ ##E(T s1, int sh) \
+{ \
+ if (sh == 0) { \
+ return s1; \
+ } else { \
+ return (s1 >> sh) + ((s1 >> (sh - 1)) & 0x1); \
+ } \
+}
+
+do_xvsrlr(XB, uint8_t)
+do_xvsrlr(XH, uint16_t)
+do_xvsrlr(XW, uint32_t)
+do_xvsrlr(XD, uint64_t)
+
+#define XVSRLR(NAME, BIT, E1, E2) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E1(i) = do_xvsrlr_ ## E1(Xj->E1(i), (Xk->E2(i)) % BIT); \
+ } \
+}
+
+XVSRLR(xvsrlr_b, 8, XB, UXB)
+XVSRLR(xvsrlr_h, 16, XH, UXH)
+XVSRLR(xvsrlr_w, 32, XW, UXW)
+XVSRLR(xvsrlr_d, 64, XD, UXD)
+
+#define XVSRLRI(NAME, BIT, E) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t imm) \
+{ \
+ int i; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E(i) = do_xvsrlr_ ## E(Xj->E(i), imm); \
+ } \
+}
+
+XVSRLRI(xvsrlri_b, 8, XB)
+XVSRLRI(xvsrlri_h, 16, XH)
+XVSRLRI(xvsrlri_w, 32, XW)
+XVSRLRI(xvsrlri_d, 64, XD)
+
+#define do_xvsrar(E, T) \
+static T do_xvsrar_ ##E(T s1, int sh) \
+{ \
+ if (sh == 0) { \
+ return s1; \
+ } else { \
+ return (s1 >> sh) + ((s1 >> (sh - 1)) & 0x1); \
+ } \
+}
+
+do_xvsrar(XB, int8_t)
+do_xvsrar(XH, int16_t)
+do_xvsrar(XW, int32_t)
+do_xvsrar(XD, int64_t)
+
+#define XVSRAR(NAME, BIT, E1, E2) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E1(i) = do_xvsrar_ ## E1(Xj->E1(i), (Xk->E2(i)) % BIT); \
+ } \
+}
+
+XVSRAR(xvsrar_b, 8, XB, UXB)
+XVSRAR(xvsrar_h, 16, XH, UXH)
+XVSRAR(xvsrar_w, 32, XW, UXW)
+XVSRAR(xvsrar_d, 64, XD, UXD)
+
+#define XVSRARI(NAME, BIT, E) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t imm) \
+{ \
+ int i; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E(i) = do_xvsrar_ ## E(Xj->E(i), imm); \
+ } \
+}
+
+XVSRARI(xvsrari_b, 8, XB)
+XVSRARI(xvsrari_h, 16, XH)
+XVSRARI(xvsrari_w, 32, XW)
+XVSRARI(xvsrari_d, 64, XD)
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 28/46] target/loongarch: Implement xvsrln xvsran
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (26 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 27/46] target/loongarch: Implement xvsrlr xvsrar Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 29/46] target/loongarch: Implement xvsrlrn xvsrarn Song Gao
` (17 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVSRLN.{B.H/H.W/W.D};
- XVSRAN.{B.H/H.W/W.D};
- XVSRLNI.{B.H/H.W/W.D/D.Q};
- XVSRANI.{B.H/H.W/W.D/D.Q}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
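As a rough scalar sketch of what the new helpers compute (illustrative names only, not code from this patch): each source element is logically shifted right by the low bits of the matching element of xk, truncated to half width, and packed into the low half of each 128-bit lane, with the high half zeroed. For XVSRLN.B.H on one lane this amounts to:

/*
 * Scalar model of XVSRLN.B.H for one 128-bit lane (illustrative only).
 * lane_j: 8 halfword sources, lane_k: shift amounts, dst: 16 result bytes.
 */
#include <stdint.h>
#include <stdio.h>

static void srln_b_h_lane(uint8_t dst[16], const uint16_t lane_j[8],
                          const uint16_t lane_k[8])
{
    for (int i = 0; i < 8; i++) {
        int sh = lane_k[i] % 16;              /* shift amount mod element width */
        dst[i] = (uint8_t)(lane_j[i] >> sh);  /* logical shift, then truncate */
    }
    for (int i = 8; i < 16; i++) {
        dst[i] = 0;                           /* high half of the lane is zeroed */
    }
}

int main(void)
{
    uint16_t j[8] = { 0xff80, 0x0123, 4, 8, 16, 32, 64, 128 };
    uint16_t k[8] = { 4, 8, 0, 1, 2, 3, 4, 5 };
    uint8_t d[16];

    srln_b_h_lane(d, j, k);
    printf("%02x %02x\n", d[0], d[1]);        /* prints: f8 01 */
    return 0;
}

The xvsrlni_*/xvsrani_* immediate forms do the same narrowing but take the shift count from the instruction and also pack elements taken from xd.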
target/loongarch/disas.c | 16 +++
target/loongarch/helper.h | 16 +++
target/loongarch/insn_trans/trans_lasx.c.inc | 16 +++
target/loongarch/insns.decode | 17 +++
target/loongarch/lasx_helper.c | 128 +++++++++++++++++++
target/loongarch/lsx_helper.c | 2 -
target/loongarch/vec.h | 2 +
7 files changed, 195 insertions(+), 2 deletions(-)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index a63ba6d6ee..5ea713075f 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2103,6 +2103,22 @@ INSN_LASX(xvsrari_h, xx_i)
INSN_LASX(xvsrari_w, xx_i)
INSN_LASX(xvsrari_d, xx_i)
+INSN_LASX(xvsrln_b_h, xxx)
+INSN_LASX(xvsrln_h_w, xxx)
+INSN_LASX(xvsrln_w_d, xxx)
+INSN_LASX(xvsran_b_h, xxx)
+INSN_LASX(xvsran_h_w, xxx)
+INSN_LASX(xvsran_w_d, xxx)
+
+INSN_LASX(xvsrlni_b_h, xx_i)
+INSN_LASX(xvsrlni_h_w, xx_i)
+INSN_LASX(xvsrlni_w_d, xx_i)
+INSN_LASX(xvsrlni_d_q, xx_i)
+INSN_LASX(xvsrani_b_h, xx_i)
+INSN_LASX(xvsrani_h_w, xx_i)
+INSN_LASX(xvsrani_w_d, xx_i)
+INSN_LASX(xvsrani_d_q, xx_i)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 6bb30ddd31..c41f8e2bc9 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -958,3 +958,19 @@ DEF_HELPER_4(xvsrari_b, void, env, i32, i32, i32)
DEF_HELPER_4(xvsrari_h, void, env, i32, i32, i32)
DEF_HELPER_4(xvsrari_w, void, env, i32, i32, i32)
DEF_HELPER_4(xvsrari_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(xvsrln_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrln_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrln_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsran_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsran_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsran_w_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(xvsrlni_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrlni_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrlni_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrlni_d_q, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrani_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrani_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrani_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrani_d_q, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 602ba0c800..9a3c2114eb 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -2052,6 +2052,22 @@ TRANS(xvsrari_h, gen_xx_i, gen_helper_xvsrari_h)
TRANS(xvsrari_w, gen_xx_i, gen_helper_xvsrari_w)
TRANS(xvsrari_d, gen_xx_i, gen_helper_xvsrari_d)
+TRANS(xvsrln_b_h, gen_xxx, gen_helper_xvsrln_b_h)
+TRANS(xvsrln_h_w, gen_xxx, gen_helper_xvsrln_h_w)
+TRANS(xvsrln_w_d, gen_xxx, gen_helper_xvsrln_w_d)
+TRANS(xvsran_b_h, gen_xxx, gen_helper_xvsran_b_h)
+TRANS(xvsran_h_w, gen_xxx, gen_helper_xvsran_h_w)
+TRANS(xvsran_w_d, gen_xxx, gen_helper_xvsran_w_d)
+
+TRANS(xvsrlni_b_h, gen_xx_i, gen_helper_xvsrlni_b_h)
+TRANS(xvsrlni_h_w, gen_xx_i, gen_helper_xvsrlni_h_w)
+TRANS(xvsrlni_w_d, gen_xx_i, gen_helper_xvsrlni_w_d)
+TRANS(xvsrlni_d_q, gen_xx_i, gen_helper_xvsrlni_d_q)
+TRANS(xvsrani_b_h, gen_xx_i, gen_helper_xvsrani_b_h)
+TRANS(xvsrani_h_w, gen_xx_i, gen_helper_xvsrani_h_w)
+TRANS(xvsrani_w_d, gen_xx_i, gen_helper_xvsrani_w_d)
+TRANS(xvsrani_d_q, gen_xx_i, gen_helper_xvsrani_d_q)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index d901ddf063..45f15e3be2 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1320,6 +1320,7 @@ vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
@xx_ui4 .... ........ ..... . imm:4 xj:5 xd:5 &xx_i
@xx_ui5 .... ........ ..... imm:5 xj:5 xd:5 &xx_i
@xx_ui6 .... ........ .... imm:6 xj:5 xd:5 &xx_i
+@xx_ui7 .... ........ ... imm:7 xj:5 xd:5 &xx_i
@xx_ui8 .... ........ .. imm:8 xj:5 xd:5 &xx_i
xvadd_b 0111 01000000 10100 ..... ..... ..... @xxx
@@ -1700,6 +1701,22 @@ xvsrari_h 0111 01101010 10000 1 .... ..... ..... @xx_ui4
xvsrari_w 0111 01101010 10001 ..... ..... ..... @xx_ui5
xvsrari_d 0111 01101010 1001 ...... ..... ..... @xx_ui6
+xvsrln_b_h 0111 01001111 01001 ..... ..... ..... @xxx
+xvsrln_h_w 0111 01001111 01010 ..... ..... ..... @xxx
+xvsrln_w_d 0111 01001111 01011 ..... ..... ..... @xxx
+xvsran_b_h 0111 01001111 01101 ..... ..... ..... @xxx
+xvsran_h_w 0111 01001111 01110 ..... ..... ..... @xxx
+xvsran_w_d 0111 01001111 01111 ..... ..... ..... @xxx
+
+xvsrlni_b_h 0111 01110100 00000 1 .... ..... ..... @xx_ui4
+xvsrlni_h_w 0111 01110100 00001 ..... ..... ..... @xx_ui5
+xvsrlni_w_d 0111 01110100 0001 ...... ..... ..... @xx_ui6
+xvsrlni_d_q 0111 01110100 001 ....... ..... ..... @xx_ui7
+xvsrani_b_h 0111 01110101 10000 1 .... ..... ..... @xx_ui4
+xvsrani_h_w 0111 01110101 10001 ..... ..... ..... @xx_ui5
+xvsrani_w_d 0111 01110101 1001 ...... ..... ..... @xx_ui6
+xvsrani_d_q 0111 01110101 101 ....... ..... ..... @xx_ui7
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index ebbbf014f7..02550646d7 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -964,3 +964,131 @@ XVSRARI(xvsrari_b, 8, XB)
XVSRARI(xvsrari_h, 16, XH)
XVSRARI(xvsrari_w, 32, XW)
XVSRARI(xvsrari_d, 64, XD)
+
+#define XVSRLN(NAME, BIT, E1, E2) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i, max; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ Xd->E1(i) = R_SHIFT(Xj->E2(i), (Xk->E2(i)) % BIT); \
+ Xd->E1(i + max * 2) = R_SHIFT(Xj->E2(i + max), \
+ Xk->E2(i + max) % BIT); \
+ } \
+ Xd->XD(1) = 0; \
+ Xd->XD(3) = 0; \
+}
+
+XVSRLN(xvsrln_b_h, 16, XB, UXH)
+XVSRLN(xvsrln_h_w, 32, XH, UXW)
+XVSRLN(xvsrln_w_d, 64, XW, UXD)
+
+#define XVSRAN(NAME, BIT, E1, E2, E3) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i, max; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ Xd->E1(i) = R_SHIFT(Xj->E2(i), (Xk->E3(i)) % BIT); \
+ Xd->E1(i + max * 2) = R_SHIFT(Xj->E2(i + max), \
+ Xk->E3(i + max) % BIT); \
+ } \
+ Xd->XD(1) = 0; \
+ Xd->XD(3) = 0; \
+}
+
+XVSRAN(xvsran_b_h, 16, XB, XH, UXH)
+XVSRAN(xvsran_h_w, 32, XH, XW, UXW)
+XVSRAN(xvsran_w_d, 64, XW, XD, UXD)
+
+#define XVSRLNI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t imm) \
+{ \
+ int i, max; \
+ XReg temp; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ temp.XQ(0) = int128_zero(); \
+ temp.XQ(1) = int128_zero(); \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ temp.E1(i) = R_SHIFT(Xj->E2(i), imm); \
+ temp.E1(i + max) = R_SHIFT(Xd->E2(i), imm); \
+ temp.E1(i + max * 2) = R_SHIFT(Xj->E2(i + max), imm); \
+ temp.E1(i + max * 3) = R_SHIFT(Xd->E2(i + max), imm); \
+ } \
+ *Xd = temp; \
+}
+
+void HELPER(xvsrlni_d_q)(CPULoongArchState *env,
+ uint32_t xd, uint32_t xj, uint32_t imm)
+{
+ XReg temp;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ temp.XQ(0) = int128_zero();
+ temp.XQ(1) = int128_zero();
+ temp.XD(0) = int128_getlo(int128_urshift(Xj->XQ(0), imm % 128));
+ temp.XD(1) = int128_getlo(int128_urshift(Xd->XQ(0), imm % 128));
+ temp.XD(2) = int128_getlo(int128_urshift(Xj->XQ(1), imm % 128));
+ temp.XD(3) = int128_getlo(int128_urshift(Xd->XQ(1), imm % 128));
+ *Xd = temp;
+}
+
+XVSRLNI(xvsrlni_b_h, 16, XB, UXH)
+XVSRLNI(xvsrlni_h_w, 32, XH, UXW)
+XVSRLNI(xvsrlni_w_d, 64, XW, UXD)
+
+#define XVSRANI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t imm) \
+{ \
+ int i, max; \
+ XReg temp; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ temp.XQ(0) = int128_zero(); \
+ temp.XQ(1) = int128_zero(); \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ temp.E1(i) = R_SHIFT(Xj->E2(i), imm); \
+ temp.E1(i + max) = R_SHIFT(Xd->E2(i), imm); \
+ temp.E1(i + max * 2) = R_SHIFT(Xj->E2(i + max), imm); \
+ temp.E1(i + max * 3) = R_SHIFT(Xd->E2(i + max), imm); \
+ } \
+ *Xd = temp; \
+}
+
+void HELPER(xvsrani_d_q)(CPULoongArchState *env,
+ uint32_t xd, uint32_t xj, uint32_t imm)
+{
+ XReg temp;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ temp.XQ(0) = int128_zero();
+ temp.XQ(1) = int128_zero();
+ temp.XD(0) = int128_getlo(int128_rshift(Xj->XQ(0), imm % 128));
+ temp.XD(1) = int128_getlo(int128_rshift(Xd->XQ(0), imm % 128));
+ temp.XD(2) = int128_getlo(int128_rshift(Xj->XQ(1), imm % 128));
+ temp.XD(3) = int128_getlo(int128_rshift(Xd->XQ(1), imm % 128));
+ *Xd = temp;
+}
+
+XVSRANI(xvsrani_b_h, 16, XB, XH)
+XVSRANI(xvsrani_h_w, 32, XH, XW)
+XVSRANI(xvsrani_w_d, 64, XW, XD)
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index e64155f38c..d21e4006f2 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -922,8 +922,6 @@ VSRARI(vsrari_h, 16, H)
VSRARI(vsrari_w, 32, W)
VSRARI(vsrari_d, 64, D)
-#define R_SHIFT(a, b) (a >> b)
-
#define VSRLN(NAME, BIT, T, E1, E2) \
void HELPER(NAME)(CPULoongArchState *env, \
uint32_t vd, uint32_t vj, uint32_t vk) \
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index d5a880b3fd..b5cdb4b470 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -75,6 +75,8 @@
#define DO_SIGNCOV(a, b) (a == 0 ? 0 : a < 0 ? -b : b)
+#define R_SHIFT(a, b) (a >> b)
+
uint64_t do_vmskltz_b(int64_t val);
uint64_t do_vmskltz_h(int64_t val);
uint64_t do_vmskltz_w(int64_t val);
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 29/46] target/loongarch: Implement xvsrlrn xvsrarn
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (27 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 28/46] target/loongarch: Implement xvsrln xvsran Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 30/46] target/loongarch: Implement xvssrln xvssran Song Gao
` (16 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVSRLRN.{B.H/H.W/W.D};
- XVSRARN.{B.H/H.W/W.D};
- XVSRLRNI.{B.H/H.W/W.D/D.Q};
- XVSRARNI.{B.H/H.W/W.D/D.Q}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
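The rounding variants reuse do_xvsrlr_*/do_xvsrar_* from the previous patch: shift right by sh and add back the most significant bit that was shifted out, so the result is rounded to nearest (ties round up). A standalone sketch of that rounding step for one halfword element (srlr16 is a made-up name for illustration, not part of this patch):

#include <stdint.h>
#include <assert.h>

/* Rounding logical right shift on a 16-bit element (illustrative only). */
static uint16_t srlr16(uint16_t v, int sh)
{
    if (sh == 0) {
        return v;
    }
    /* Add the last bit shifted out, mirroring do_xvsrlr_XH(). */
    return (v >> sh) + ((v >> (sh - 1)) & 1);
}

int main(void)
{
    assert(srlr16(7, 2) == 2);    /* 7/4  = 1.75 -> 2 */
    assert(srlr16(9, 2) == 2);    /* 9/4  = 2.25 -> 2 */
    assert(srlr16(10, 2) == 3);   /* 10/4 = 2.5  -> 3 */
    return 0;
}

The narrowing and lane handling are the same as for xvsrln/xvsran; only the d_q immediate forms shift whole 128-bit elements, which is why they go through the Int128 helpers.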
target/loongarch/disas.c | 16 ++
target/loongarch/helper.h | 16 ++
target/loongarch/insn_trans/trans_lasx.c.inc | 16 ++
target/loongarch/insns.decode | 16 ++
target/loongarch/lasx_helper.c | 150 +++++++++++++++++++
5 files changed, 214 insertions(+)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 5ea713075f..515d99aa1f 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2119,6 +2119,22 @@ INSN_LASX(xvsrani_h_w, xx_i)
INSN_LASX(xvsrani_w_d, xx_i)
INSN_LASX(xvsrani_d_q, xx_i)
+INSN_LASX(xvsrlrn_b_h, xxx)
+INSN_LASX(xvsrlrn_h_w, xxx)
+INSN_LASX(xvsrlrn_w_d, xxx)
+INSN_LASX(xvsrarn_b_h, xxx)
+INSN_LASX(xvsrarn_h_w, xxx)
+INSN_LASX(xvsrarn_w_d, xxx)
+
+INSN_LASX(xvsrlrni_b_h, xx_i)
+INSN_LASX(xvsrlrni_h_w, xx_i)
+INSN_LASX(xvsrlrni_w_d, xx_i)
+INSN_LASX(xvsrlrni_d_q, xx_i)
+INSN_LASX(xvsrarni_b_h, xx_i)
+INSN_LASX(xvsrarni_h_w, xx_i)
+INSN_LASX(xvsrarni_w_d, xx_i)
+INSN_LASX(xvsrarni_d_q, xx_i)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index c41f8e2bc9..09ae21edd6 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -974,3 +974,19 @@ DEF_HELPER_4(xvsrani_b_h, void, env, i32, i32, i32)
DEF_HELPER_4(xvsrani_h_w, void, env, i32, i32, i32)
DEF_HELPER_4(xvsrani_w_d, void, env, i32, i32, i32)
DEF_HELPER_4(xvsrani_d_q, void, env, i32, i32, i32)
+
+DEF_HELPER_4(xvsrlrn_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrlrn_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrlrn_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrarn_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrarn_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrarn_w_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(xvsrlrni_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrlrni_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrlrni_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrlrni_d_q, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrarni_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrarni_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrarni_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvsrarni_d_q, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 9a3c2114eb..5cd241bafa 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -2068,6 +2068,22 @@ TRANS(xvsrani_h_w, gen_xx_i, gen_helper_xvsrani_h_w)
TRANS(xvsrani_w_d, gen_xx_i, gen_helper_xvsrani_w_d)
TRANS(xvsrani_d_q, gen_xx_i, gen_helper_xvsrani_d_q)
+TRANS(xvsrlrn_b_h, gen_xxx, gen_helper_xvsrlrn_b_h)
+TRANS(xvsrlrn_h_w, gen_xxx, gen_helper_xvsrlrn_h_w)
+TRANS(xvsrlrn_w_d, gen_xxx, gen_helper_xvsrlrn_w_d)
+TRANS(xvsrarn_b_h, gen_xxx, gen_helper_xvsrarn_b_h)
+TRANS(xvsrarn_h_w, gen_xxx, gen_helper_xvsrarn_h_w)
+TRANS(xvsrarn_w_d, gen_xxx, gen_helper_xvsrarn_w_d)
+
+TRANS(xvsrlrni_b_h, gen_xx_i, gen_helper_xvsrlrni_b_h)
+TRANS(xvsrlrni_h_w, gen_xx_i, gen_helper_xvsrlrni_h_w)
+TRANS(xvsrlrni_w_d, gen_xx_i, gen_helper_xvsrlrni_w_d)
+TRANS(xvsrlrni_d_q, gen_xx_i, gen_helper_xvsrlrni_d_q)
+TRANS(xvsrarni_b_h, gen_xx_i, gen_helper_xvsrarni_b_h)
+TRANS(xvsrarni_h_w, gen_xx_i, gen_helper_xvsrarni_h_w)
+TRANS(xvsrarni_w_d, gen_xx_i, gen_helper_xvsrarni_w_d)
+TRANS(xvsrarni_d_q, gen_xx_i, gen_helper_xvsrarni_d_q)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 45f15e3be2..0273576ada 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1717,6 +1717,22 @@ xvsrani_h_w 0111 01110101 10001 ..... ..... ..... @xx_ui5
xvsrani_w_d 0111 01110101 1001 ...... ..... ..... @xx_ui6
xvsrani_d_q 0111 01110101 101 ....... ..... ..... @xx_ui7
+xvsrlrn_b_h 0111 01001111 10001 ..... ..... ..... @xxx
+xvsrlrn_h_w 0111 01001111 10010 ..... ..... ..... @xxx
+xvsrlrn_w_d 0111 01001111 10011 ..... ..... ..... @xxx
+xvsrarn_b_h 0111 01001111 10101 ..... ..... ..... @xxx
+xvsrarn_h_w 0111 01001111 10110 ..... ..... ..... @xxx
+xvsrarn_w_d 0111 01001111 10111 ..... ..... ..... @xxx
+
+xvsrlrni_b_h 0111 01110100 01000 1 .... ..... ..... @xx_ui4
+xvsrlrni_h_w 0111 01110100 01001 ..... ..... ..... @xx_ui5
+xvsrlrni_w_d 0111 01110100 0101 ...... ..... ..... @xx_ui6
+xvsrlrni_d_q 0111 01110100 011 ....... ..... ..... @xx_ui7
+xvsrarni_b_h 0111 01110101 11000 1 .... ..... ..... @xx_ui4
+xvsrarni_h_w 0111 01110101 11001 ..... ..... ..... @xx_ui5
+xvsrarni_w_d 0111 01110101 1101 ...... ..... ..... @xx_ui6
+xvsrarni_d_q 0111 01110101 111 ....... ..... ..... @xx_ui7
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index 02550646d7..b0d5f93a97 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -1092,3 +1092,153 @@ void HELPER(xvsrani_d_q)(CPULoongArchState *env,
XVSRANI(xvsrani_b_h, 16, XB, XH)
XVSRANI(xvsrani_h_w, 32, XH, XW)
XVSRANI(xvsrani_w_d, 64, XW, XD)
+
+#define XVSRLRN(NAME, BIT, E1, E2, E3) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i, max; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ Xd->E1(i) = do_xvsrlr_ ## E2(Xj->E2(i), (Xk->E3(i)) % BIT); \
+ Xd->E1(i + max * 2) = do_xvsrlr_## E2(Xj->E2(i + max), \
+ Xk->E3(i + max) % BIT); \
+ } \
+ Xd->XD(1) = 0; \
+ Xd->XD(3) = 0; \
+}
+
+XVSRLRN(xvsrlrn_b_h, 16, XB, XH, UXH)
+XVSRLRN(xvsrlrn_h_w, 32, XH, XW, UXW)
+XVSRLRN(xvsrlrn_w_d, 64, XW, XD, UXD)
+
+#define XVSRARN(NAME, BIT, E1, E2, E3) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i, max; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ Xd->E1(i) = do_xvsrar_ ## E2(Xj->E2(i), (Xk->E3(i)) % BIT); \
+ Xd->E1(i + max * 2) = do_xvsrar_## E2(Xj->E2(i + max), \
+ Xk->E3(i + max) % BIT); \
+ } \
+ Xd->XD(1) = 0; \
+ Xd->XD(3) = 0; \
+}
+
+XVSRARN(xvsrarn_b_h, 16, XB, XH, UXH)
+XVSRARN(xvsrarn_h_w, 32, XH, XW, UXW)
+XVSRARN(xvsrarn_w_d, 64, XW, XD, UXD)
+
+#define XVSRLRNI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t imm) \
+{ \
+ int i, max; \
+ XReg temp; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ temp.XQ(0) = int128_zero(); \
+ temp.XQ(1) = int128_zero(); \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ temp.E1(i) = do_xvsrlr_ ## E2(Xj->E2(i), imm); \
+ temp.E1(i + max) = do_xvsrlr_ ## E2(Xd->E2(i), imm); \
+ temp.E1(i + max * 2) = do_xvsrlr_## E2(Xj->E2(i + max), imm); \
+ temp.E1(i + max * 3) = do_xvsrlr_## E2(Xd->E2(i + max), imm); \
+ } \
+ *Xd = temp; \
+}
+
+void HELPER(xvsrlrni_d_q)(CPULoongArchState *env,
+ uint32_t xd, uint32_t xj, uint32_t imm)
+{
+ XReg temp;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+ Int128 r1, r2, r3, r4;
+
+ if (imm == 0) {
+ temp.XD(0) = int128_getlo(Xj->XQ(0));
+ temp.XD(1) = int128_getlo(Xd->XQ(0));
+ temp.XD(2) = int128_getlo(Xj->XQ(1));
+ temp.XD(3) = int128_getlo(Xd->XQ(1));
+ } else {
+ r1 = int128_and(int128_urshift(Xj->XQ(0), (imm - 1)), int128_one());
+ r2 = int128_and(int128_urshift(Xd->XQ(0), (imm - 1)), int128_one());
+ r3 = int128_and(int128_urshift(Xj->XQ(1), (imm - 1)), int128_one());
+ r4 = int128_and(int128_urshift(Xd->XQ(1), (imm - 1)), int128_one());
+
+ temp.XD(0) = int128_getlo(int128_add(int128_urshift(Xj->XQ(0), imm), r1));
+ temp.XD(1) = int128_getlo(int128_add(int128_urshift(Xd->XQ(0), imm), r2));
+ temp.XD(2) = int128_getlo(int128_add(int128_urshift(Xj->XQ(1), imm), r3));
+ temp.XD(3) = int128_getlo(int128_add(int128_urshift(Xd->XQ(1), imm), r4));
+ }
+ *Xd = temp;
+}
+
+XVSRLRNI(xvsrlrni_b_h, 16, XB, XH)
+XVSRLRNI(xvsrlrni_h_w, 32, XH, XW)
+XVSRLRNI(xvsrlrni_w_d, 64, XW, XD)
+
+#define XVSRARNI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t imm) \
+{ \
+ int i, max; \
+ XReg temp; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ temp.XQ(0) = int128_zero(); \
+ temp.XQ(1) = int128_zero(); \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ temp.E1(i) = do_xvsrar_ ## E2(Xj->E2(i), imm); \
+ temp.E1(i + max) = do_xvsrar_ ## E2(Xd->E2(i), imm); \
+ temp.E1(i + max * 2) = do_xvsrar_## E2(Xj->E2(i + max), imm); \
+ temp.E1(i + max * 3) = do_xvsrar_## E2(Xd->E2(i + max), imm); \
+ } \
+ *Xd = temp; \
+}
+
+void HELPER(xvsrarni_d_q)(CPULoongArchState *env,
+ uint32_t xd, uint32_t xj, uint32_t imm)
+{
+ XReg temp;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+ Int128 r1, r2, r3, r4;
+
+ if (imm == 0) {
+ temp.XD(0) = int128_getlo(Xj->XQ(0));
+ temp.XD(1) = int128_getlo(Xd->XQ(0));
+ temp.XD(2) = int128_getlo(Xj->XQ(1));
+ temp.XD(3) = int128_getlo(Xd->XQ(1));
+ } else {
+ r1 = int128_and(int128_rshift(Xj->XQ(0), (imm - 1)), int128_one());
+ r2 = int128_and(int128_rshift(Xd->XQ(0), (imm - 1)), int128_one());
+ r3 = int128_and(int128_rshift(Xj->XQ(1), (imm - 1)), int128_one());
+ r4 = int128_and(int128_rshift(Xd->XQ(1), (imm - 1)), int128_one());
+
+ temp.XD(0) = int128_getlo(int128_add(int128_rshift(Xj->XQ(0), imm), r1));
+ temp.XD(1) = int128_getlo(int128_add(int128_rshift(Xd->XQ(0), imm), r2));
+ temp.XD(2) = int128_getlo(int128_add(int128_rshift(Xj->XQ(1), imm), r3));
+ temp.XD(3) = int128_getlo(int128_add(int128_rshift(Xd->XQ(1), imm), r4));
+ }
+ *Xd = temp;
+}
+
+XVSRARNI(xvsrarni_b_h, 16, XB, XH)
+XVSRARNI(xvsrarni_h_w, 32, XH, XW)
+XVSRARNI(xvsrarni_w_d, 64, XW, XD)
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 30/46] target/loongarch: Implement xvssrln xvssran
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (28 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 29/46] target/loongarch: Implement xvsrlrn xvsrarn Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 31/46] target/loongarch: Implement xvssrlrn xvssrarn Song Gao
` (15 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVSSRLN.{B.H/H.W/W.D};
- XVSSRAN.{B.H/H.W/W.D};
- XVSSRLN.{BU.H/HU.W/WU.D};
- XVSSRAN.{BU.H/HU.W/WU.D};
- XVSSRLNI.{B.H/H.W/W.D/D.Q};
- XVSSRANI.{B.H/H.W/W.D/D.Q};
- XVSSRLNI.{BU.H/HU.W/WU.D/DU.Q};
- XVSSRANI.{BU.H/HU.W/WU.D/DU.Q}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
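A rough sketch of the signed saturation in the new do_xssrans_* helpers (ssran_b_h below is an illustrative name, not code from this patch): shift arithmetically, then clamp to the range of the half-width destination, [-128, 127] for the b_h variant. The unsigned do_xssrlnu_*/do_xssranu_* helpers clamp to [0, 2^(BIT/2) - 1] instead.

#include <stdint.h>
#include <assert.h>

/* Arithmetic right shift + signed saturation to a byte (sketch). */
static int16_t ssran_b_h(int16_t e, int sa)
{
    int16_t v = sa ? (int16_t)(e >> sa) : e;

    if (v > INT8_MAX) {
        return INT8_MAX;        /* clamp high, as the helper's "return mask"  */
    } else if (v < INT8_MIN) {
        return INT8_MIN;        /* clamp low, as the helper's "return ~mask"  */
    }
    return v;
}

int main(void)
{
    assert(ssran_b_h(0x4000, 4) == INT8_MAX);    /* 1024 saturates to 127   */
    assert(ssran_b_h(-0x4000, 4) == INT8_MIN);   /* -1024 saturates to -128 */
    assert(ssran_b_h(0x0120, 4) == 0x12);        /* in range: just narrowed */
    return 0;
}

The d_q/du_q immediate forms apply the same clamping to 128-bit elements with Int128 compares (int128_gt/int128_ult) before keeping the low 64 bits.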
target/loongarch/disas.c | 30 ++
target/loongarch/helper.h | 30 ++
target/loongarch/insn_trans/trans_lasx.c.inc | 30 ++
target/loongarch/insns.decode | 30 ++
target/loongarch/lasx_helper.c | 428 +++++++++++++++++++
5 files changed, 548 insertions(+)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 515d99aa1f..1f40f3aaca 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2135,6 +2135,36 @@ INSN_LASX(xvsrarni_h_w, xx_i)
INSN_LASX(xvsrarni_w_d, xx_i)
INSN_LASX(xvsrarni_d_q, xx_i)
+INSN_LASX(xvssrln_b_h, xxx)
+INSN_LASX(xvssrln_h_w, xxx)
+INSN_LASX(xvssrln_w_d, xxx)
+INSN_LASX(xvssran_b_h, xxx)
+INSN_LASX(xvssran_h_w, xxx)
+INSN_LASX(xvssran_w_d, xxx)
+INSN_LASX(xvssrln_bu_h, xxx)
+INSN_LASX(xvssrln_hu_w, xxx)
+INSN_LASX(xvssrln_wu_d, xxx)
+INSN_LASX(xvssran_bu_h, xxx)
+INSN_LASX(xvssran_hu_w, xxx)
+INSN_LASX(xvssran_wu_d, xxx)
+
+INSN_LASX(xvssrlni_b_h, xx_i)
+INSN_LASX(xvssrlni_h_w, xx_i)
+INSN_LASX(xvssrlni_w_d, xx_i)
+INSN_LASX(xvssrlni_d_q, xx_i)
+INSN_LASX(xvssrani_b_h, xx_i)
+INSN_LASX(xvssrani_h_w, xx_i)
+INSN_LASX(xvssrani_w_d, xx_i)
+INSN_LASX(xvssrani_d_q, xx_i)
+INSN_LASX(xvssrlni_bu_h, xx_i)
+INSN_LASX(xvssrlni_hu_w, xx_i)
+INSN_LASX(xvssrlni_wu_d, xx_i)
+INSN_LASX(xvssrlni_du_q, xx_i)
+INSN_LASX(xvssrani_bu_h, xx_i)
+INSN_LASX(xvssrani_hu_w, xx_i)
+INSN_LASX(xvssrani_wu_d, xx_i)
+INSN_LASX(xvssrani_du_q, xx_i)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 09ae21edd6..2d76916049 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -990,3 +990,33 @@ DEF_HELPER_4(xvsrarni_b_h, void, env, i32, i32, i32)
DEF_HELPER_4(xvsrarni_h_w, void, env, i32, i32, i32)
DEF_HELPER_4(xvsrarni_w_d, void, env, i32, i32, i32)
DEF_HELPER_4(xvsrarni_d_q, void, env, i32, i32, i32)
+
+DEF_HELPER_4(xvssrln_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrln_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrln_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssran_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssran_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssran_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrln_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrln_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrln_wu_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssran_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssran_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssran_wu_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(xvssrlni_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrlni_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrlni_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrlni_d_q, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrani_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrani_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrani_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrani_d_q, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrlni_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrlni_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrlni_wu_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrlni_du_q, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrani_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrani_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrani_wu_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrani_du_q, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 5cd241bafa..b6c2ced30c 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -2084,6 +2084,36 @@ TRANS(xvsrarni_h_w, gen_xx_i, gen_helper_xvsrarni_h_w)
TRANS(xvsrarni_w_d, gen_xx_i, gen_helper_xvsrarni_w_d)
TRANS(xvsrarni_d_q, gen_xx_i, gen_helper_xvsrarni_d_q)
+TRANS(xvssrln_b_h, gen_xxx, gen_helper_xvssrln_b_h)
+TRANS(xvssrln_h_w, gen_xxx, gen_helper_xvssrln_h_w)
+TRANS(xvssrln_w_d, gen_xxx, gen_helper_xvssrln_w_d)
+TRANS(xvssran_b_h, gen_xxx, gen_helper_xvssran_b_h)
+TRANS(xvssran_h_w, gen_xxx, gen_helper_xvssran_h_w)
+TRANS(xvssran_w_d, gen_xxx, gen_helper_xvssran_w_d)
+TRANS(xvssrln_bu_h, gen_xxx, gen_helper_xvssrln_bu_h)
+TRANS(xvssrln_hu_w, gen_xxx, gen_helper_xvssrln_hu_w)
+TRANS(xvssrln_wu_d, gen_xxx, gen_helper_xvssrln_wu_d)
+TRANS(xvssran_bu_h, gen_xxx, gen_helper_xvssran_bu_h)
+TRANS(xvssran_hu_w, gen_xxx, gen_helper_xvssran_hu_w)
+TRANS(xvssran_wu_d, gen_xxx, gen_helper_xvssran_wu_d)
+
+TRANS(xvssrlni_b_h, gen_xx_i, gen_helper_xvssrlni_b_h)
+TRANS(xvssrlni_h_w, gen_xx_i, gen_helper_xvssrlni_h_w)
+TRANS(xvssrlni_w_d, gen_xx_i, gen_helper_xvssrlni_w_d)
+TRANS(xvssrlni_d_q, gen_xx_i, gen_helper_xvssrlni_d_q)
+TRANS(xvssrani_b_h, gen_xx_i, gen_helper_xvssrani_b_h)
+TRANS(xvssrani_h_w, gen_xx_i, gen_helper_xvssrani_h_w)
+TRANS(xvssrani_w_d, gen_xx_i, gen_helper_xvssrani_w_d)
+TRANS(xvssrani_d_q, gen_xx_i, gen_helper_xvssrani_d_q)
+TRANS(xvssrlni_bu_h, gen_xx_i, gen_helper_xvssrlni_bu_h)
+TRANS(xvssrlni_hu_w, gen_xx_i, gen_helper_xvssrlni_hu_w)
+TRANS(xvssrlni_wu_d, gen_xx_i, gen_helper_xvssrlni_wu_d)
+TRANS(xvssrlni_du_q, gen_xx_i, gen_helper_xvssrlni_du_q)
+TRANS(xvssrani_bu_h, gen_xx_i, gen_helper_xvssrani_bu_h)
+TRANS(xvssrani_hu_w, gen_xx_i, gen_helper_xvssrani_hu_w)
+TRANS(xvssrani_wu_d, gen_xx_i, gen_helper_xvssrani_wu_d)
+TRANS(xvssrani_du_q, gen_xx_i, gen_helper_xvssrani_du_q)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 0273576ada..cf3803c230 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1733,6 +1733,36 @@ xvsrarni_h_w 0111 01110101 11001 ..... ..... ..... @xx_ui5
xvsrarni_w_d 0111 01110101 1101 ...... ..... ..... @xx_ui6
xvsrarni_d_q 0111 01110101 111 ....... ..... ..... @xx_ui7
+xvssrln_b_h 0111 01001111 11001 ..... ..... ..... @xxx
+xvssrln_h_w 0111 01001111 11010 ..... ..... ..... @xxx
+xvssrln_w_d 0111 01001111 11011 ..... ..... ..... @xxx
+xvssran_b_h 0111 01001111 11101 ..... ..... ..... @xxx
+xvssran_h_w 0111 01001111 11110 ..... ..... ..... @xxx
+xvssran_w_d 0111 01001111 11111 ..... ..... ..... @xxx
+xvssrln_bu_h 0111 01010000 01001 ..... ..... ..... @xxx
+xvssrln_hu_w 0111 01010000 01010 ..... ..... ..... @xxx
+xvssrln_wu_d 0111 01010000 01011 ..... ..... ..... @xxx
+xvssran_bu_h 0111 01010000 01101 ..... ..... ..... @xxx
+xvssran_hu_w 0111 01010000 01110 ..... ..... ..... @xxx
+xvssran_wu_d 0111 01010000 01111 ..... ..... ..... @xxx
+
+xvssrlni_b_h 0111 01110100 10000 1 .... ..... ..... @xx_ui4
+xvssrlni_h_w 0111 01110100 10001 ..... ..... ..... @xx_ui5
+xvssrlni_w_d 0111 01110100 1001 ...... ..... ..... @xx_ui6
+xvssrlni_d_q 0111 01110100 101 ....... ..... ..... @xx_ui7
+xvssrani_b_h 0111 01110110 00000 1 .... ..... ..... @xx_ui4
+xvssrani_h_w 0111 01110110 00001 ..... ..... ..... @xx_ui5
+xvssrani_w_d 0111 01110110 0001 ...... ..... ..... @xx_ui6
+xvssrani_d_q 0111 01110110 001 ....... ..... ..... @xx_ui7
+xvssrlni_bu_h 0111 01110100 11000 1 .... ..... ..... @xx_ui4
+xvssrlni_hu_w 0111 01110100 11001 ..... ..... ..... @xx_ui5
+xvssrlni_wu_d 0111 01110100 1101 ...... ..... ..... @xx_ui6
+xvssrlni_du_q 0111 01110100 111 ....... ..... ..... @xx_ui7
+xvssrani_bu_h 0111 01110110 01000 1 .... ..... ..... @xx_ui4
+xvssrani_hu_w 0111 01110110 01001 ..... ..... ..... @xx_ui5
+xvssrani_wu_d 0111 01110110 0101 ...... ..... ..... @xx_ui6
+xvssrani_du_q 0111 01110110 011 ....... ..... ..... @xx_ui7
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index b0d5f93a97..b42f412c02 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -1242,3 +1242,431 @@ void HELPER(xvsrarni_d_q)(CPULoongArchState *env,
XVSRARNI(xvsrarni_b_h, 16, XB, XH)
XVSRARNI(xvsrarni_h_w, 32, XH, XW)
XVSRARNI(xvsrarni_w_d, 64, XW, XD)
+
+#define XSSRLNS(NAME, T1, T2, T3) \
+static T1 do_xssrlns_ ## NAME(T2 e2, int sa, int sh) \
+{ \
+ T1 shft_res; \
+ if (sa == 0) { \
+ shft_res = e2; \
+ } else { \
+ shft_res = (((T1)e2) >> sa); \
+ } \
+ T3 mask; \
+ mask = (1ull << sh) - 1; \
+ if (shft_res > mask) { \
+ return mask; \
+ } else { \
+ return shft_res; \
+ } \
+}
+
+XSSRLNS(XB, uint16_t, int16_t, uint8_t)
+XSSRLNS(XH, uint32_t, int32_t, uint16_t)
+XSSRLNS(XW, uint64_t, int64_t, uint32_t)
+
+#define XVSSRLN(NAME, BIT, E1, E2, E3) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i, max; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ Xd->E1(i) = do_xssrlns_ ## E1(Xj->E2(i), \
+ Xk->E3(i) % BIT, (BIT / 2) - 1); \
+ Xd->E1(i + max * 2) = do_xssrlns_## E1(Xj->E2(i + max), \
+ Xk->E3(i + max) % BIT, \
+ (BIT / 2) - 1); \
+ } \
+ Xd->XD(1) = 0; \
+ Xd->XD(3) = 0; \
+}
+
+XVSSRLN(xvssrln_b_h, 16, XB, XH, UXH)
+XVSSRLN(xvssrln_h_w, 32, XH, XW, UXW)
+XVSSRLN(xvssrln_w_d, 64, XW, XD, UXD)
+
+#define XSSRANS(E, T1, T2) \
+static T1 do_xssrans_ ## E(T1 e2, int sa, int sh) \
+{ \
+ T1 shft_res; \
+ if (sa == 0) { \
+ shft_res = e2; \
+ } else { \
+ shft_res = e2 >> sa; \
+ } \
+ T2 mask; \
+ mask = (1ll << sh) - 1; \
+ if (shft_res > mask) { \
+ return mask; \
+ } else if (shft_res < -(mask + 1)) { \
+ return ~mask; \
+ } else { \
+ return shft_res; \
+ } \
+}
+
+XSSRANS(XB, int16_t, int8_t)
+XSSRANS(XH, int32_t, int16_t)
+XSSRANS(XW, int64_t, int32_t)
+
+#define XVSSRAN(NAME, BIT, E1, E2, E3) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i, max; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ Xd->E1(i) = do_xssrans_ ## E1(Xj->E2(i), \
+ Xk->E3(i) % BIT, (BIT / 2) - 1); \
+ Xd->E1(i + max * 2) = do_xssrans_## E1(Xj->E2(i + max), \
+ Xk->E3(i + max) % BIT, \
+ (BIT / 2) - 1); \
+ } \
+ Xd->XD(1) = 0; \
+ Xd->XD(3) = 0; \
+}
+
+XVSSRAN(xvssran_b_h, 16, XB, XH, UXH)
+XVSSRAN(xvssran_h_w, 32, XH, XW, UXW)
+XVSSRAN(xvssran_w_d, 64, XW, XD, UXD)
+
+#define XSSRLNU(E, T1, T2, T3) \
+static T1 do_xssrlnu_ ## E(T3 e2, int sa, int sh) \
+{ \
+ T1 shft_res; \
+ if (sa == 0) { \
+ shft_res = e2; \
+ } else { \
+ shft_res = (((T1)e2) >> sa); \
+ } \
+ T2 mask; \
+ mask = (1ull << sh) - 1; \
+ if (shft_res > mask) { \
+ return mask; \
+ } else { \
+ return shft_res; \
+ } \
+}
+
+XSSRLNU(XB, uint16_t, uint8_t, int16_t)
+XSSRLNU(XH, uint32_t, uint16_t, int32_t)
+XSSRLNU(XW, uint64_t, uint32_t, int64_t)
+
+#define XVSSRLNU(NAME, BIT, E1, E2, E3) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i, max; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ Xd->E1(i) = do_xssrlnu_ ## E1(Xj->E2(i), Xk->E3(i) % BIT, BIT / 2); \
+ Xd->E1(i + max * 2) = do_xssrlnu_## E1(Xj->E2(i + max), \
+ Xk->E3(i + max) % BIT, \
+ BIT / 2); \
+ } \
+ Xd->XD(1) = 0; \
+ Xd->XD(3) = 0; \
+}
+
+XVSSRLNU(xvssrln_bu_h, 16, XB, XH, UXH)
+XVSSRLNU(xvssrln_hu_w, 32, XH, XW, UXW)
+XVSSRLNU(xvssrln_wu_d, 64, XW, XD, UXD)
+
+#define XSSRANU(E, T1, T2, T3) \
+static T1 do_xssranu_ ## E(T3 e2, int sa, int sh) \
+{ \
+ T1 shft_res; \
+ if (sa == 0) { \
+ shft_res = e2; \
+ } else { \
+ shft_res = e2 >> sa; \
+ } \
+ if (e2 < 0) { \
+ shft_res = 0; \
+ } \
+ T2 mask; \
+ mask = (1ull << sh) - 1; \
+ if (shft_res > mask) { \
+ return mask; \
+ } else { \
+ return shft_res; \
+ } \
+}
+
+XSSRANU(XB, uint16_t, uint8_t, int16_t)
+XSSRANU(XH, uint32_t, uint16_t, int32_t)
+XSSRANU(XW, uint64_t, uint32_t, int64_t)
+
+#define XVSSRANU(NAME, BIT, E1, E2, E3) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i, max; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ Xd->E1(i) = do_xssranu_ ## E1(Xj->E2(i), Xk->E3(i) % BIT, BIT / 2); \
+ Xd->E1(i + max * 2) = do_xssranu_## E1(Xj->E2(i + max), \
+ Xk->E3(i + max) % BIT, \
+ BIT / 2); \
+ } \
+ Xd->XD(1) = 0; \
+ Xd->XD(3) = 0; \
+}
+
+XVSSRANU(xvssran_bu_h, 16, XB, XH, UXH)
+XVSSRANU(xvssran_hu_w, 32, XH, XW, UXW)
+XVSSRANU(xvssran_wu_d, 64, XW, XD, UXD)
+
+#define XVSSRLNI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t imm) \
+{ \
+ int i, max; \
+ XReg temp; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ temp.E1(i) = do_xssrlns_ ## E1(Xj->E2(i), imm, (BIT / 2) - 1); \
+ temp.E1(i + max) = do_xssrlns_ ## E1(Xd->E2(i), imm, (BIT / 2) - 1); \
+ temp.E1(i + max * 2) = do_xssrlns_## E1(Xj->E2(i + max), \
+ imm, (BIT / 2) - 1); \
+ temp.E1(i + max * 3) = do_xssrlns_## E1(Xd->E2(i + max), \
+ imm, (BIT / 2) - 1); \
+ } \
+ *Xd = temp; \
+}
+
+void HELPER(xvssrlni_d_q)(CPULoongArchState *env,
+ uint32_t xd, uint32_t xj, uint32_t imm)
+{
+ int i;
+ Int128 shft_res[4], mask;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ if (imm == 0) {
+ shft_res[0] = Xj->XQ(0);
+ shft_res[1] = Xd->XQ(0);
+ shft_res[2] = Xj->XQ(1);
+ shft_res[3] = Xd->XQ(1);
+ } else {
+ shft_res[0] = int128_urshift(Xj->XQ(0), imm);
+ shft_res[1] = int128_urshift(Xd->XQ(0), imm);
+ shft_res[2] = int128_urshift(Xj->XQ(1), imm);
+ shft_res[3] = int128_urshift(Xd->XQ(1), imm);
+ }
+ mask = int128_sub(int128_lshift(int128_one(), 63), int128_one());
+
+ for (i = 0; i < 4; i++) {
+ if (int128_ult(mask, shft_res[i])) {
+ Xd->XD(i) = int128_getlo(mask);
+ } else {
+ Xd->XD(i) = int128_getlo(shft_res[i]);
+ }
+ }
+}
+
+XVSSRLNI(xvssrlni_b_h, 16, XB, XH)
+XVSSRLNI(xvssrlni_h_w, 32, XH, XW)
+XVSSRLNI(xvssrlni_w_d, 64, XW, XD)
+
+#define XVSSRANI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t imm) \
+{ \
+ int i, max; \
+ XReg temp; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ temp.E1(i) = do_xssrans_ ## E1(Xj->E2(i), imm, (BIT / 2) - 1); \
+ temp.E1(i + max) = do_xssrans_ ## E1(Xd->E2(i), imm, (BIT / 2) - 1); \
+ temp.E1(i + max * 2) = do_xssrans_## E1(Xj->E2(i + max), \
+ imm, (BIT / 2) - 1); \
+ temp.E1(i + max * 3) = do_xssrans_## E1(Xd->E2(i + max), \
+ imm, (BIT / 2) - 1); \
+ } \
+ *Xd = temp; \
+}
+
+void HELPER(xvssrani_d_q)(CPULoongArchState *env,
+ uint32_t xd, uint32_t xj, uint32_t imm)
+{
+ int i;
+ Int128 shft_res[4], mask, min;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ if (imm == 0) {
+ shft_res[0] = Xj->XQ(0);
+ shft_res[1] = Xd->XQ(0);
+ shft_res[2] = Xj->XQ(1);
+ shft_res[3] = Xd->XQ(1);
+ } else {
+ shft_res[0] = int128_rshift(Xj->XQ(0), imm);
+ shft_res[1] = int128_rshift(Xd->XQ(0), imm);
+ shft_res[2] = int128_rshift(Xj->XQ(1), imm);
+ shft_res[3] = int128_rshift(Xd->XQ(1), imm);
+ }
+ mask = int128_sub(int128_lshift(int128_one(), 63), int128_one());
+ min = int128_lshift(int128_one(), 63);
+
+ for (i = 0; i < 4; i++) {
+ if (int128_gt(shft_res[i], mask)) {
+ Xd->XD(i) = int128_getlo(mask);
+ } else if (int128_lt(shft_res[i], int128_neg(min))) {
+ Xd->XD(i) = int128_getlo(min);
+ } else {
+ Xd->XD(i) = int128_getlo(shft_res[i]);
+ }
+ }
+}
+
+XVSSRANI(xvssrani_b_h, 16, XB, XH)
+XVSSRANI(xvssrani_h_w, 32, XH, XW)
+XVSSRANI(xvssrani_w_d, 64, XW, XD)
+
+#define XVSSRLNUI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t imm) \
+{ \
+ int i, max; \
+ XReg temp; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ temp.E1(i) = do_xssrlnu_ ## E1(Xj->E2(i), imm, BIT / 2); \
+ temp.E1(i + max) = do_xssrlnu_ ## E1(Xd->E2(i), imm, BIT / 2); \
+ temp.E1(i + max * 2) = do_xssrlnu_## E1(Xj->E2(i + max), \
+ imm, BIT / 2); \
+ temp.E1(i + max * 3) = do_xssrlnu_## E1(Xd->E2(i + max), \
+ imm, BIT / 2); \
+ } \
+ *Xd = temp; \
+}
+
+void HELPER(xvssrlni_du_q)(CPULoongArchState *env,
+ uint32_t xd, uint32_t xj, uint32_t imm)
+{
+ int i;
+ Int128 shft_res[4], mask;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ if (imm == 0) {
+ shft_res[0] = Xj->XQ(0);
+ shft_res[1] = Xd->XQ(0);
+ shft_res[2] = Xj->XQ(1);
+ shft_res[3] = Xd->XQ(1);
+ } else {
+ shft_res[0] = int128_urshift(Xj->XQ(0), imm);
+ shft_res[1] = int128_urshift(Xd->XQ(0), imm);
+ shft_res[2] = int128_urshift(Xj->XQ(1), imm);
+ shft_res[3] = int128_urshift(Xd->XQ(1), imm);
+ }
+ mask = int128_sub(int128_lshift(int128_one(), 64), int128_one());
+
+ for (i = 0; i < 4; i++) {
+ if (int128_ult(mask, shft_res[i])) {
+ Xd->XD(i) = int128_getlo(mask);
+ } else {
+ Xd->XD(i) = int128_getlo(shft_res[i]);
+ }
+ }
+}
+
+XVSSRLNUI(xvssrlni_bu_h, 16, XB, XH)
+XVSSRLNUI(xvssrlni_hu_w, 32, XH, XW)
+XVSSRLNUI(xvssrlni_wu_d, 64, XW, XD)
+
+#define XVSSRANUI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t imm) \
+{ \
+ int i, max; \
+ XReg temp; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ temp.E1(i) = do_xssranu_ ## E1(Xj->E2(i), imm, BIT / 2); \
+ temp.E1(i + max) = do_xssranu_ ## E1(Xd->E2(i), imm, BIT / 2); \
+ temp.E1(i + max * 2) = do_xssranu_## E1(Xj->E2(i + max), \
+ imm, BIT / 2); \
+ temp.E1(i + max * 3) = do_xssranu_## E1(Xd->E2(i + max), \
+ imm, BIT / 2); \
+ } \
+ *Xd = temp; \
+}
+
+void HELPER(xvssrani_du_q)(CPULoongArchState *env,
+ uint32_t xd, uint32_t xj, uint32_t imm)
+{
+ int i;
+ Int128 shft_res[4], mask;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ if (imm == 0) {
+ shft_res[0] = Xj->XQ(0);
+ shft_res[1] = Xd->XQ(0);
+ shft_res[2] = Xj->XQ(1);
+ shft_res[3] = Xd->XQ(1);
+ } else {
+ shft_res[0] = int128_rshift(Xj->XQ(0), imm);
+ shft_res[1] = int128_rshift(Xd->XQ(0), imm);
+ shft_res[2] = int128_rshift(Xj->XQ(1), imm);
+ shft_res[3] = int128_rshift(Xd->XQ(1), imm);
+ }
+
+ if (int128_lt(Xj->XQ(0), int128_zero())) {
+ shft_res[0] = int128_zero();
+ }
+ if (int128_lt(Xd->XQ(0), int128_zero())) {
+ shft_res[1] = int128_zero();
+ }
+ if (int128_lt(Xj->XQ(1), int128_zero())) {
+ shft_res[2] = int128_zero();
+ }
+ if (int128_lt(Xd->XQ(1), int128_zero())) {
+ shft_res[3] = int128_zero();
+ }
+
+ mask = int128_sub(int128_lshift(int128_one(), 64), int128_one());
+
+ for (i = 0; i < 4; i++) {
+ if (int128_ult(mask, shft_res[i])) {
+ Xd->XD(i) = int128_getlo(mask);
+ } else {
+ Xd->XD(i) = int128_getlo(shft_res[i]);
+ }
+ }
+}
+
+XVSSRANUI(xvssrani_bu_h, 16, XB, XH)
+XVSSRANUI(xvssrani_hu_w, 32, XH, XW)
+XVSSRANUI(xvssrani_wu_d, 64, XW, XD)
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 31/46] target/loongarch: Implement xvssrlrn xvssrarn
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (29 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 30/46] target/loongarch: Implement xvssrln xvssran Song Gao
@ 2023-06-20 9:37 ` Song Gao
2023-06-20 9:38 ` [PATCH v1 32/46] target/loongarch: Implement xvclo xvclz Song Gao
` (14 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:37 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVSSRLRN.{B.H/H.W/W.D};
- XVSSRARN.{B.H/H.W/W.D};
- XVSSRLRN.{BU.H/HU.W/WU.D};
- XVSSRARN.{BU.H/HU.W/WU.D};
- XVSSRLRNI.{B.H/H.W/W.D/D.Q};
- XVSSRARNI.{B.H/H.W/W.D/D.Q};
- XVSSRLRNI.{BU.H/HU.W/WU.D/DU.Q};
- XVSSRARNI.{BU.H/HU.W/WU.D/DU.Q}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
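These combine the two earlier steps: a rounding right shift (the do_xvsrlr_*/do_xvsrar_* helpers) followed by saturation to the half-width range (as in the previous patch). A standalone sketch for the bu_h flavour, with the source treated as an unsigned halfword and the result clamped to [0, 255] (ssrlrn_bu_h is an illustrative name only):

#include <stdint.h>
#include <assert.h>

/* Rounding logical right shift + unsigned saturation to a byte (sketch). */
static uint8_t ssrlrn_bu_h(uint16_t e, int sa)
{
    uint16_t v = e;

    if (sa != 0) {
        /* Round: add back the last bit shifted out. */
        v = (e >> sa) + ((e >> (sa - 1)) & 1);
    }
    return v > UINT8_MAX ? UINT8_MAX : (uint8_t)v;  /* clamp to [0, 255] */
}

int main(void)
{
    assert(ssrlrn_bu_h(0x1000, 4) == UINT8_MAX);  /* 256 saturates to 255   */
    assert(ssrlrn_bu_h(10, 2) == 3);              /* 10/4 rounds to 3       */
    assert(ssrlrn_bu_h(0xff, 0) == 0xff);         /* sa == 0 passes through */
    return 0;
}

For the d_q/du_q immediate forms the carried-out bit is computed with Int128 operations (the r[] values) before the same clamp is applied.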
target/loongarch/disas.c | 30 ++
target/loongarch/helper.h | 30 ++
target/loongarch/insn_trans/trans_lasx.c.inc | 30 ++
target/loongarch/insns.decode | 30 ++
target/loongarch/lasx_helper.c | 411 +++++++++++++++++++
5 files changed, 531 insertions(+)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 1f40f3aaca..da07b56dee 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2165,6 +2165,36 @@ INSN_LASX(xvssrani_hu_w, xx_i)
INSN_LASX(xvssrani_wu_d, xx_i)
INSN_LASX(xvssrani_du_q, xx_i)
+INSN_LASX(xvssrlrn_b_h, xxx)
+INSN_LASX(xvssrlrn_h_w, xxx)
+INSN_LASX(xvssrlrn_w_d, xxx)
+INSN_LASX(xvssrarn_b_h, xxx)
+INSN_LASX(xvssrarn_h_w, xxx)
+INSN_LASX(xvssrarn_w_d, xxx)
+INSN_LASX(xvssrlrn_bu_h, xxx)
+INSN_LASX(xvssrlrn_hu_w, xxx)
+INSN_LASX(xvssrlrn_wu_d, xxx)
+INSN_LASX(xvssrarn_bu_h, xxx)
+INSN_LASX(xvssrarn_hu_w, xxx)
+INSN_LASX(xvssrarn_wu_d, xxx)
+
+INSN_LASX(xvssrlrni_b_h, xx_i)
+INSN_LASX(xvssrlrni_h_w, xx_i)
+INSN_LASX(xvssrlrni_w_d, xx_i)
+INSN_LASX(xvssrlrni_d_q, xx_i)
+INSN_LASX(xvssrlrni_bu_h, xx_i)
+INSN_LASX(xvssrlrni_hu_w, xx_i)
+INSN_LASX(xvssrlrni_wu_d, xx_i)
+INSN_LASX(xvssrlrni_du_q, xx_i)
+INSN_LASX(xvssrarni_b_h, xx_i)
+INSN_LASX(xvssrarni_h_w, xx_i)
+INSN_LASX(xvssrarni_w_d, xx_i)
+INSN_LASX(xvssrarni_d_q, xx_i)
+INSN_LASX(xvssrarni_bu_h, xx_i)
+INSN_LASX(xvssrarni_hu_w, xx_i)
+INSN_LASX(xvssrarni_wu_d, xx_i)
+INSN_LASX(xvssrarni_du_q, xx_i)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 2d76916049..b5d1cff1f0 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -1020,3 +1020,33 @@ DEF_HELPER_4(xvssrani_bu_h, void, env, i32, i32, i32)
DEF_HELPER_4(xvssrani_hu_w, void, env, i32, i32, i32)
DEF_HELPER_4(xvssrani_wu_d, void, env, i32, i32, i32)
DEF_HELPER_4(xvssrani_du_q, void, env, i32, i32, i32)
+
+DEF_HELPER_4(xvssrlrn_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrlrn_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrlrn_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrarn_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrarn_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrarn_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrlrn_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrlrn_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrlrn_wu_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrarn_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrarn_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrarn_wu_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(xvssrlrni_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrlrni_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrlrni_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrlrni_d_q, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrarni_b_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrarni_h_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrarni_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrarni_d_q, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrlrni_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrlrni_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrlrni_wu_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrlrni_du_q, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrarni_bu_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrarni_hu_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrarni_wu_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvssrarni_du_q, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index b6c2ced30c..aa145c850b 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -2114,6 +2114,36 @@ TRANS(xvssrani_hu_w, gen_xx_i, gen_helper_xvssrani_hu_w)
TRANS(xvssrani_wu_d, gen_xx_i, gen_helper_xvssrani_wu_d)
TRANS(xvssrani_du_q, gen_xx_i, gen_helper_xvssrani_du_q)
+TRANS(xvssrlrn_b_h, gen_xxx, gen_helper_xvssrlrn_b_h)
+TRANS(xvssrlrn_h_w, gen_xxx, gen_helper_xvssrlrn_h_w)
+TRANS(xvssrlrn_w_d, gen_xxx, gen_helper_xvssrlrn_w_d)
+TRANS(xvssrarn_b_h, gen_xxx, gen_helper_xvssrarn_b_h)
+TRANS(xvssrarn_h_w, gen_xxx, gen_helper_xvssrarn_h_w)
+TRANS(xvssrarn_w_d, gen_xxx, gen_helper_xvssrarn_w_d)
+TRANS(xvssrlrn_bu_h, gen_xxx, gen_helper_xvssrlrn_bu_h)
+TRANS(xvssrlrn_hu_w, gen_xxx, gen_helper_xvssrlrn_hu_w)
+TRANS(xvssrlrn_wu_d, gen_xxx, gen_helper_xvssrlrn_wu_d)
+TRANS(xvssrarn_bu_h, gen_xxx, gen_helper_xvssrarn_bu_h)
+TRANS(xvssrarn_hu_w, gen_xxx, gen_helper_xvssrarn_hu_w)
+TRANS(xvssrarn_wu_d, gen_xxx, gen_helper_xvssrarn_wu_d)
+
+TRANS(xvssrlrni_b_h, gen_xx_i, gen_helper_xvssrlrni_b_h)
+TRANS(xvssrlrni_h_w, gen_xx_i, gen_helper_xvssrlrni_h_w)
+TRANS(xvssrlrni_w_d, gen_xx_i, gen_helper_xvssrlrni_w_d)
+TRANS(xvssrlrni_d_q, gen_xx_i, gen_helper_xvssrlrni_d_q)
+TRANS(xvssrarni_b_h, gen_xx_i, gen_helper_xvssrarni_b_h)
+TRANS(xvssrarni_h_w, gen_xx_i, gen_helper_xvssrarni_h_w)
+TRANS(xvssrarni_w_d, gen_xx_i, gen_helper_xvssrarni_w_d)
+TRANS(xvssrarni_d_q, gen_xx_i, gen_helper_xvssrarni_d_q)
+TRANS(xvssrlrni_bu_h, gen_xx_i, gen_helper_xvssrlrni_bu_h)
+TRANS(xvssrlrni_hu_w, gen_xx_i, gen_helper_xvssrlrni_hu_w)
+TRANS(xvssrlrni_wu_d, gen_xx_i, gen_helper_xvssrlrni_wu_d)
+TRANS(xvssrlrni_du_q, gen_xx_i, gen_helper_xvssrlrni_du_q)
+TRANS(xvssrarni_bu_h, gen_xx_i, gen_helper_xvssrarni_bu_h)
+TRANS(xvssrarni_hu_w, gen_xx_i, gen_helper_xvssrarni_hu_w)
+TRANS(xvssrarni_wu_d, gen_xx_i, gen_helper_xvssrarni_wu_d)
+TRANS(xvssrarni_du_q, gen_xx_i, gen_helper_xvssrarni_du_q)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index cf3803c230..3aed69b766 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1763,6 +1763,36 @@ xvssrani_hu_w 0111 01110110 01001 ..... ..... ..... @xx_ui5
xvssrani_wu_d 0111 01110110 0101 ...... ..... ..... @xx_ui6
xvssrani_du_q 0111 01110110 011 ....... ..... ..... @xx_ui7
+xvssrlrn_b_h 0111 01010000 00001 ..... ..... ..... @xxx
+xvssrlrn_h_w 0111 01010000 00010 ..... ..... ..... @xxx
+xvssrlrn_w_d 0111 01010000 00011 ..... ..... ..... @xxx
+xvssrarn_b_h 0111 01010000 00101 ..... ..... ..... @xxx
+xvssrarn_h_w 0111 01010000 00110 ..... ..... ..... @xxx
+xvssrarn_w_d 0111 01010000 00111 ..... ..... ..... @xxx
+xvssrlrn_bu_h 0111 01010000 10001 ..... ..... ..... @xxx
+xvssrlrn_hu_w 0111 01010000 10010 ..... ..... ..... @xxx
+xvssrlrn_wu_d 0111 01010000 10011 ..... ..... ..... @xxx
+xvssrarn_bu_h 0111 01010000 10101 ..... ..... ..... @xxx
+xvssrarn_hu_w 0111 01010000 10110 ..... ..... ..... @xxx
+xvssrarn_wu_d 0111 01010000 10111 ..... ..... ..... @xxx
+
+xvssrlrni_b_h 0111 01110101 00000 1 .... ..... ..... @xx_ui4
+xvssrlrni_h_w 0111 01110101 00001 ..... ..... ..... @xx_ui5
+xvssrlrni_w_d 0111 01110101 0001 ...... ..... ..... @xx_ui6
+xvssrlrni_d_q 0111 01110101 001 ....... ..... ..... @xx_ui7
+xvssrarni_b_h 0111 01110110 10000 1 .... ..... ..... @xx_ui4
+xvssrarni_h_w 0111 01110110 10001 ..... ..... ..... @xx_ui5
+xvssrarni_w_d 0111 01110110 1001 ...... ..... ..... @xx_ui6
+xvssrarni_d_q 0111 01110110 101 ....... ..... ..... @xx_ui7
+xvssrlrni_bu_h 0111 01110101 01000 1 .... ..... ..... @xx_ui4
+xvssrlrni_hu_w 0111 01110101 01001 ..... ..... ..... @xx_ui5
+xvssrlrni_wu_d 0111 01110101 0101 ...... ..... ..... @xx_ui6
+xvssrlrni_du_q 0111 01110101 011 ....... ..... ..... @xx_ui7
+xvssrarni_bu_h 0111 01110110 11000 1 .... ..... ..... @xx_ui4
+xvssrarni_hu_w 0111 01110110 11001 ..... ..... ..... @xx_ui5
+xvssrarni_wu_d 0111 01110110 1101 ...... ..... ..... @xx_ui6
+xvssrarni_du_q 0111 01110110 111 ....... ..... ..... @xx_ui7
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index b42f412c02..0e223601de 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -1670,3 +1670,414 @@ void HELPER(xvssrani_du_q)(CPULoongArchState *env,
XVSSRANUI(xvssrani_bu_h, 16, XB, XH)
XVSSRANUI(xvssrani_hu_w, 32, XH, XW)
XVSSRANUI(xvssrani_wu_d, 64, XW, XD)
+
+#define XSSRLRNS(E1, E2, T1, T2, T3) \
+static T1 do_xssrlrns_ ## E1(T2 e2, int sa, int sh) \
+{ \
+ T1 shft_res; \
+ \
+ shft_res = do_xvsrlr_ ## E2(e2, sa); \
+ T1 mask; \
+ mask = (1ull << sh) - 1; \
+ if (shft_res > mask) { \
+ return mask; \
+ } else { \
+ return shft_res; \
+ } \
+}
+
+XSSRLRNS(XB, XH, uint16_t, int16_t, uint8_t)
+XSSRLRNS(XH, XW, uint32_t, int32_t, uint16_t)
+XSSRLRNS(XW, XD, uint64_t, int64_t, uint32_t)
+
+#define XVSSRLRN(NAME, BIT, E1, E2, E3) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i, max; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ Xd->E1(i) = do_xssrlrns_ ## E1(Xj->E2(i), \
+ Xk->E3(i) % BIT, (BIT / 2) - 1); \
+ Xd->E1(i + max * 2) = do_xssrlrns_## E1(Xj->E2(i + max), \
+ Xk->E3(i + max) % BIT, \
+ (BIT / 2) - 1); \
+ } \
+ Xd->XD(1) = 0; \
+ Xd->XD(3) = 0; \
+}
+
+XVSSRLRN(xvssrlrn_b_h, 16, XB, XH, UXH)
+XVSSRLRN(xvssrlrn_h_w, 32, XH, XW, UXW)
+XVSSRLRN(xvssrlrn_w_d, 64, XW, XD, UXD)
+
+#define XSSRARNS(E1, E2, T1, T2) \
+static T1 do_xssrarns_ ## E1(T1 e2, int sa, int sh) \
+{ \
+ T1 shft_res; \
+ \
+ shft_res = do_xvsrar_ ## E2(e2, sa); \
+ T2 mask; \
+ mask = (1ll << sh) - 1; \
+ if (shft_res > mask) { \
+ return mask; \
+ } else if (shft_res < -(mask + 1)) { \
+ return ~mask; \
+ } else { \
+ return shft_res; \
+ } \
+}
+
+XSSRARNS(XB, XH, int16_t, int8_t)
+XSSRARNS(XH, XW, int32_t, int16_t)
+XSSRARNS(XW, XD, int64_t, int32_t)
+
+#define XVSSRARN(NAME, BIT, E1, E2, E3) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i, max; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ Xd->E1(i) = do_xssrarns_ ## E1(Xj->E2(i), \
+ Xk->E3(i) % BIT, (BIT / 2) - 1); \
+ Xd->E1(i + max * 2) = do_xssrarns_## E1(Xj->E2(i + max), \
+ Xk->E3(i + max) % BIT, \
+ (BIT / 2) - 1); \
+ } \
+ Xd->XD(1) = 0; \
+ Xd->XD(3) = 0; \
+}
+
+XVSSRARN(xvssrarn_b_h, 16, XB, XH, UXH)
+XVSSRARN(xvssrarn_h_w, 32, XH, XW, UXW)
+XVSSRARN(xvssrarn_w_d, 64, XW, XD, UXD)
+
+#define XSSRLRNU(E1, E2, T1, T2, T3) \
+static T1 do_xssrlrnu_ ## E1(T3 e2, int sa, int sh) \
+{ \
+ T1 shft_res; \
+ \
+ shft_res = do_xvsrlr_ ## E2(e2, sa); \
+ \
+ T2 mask; \
+ mask = (1ull << sh) - 1; \
+ if (shft_res > mask) { \
+ return mask; \
+ } else { \
+ return shft_res; \
+ } \
+}
+
+XSSRLRNU(XB, XH, uint16_t, uint8_t, int16_t)
+XSSRLRNU(XH, XW, uint32_t, uint16_t, int32_t)
+XSSRLRNU(XW, XD, uint64_t, uint32_t, int64_t)
+
+#define XVSSRLRNU(NAME, BIT, E1, E2, E3) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i, max; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ Xd->E1(i) = do_xssrlrnu_ ## E1(Xj->E2(i), Xk->E3(i) % BIT, BIT / 2); \
+ Xd->E1(i + max * 2) = do_xssrlrnu_## E1(Xj->E2(i + max), \
+ Xk->E3(i + max) % BIT, \
+ BIT / 2); \
+ } \
+ Xd->XD(1) = 0; \
+ Xd->XD(3) = 0; \
+}
+
+XVSSRLRNU(xvssrlrn_bu_h, 16, XB, XH, UXH)
+XVSSRLRNU(xvssrlrn_hu_w, 32, XH, XW, UXW)
+XVSSRLRNU(xvssrlrn_wu_d, 64, XW, XD, UXD)
+
+#define XSSRARNU(E1, E2, T1, T2, T3) \
+static T1 do_xssrarnu_ ## E1(T3 e2, int sa, int sh) \
+{ \
+ T1 shft_res; \
+ \
+ if (e2 < 0) { \
+ shft_res = 0; \
+ } else { \
+ shft_res = do_xvsrar_ ## E2(e2, sa); \
+ } \
+ T2 mask; \
+ mask = (1ull << sh) - 1; \
+ if (shft_res > mask) { \
+ return mask; \
+ } else { \
+ return shft_res; \
+ } \
+}
+
+XSSRARNU(XB, XH, uint16_t, uint8_t, int16_t)
+XSSRARNU(XH, XW, uint32_t, uint16_t, int32_t)
+XSSRARNU(XW, XD, uint64_t, uint32_t, int64_t)
+
+#define XVSSRARNU(NAME, BIT, E1, E2, E3) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i, max; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ Xd->E1(i) = do_xssrarnu_ ## E1(Xj->E2(i), Xk->E3(i) % BIT, BIT / 2); \
+ Xd->E1(i + max * 2) = do_xssrarnu_## E1(Xj->E2(i + max), \
+ Xk->E3(i + max) % BIT, \
+ BIT / 2); \
+ } \
+ Xd->XD(1) = 0; \
+ Xd->XD(3) = 0; \
+}
+
+XVSSRARNU(xvssrarn_bu_h, 16, XB, XH, UXH)
+XVSSRARNU(xvssrarn_hu_w, 32, XH, XW, UXW)
+XVSSRARNU(xvssrarn_wu_d, 64, XW, XD, UXD)
+
+#define XVSSRLRNI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t imm) \
+{ \
+ int i, max; \
+ XReg temp; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ temp.E1(i) = do_xssrlrns_ ## E1(Xj->E2(i), imm, (BIT / 2) - 1); \
+ temp.E1(i + max) = do_xssrlrns_ ## E1(Xd->E2(i), imm, (BIT / 2) - 1); \
+ temp.E1(i + max * 2) = do_xssrlrns_## E1(Xj->E2(i + max), \
+ imm, (BIT / 2) - 1); \
+ temp.E1(i + max * 3) = do_xssrlrns_## E1(Xd->E2(i + max), \
+ imm, (BIT / 2) - 1); \
+ } \
+ *Xd = temp; \
+}
+
+#define XVSSRLRNI_Q(NAME, sh) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t imm) \
+{ \
+ int i; \
+ Int128 shft_res[4], r[4], mask; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ if (imm == 0) { \
+ shft_res[0] = Xj->XQ(0); \
+ shft_res[1] = Xd->XQ(0); \
+ shft_res[2] = Xj->XQ(1); \
+ shft_res[3] = Xd->XQ(1); \
+ } else { \
+ r[0] = int128_and(int128_urshift(Xj->XQ(0), (imm - 1)), int128_one()); \
+ r[1] = int128_and(int128_urshift(Xd->XQ(0), (imm - 1)), int128_one()); \
+ r[2] = int128_and(int128_urshift(Xj->XQ(1), (imm - 1)), int128_one()); \
+ r[3] = int128_and(int128_urshift(Xd->XQ(1), (imm - 1)), int128_one()); \
+ \
+ shft_res[0] = (int128_add(int128_urshift(Xj->XQ(0), imm), r[0])); \
+ shft_res[1] = (int128_add(int128_urshift(Xd->XQ(0), imm), r[1])); \
+ shft_res[2] = (int128_add(int128_urshift(Xj->XQ(1), imm), r[2])); \
+ shft_res[3] = (int128_add(int128_urshift(Xd->XQ(1), imm), r[3])); \
+ } \
+ \
+ mask = int128_sub(int128_lshift(int128_one(), sh), int128_one()); \
+ \
+ for (i = 0; i < 4; i++) { \
+ if (int128_ult(mask, shft_res[i])) { \
+ Xd->XD(i) = int128_getlo(mask); \
+ } else { \
+ Xd->XD(i) = int128_getlo(shft_res[i]); \
+ } \
+ } \
+}
+
+XVSSRLRNI(xvssrlrni_b_h, 16, XB, XH)
+XVSSRLRNI(xvssrlrni_h_w, 32, XH, XW)
+XVSSRLRNI(xvssrlrni_w_d, 64, XW, XD)
+XVSSRLRNI_Q(xvssrlrni_d_q, 63)
+
+#define XVSSRARNI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t imm) \
+{ \
+ int i, max; \
+ XReg temp; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ temp.E1(i) = do_xssrarns_ ## E1(Xj->E2(i), imm, (BIT / 2) - 1); \
+ temp.E1(i + max) = do_xssrarns_ ## E1(Xd->E2(i), imm, (BIT / 2) - 1); \
+ temp.E1(i + max * 2) = do_xssrarns_## E1(Xj->E2(i + max), \
+ imm, (BIT / 2) - 1); \
+ temp.E1(i + max * 3) = do_xssrarns_## E1(Xd->E2(i + max), \
+ imm, (BIT / 2) - 1); \
+ } \
+ *Xd = temp; \
+}
+
+void HELPER(xvssrarni_d_q)(CPULoongArchState *env,
+ uint32_t xd, uint32_t xj, uint32_t imm)
+{
+ int i;
+ Int128 shft_res[4], r[4], mask1, mask2;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ if (imm == 0) {
+ shft_res[0] = Xj->XQ(0);
+ shft_res[1] = Xd->XQ(0);
+ shft_res[2] = Xj->XQ(1);
+ shft_res[3] = Xd->XQ(1);
+ } else {
+ r[0] = int128_and(int128_rshift(Xj->XQ(0), (imm - 1)), int128_one());
+ r[1] = int128_and(int128_rshift(Xd->XQ(0), (imm - 1)), int128_one());
+ r[2] = int128_and(int128_rshift(Xj->XQ(1), (imm - 1)), int128_one());
+ r[3] = int128_and(int128_rshift(Xd->XQ(1), (imm - 1)), int128_one());
+
+ shft_res[0] = int128_add(int128_rshift(Xj->XQ(0), imm), r[0]);
+ shft_res[1] = int128_add(int128_rshift(Xd->XQ(0), imm), r[1]);
+ shft_res[2] = int128_add(int128_rshift(Xj->XQ(1), imm), r[2]);
+ shft_res[3] = int128_add(int128_rshift(Xd->XQ(1), imm), r[3]);
+ }
+
+ mask1 = int128_sub(int128_lshift(int128_one(), 63), int128_one());
+ mask2 = int128_lshift(int128_one(), 63);
+
+ for (i = 0; i < 4; i++) {
+ if (int128_gt(shft_res[i], mask1)) {
+ Xd->XD(i) = int128_getlo(mask1);
+ } else if (int128_lt(shft_res[i], int128_neg(mask2))) {
+ Xd->XD(i) = int128_getlo(mask2);
+ } else {
+ Xd->XD(i) = int128_getlo(shft_res[i]);
+ }
+ }
+}
+
+XVSSRARNI(xvssrarni_b_h, 16, XB, XH)
+XVSSRARNI(xvssrarni_h_w, 32, XH, XW)
+XVSSRARNI(xvssrarni_w_d, 64, XW, XD)
+
+#define XVSSRLRNUI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t imm) \
+{ \
+ int i, max; \
+ XReg temp; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ temp.E1(i) = do_xssrlrnu_ ## E1(Xj->E2(i), imm, BIT / 2); \
+ temp.E1(i + max) = do_xssrlrnu_ ## E1(Xd->E2(i), imm, BIT / 2); \
+ temp.E1(i + max * 2) = do_xssrlrnu_## E1(Xj->E2(i + max), \
+ imm, BIT / 2); \
+ temp.E1(i + max * 3) = do_xssrlrnu_## E1(Xd->E2(i + max), \
+ imm, BIT / 2); \
+ } \
+ *Xd = temp; \
+}
+
+XVSSRLRNUI(xvssrlrni_bu_h, 16, XB, XH)
+XVSSRLRNUI(xvssrlrni_hu_w, 32, XH, XW)
+XVSSRLRNUI(xvssrlrni_wu_d, 64, XW, XD)
+XVSSRLRNI_Q(xvssrlrni_du_q, 64)
+
+#define XVSSRARNUI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t imm) \
+{ \
+ int i, max; \
+ XReg temp; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ max = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < max; i++) { \
+ temp.E1(i) = do_xssrarnu_ ## E1(Xj->E2(i), imm, BIT / 2); \
+ temp.E1(i + max) = do_xssrarnu_ ## E1(Xd->E2(i), imm, BIT / 2); \
+ temp.E1(i + max * 2) = do_xssrarnu_## E1(Xj->E2(i + max), \
+ imm, BIT / 2); \
+ temp.E1(i + max * 3) = do_xssrarnu_## E1(Xd->E2(i + max), \
+ imm, BIT / 2); \
+ } \
+ *Xd = temp; \
+}
+
+void HELPER(xvssrarni_du_q)(CPULoongArchState *env,
+ uint32_t xd, uint32_t xj, uint32_t imm)
+{
+ int i;
+ Int128 shft_res[4], r[4], mask1, mask2;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ if (imm == 0) {
+ shft_res[0] = Xj->XQ(0);
+ shft_res[1] = Xd->XQ(0);
+ shft_res[2] = Xj->XQ(1);
+ shft_res[3] = Xd->XQ(1);
+ } else {
+ r[0] = int128_and(int128_rshift(Xj->XQ(0), (imm - 1)), int128_one());
+ r[1] = int128_and(int128_rshift(Xd->XQ(0), (imm - 1)), int128_one());
+ r[2] = int128_and(int128_rshift(Xj->XQ(1), (imm - 1)), int128_one());
+ r[3] = int128_and(int128_rshift(Xd->XQ(1), (imm - 1)), int128_one());
+
+ shft_res[0] = int128_add(int128_rshift(Xj->XQ(0), imm), r[0]);
+ shft_res[1] = int128_add(int128_rshift(Xd->XQ(0), imm), r[1]);
+ shft_res[2] = int128_add(int128_rshift(Xj->XQ(1), imm), r[2]);
+ shft_res[3] = int128_add(int128_rshift(Xd->XQ(1), imm), r[3]);
+ }
+
+ if (int128_lt(Xj->XQ(0), int128_zero())) {
+ shft_res[0] = int128_zero();
+ }
+ if (int128_lt(Xd->XQ(0), int128_zero())) {
+ shft_res[1] = int128_zero();
+ }
+ if (int128_lt(Xj->XQ(1), int128_zero())) {
+ shft_res[2] = int128_zero();
+ }
+ if (int128_lt(Xd->XQ(1), int128_zero())) {
+ shft_res[3] = int128_zero();
+ }
+
+ mask1 = int128_sub(int128_lshift(int128_one(), 64), int128_one());
+ mask2 = int128_lshift(int128_one(), 64);
+
+ for (i = 0; i < 4; i++) {
+ if (int128_gt(shft_res[i], mask1)) {
+ Xd->XD(i) = int128_getlo(mask1);
+ } else if (int128_lt(shft_res[i], int128_neg(mask2))) {
+ Xd->XD(i) = int128_getlo(mask2);
+ } else {
+ Xd->XD(i) = int128_getlo(shft_res[i]);
+ }
+ }
+}
+
+XVSSRARNUI(xvssrarni_bu_h, 16, XB, XH)
+XVSSRARNUI(xvssrarni_hu_w, 32, XH, XW)
+XVSSRARNUI(xvssrarni_wu_d, 64, XW, XD)
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 32/46] target/loongarch: Implement xvclo xvclz
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (30 preceding siblings ...)
2023-06-20 9:37 ` [PATCH v1 31/46] target/loongarch: Implement xvssrlrn xvssrarn Song Gao
@ 2023-06-20 9:38 ` Song Gao
2023-06-20 9:38 ` [PATCH v1 33/46] target/loongarch: Implement xvpcnt Song Gao
` (13 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:38 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVCLO.{B/H/W/D};
- XVCLZ.{B/H/W/D}.
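For reference, a minimal C sketch of the per-element semantics (illustration only, not part of the patch); the helpers below express the same thing with clz32/clz64 through the DO_CLO_*/DO_CLZ_* macros that this patch moves into vec.h:
    #include <stdint.h>
    static int clz8(uint8_t v)                 /* leading zeros of a byte, 8 if v == 0 */
    {
        int n = 0;
        for (int bit = 7; bit >= 0 && !((v >> bit) & 1); bit--) {
            n++;
        }
        return n;
    }
    static void xvclz_b_ref(uint8_t xd[32], const uint8_t xj[32])
    {
        for (int i = 0; i < 32; i++) {
            xd[i] = clz8(xj[i]);               /* DO_CLZ_B */
        }
    }
    static void xvclo_b_ref(uint8_t xd[32], const uint8_t xj[32])
    {
        for (int i = 0; i < 32; i++) {
            xd[i] = clz8((uint8_t)~xj[i]);     /* DO_CLO_B: leading ones = clz of ~x */
        }
    }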
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 9 +++++++++
target/loongarch/helper.h | 9 +++++++++
target/loongarch/insn_trans/trans_lasx.c.inc | 9 +++++++++
target/loongarch/insns.decode | 9 +++++++++
target/loongarch/lasx_helper.c | 21 ++++++++++++++++++++
target/loongarch/lsx_helper.c | 9 ---------
target/loongarch/vec.h | 9 +++++++++
7 files changed, 66 insertions(+), 9 deletions(-)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index da07b56dee..99636ca56c 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2195,6 +2195,15 @@ INSN_LASX(xvssrarni_hu_w, xx_i)
INSN_LASX(xvssrarni_wu_d, xx_i)
INSN_LASX(xvssrarni_du_q, xx_i)
+INSN_LASX(xvclo_b, xx)
+INSN_LASX(xvclo_h, xx)
+INSN_LASX(xvclo_w, xx)
+INSN_LASX(xvclo_d, xx)
+INSN_LASX(xvclz_b, xx)
+INSN_LASX(xvclz_h, xx)
+INSN_LASX(xvclz_w, xx)
+INSN_LASX(xvclz_d, xx)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index b5d1cff1f0..950a73ec6f 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -1050,3 +1050,12 @@ DEF_HELPER_4(xvssrarni_bu_h, void, env, i32, i32, i32)
DEF_HELPER_4(xvssrarni_hu_w, void, env, i32, i32, i32)
DEF_HELPER_4(xvssrarni_wu_d, void, env, i32, i32, i32)
DEF_HELPER_4(xvssrarni_du_q, void, env, i32, i32, i32)
+
+DEF_HELPER_3(xvclo_b, void, env, i32, i32)
+DEF_HELPER_3(xvclo_h, void, env, i32, i32)
+DEF_HELPER_3(xvclo_w, void, env, i32, i32)
+DEF_HELPER_3(xvclo_d, void, env, i32, i32)
+DEF_HELPER_3(xvclz_b, void, env, i32, i32)
+DEF_HELPER_3(xvclz_h, void, env, i32, i32)
+DEF_HELPER_3(xvclz_w, void, env, i32, i32)
+DEF_HELPER_3(xvclz_d, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index aa145c850b..fa7dafa7f9 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -2144,6 +2144,15 @@ TRANS(xvssrarni_hu_w, gen_xx_i, gen_helper_xvssrarni_hu_w)
TRANS(xvssrarni_wu_d, gen_xx_i, gen_helper_xvssrarni_wu_d)
TRANS(xvssrarni_du_q, gen_xx_i, gen_helper_xvssrarni_du_q)
+TRANS(xvclo_b, gen_xx, gen_helper_xvclo_b)
+TRANS(xvclo_h, gen_xx, gen_helper_xvclo_h)
+TRANS(xvclo_w, gen_xx, gen_helper_xvclo_w)
+TRANS(xvclo_d, gen_xx, gen_helper_xvclo_d)
+TRANS(xvclz_b, gen_xx, gen_helper_xvclz_b)
+TRANS(xvclz_h, gen_xx, gen_helper_xvclz_h)
+TRANS(xvclz_w, gen_xx, gen_helper_xvclz_w)
+TRANS(xvclz_d, gen_xx, gen_helper_xvclz_d)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 3aed69b766..91de5a3815 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1793,6 +1793,15 @@ xvssrarni_hu_w 0111 01110110 11001 ..... ..... ..... @xx_ui5
xvssrarni_wu_d 0111 01110110 1101 ...... ..... ..... @xx_ui6
xvssrarni_du_q 0111 01110110 111 ....... ..... ..... @xx_ui7
+xvclo_b 0111 01101001 11000 00000 ..... ..... @xx
+xvclo_h 0111 01101001 11000 00001 ..... ..... @xx
+xvclo_w 0111 01101001 11000 00010 ..... ..... @xx
+xvclo_d 0111 01101001 11000 00011 ..... ..... @xx
+xvclz_b 0111 01101001 11000 00100 ..... ..... @xx
+xvclz_h 0111 01101001 11000 00101 ..... ..... @xx
+xvclz_w 0111 01101001 11000 00110 ..... ..... @xx
+xvclz_d 0111 01101001 11000 00111 ..... ..... @xx
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index 0e223601de..122c460fb5 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -2081,3 +2081,24 @@ void HELPER(xvssrarni_du_q)(CPULoongArchState *env,
XVSSRARNUI(xvssrarni_bu_h, 16, XB, XH)
XVSSRARNUI(xvssrarni_hu_w, 32, XH, XW)
XVSSRARNUI(xvssrarni_wu_d, 64, XW, XD)
+
+#define XDO_2OP(NAME, BIT, E, DO_OP) \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t xd, uint32_t xj) \
+{ \
+ int i; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E(i) = DO_OP(Xj->E(i)); \
+ } \
+}
+
+XDO_2OP(xvclo_b, 8, UXB, DO_CLO_B)
+XDO_2OP(xvclo_h, 16, UXH, DO_CLO_H)
+XDO_2OP(xvclo_w, 32, UXW, DO_CLO_W)
+XDO_2OP(xvclo_d, 64, UXD, DO_CLO_D)
+XDO_2OP(xvclz_b, 8, UXB, DO_CLZ_B)
+XDO_2OP(xvclz_h, 16, UXH, DO_CLZ_H)
+XDO_2OP(xvclz_w, 32, UXW, DO_CLZ_W)
+XDO_2OP(xvclz_d, 64, UXD, DO_CLZ_D)
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index d21e4006f2..e1b448a2e6 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -1910,15 +1910,6 @@ void HELPER(NAME)(CPULoongArchState *env, uint32_t vd, uint32_t vj) \
} \
}
-#define DO_CLO_B(N) (clz32(~N & 0xff) - 24)
-#define DO_CLO_H(N) (clz32(~N & 0xffff) - 16)
-#define DO_CLO_W(N) (clz32(~N))
-#define DO_CLO_D(N) (clz64(~N))
-#define DO_CLZ_B(N) (clz32(N) - 24)
-#define DO_CLZ_H(N) (clz32(N) - 16)
-#define DO_CLZ_W(N) (clz32(N))
-#define DO_CLZ_D(N) (clz64(N))
-
DO_2OP(vclo_b, 8, UB, DO_CLO_B)
DO_2OP(vclo_h, 16, UH, DO_CLO_H)
DO_2OP(vclo_w, 32, UW, DO_CLO_W)
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index b5cdb4b470..db5704dd05 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -77,6 +77,15 @@
#define R_SHIFT(a, b) (a >> b)
+#define DO_CLO_B(N) (clz32(~N & 0xff) - 24)
+#define DO_CLO_H(N) (clz32(~N & 0xffff) - 16)
+#define DO_CLO_W(N) (clz32(~N))
+#define DO_CLO_D(N) (clz64(~N))
+#define DO_CLZ_B(N) (clz32(N) - 24)
+#define DO_CLZ_H(N) (clz32(N) - 16)
+#define DO_CLZ_W(N) (clz32(N))
+#define DO_CLZ_D(N) (clz64(N))
+
uint64_t do_vmskltz_b(int64_t val);
uint64_t do_vmskltz_h(int64_t val);
uint64_t do_vmskltz_w(int64_t val);
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 33/46] target/loongarch: Implement xvpcnt
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (31 preceding siblings ...)
2023-06-20 9:38 ` [PATCH v1 32/46] target/loongarch: Implement xvclo xvclz Song Gao
@ 2023-06-20 9:38 ` Song Gao
2023-06-20 9:38 ` [PATCH v1 34/46] target/loongarch: Implement xvbitclr xvbitset xvbitrev Song Gao
` (12 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:38 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVPCNT.{B/H/W/D}.
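A rough sketch of the semantics (illustration only, not part of the patch): each destination lane receives the population count of the corresponding source lane, which the helper computes with ctpop8/16/32/64:
    #include <stdint.h>
    static void xvpcnt_b_ref(uint8_t xd[32], const uint8_t xj[32])
    {
        for (int i = 0; i < 32; i++) {
            uint8_t v = xj[i], n = 0;
            while (v) {                        /* same result as ctpop8(v) */
                v &= v - 1;                    /* clear the lowest set bit */
                n++;
            }
            xd[i] = n;
        }
    }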
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 5 +++++
target/loongarch/helper.h | 5 +++++
target/loongarch/insn_trans/trans_lasx.c.inc | 5 +++++
target/loongarch/insns.decode | 5 +++++
target/loongarch/lasx_helper.c | 17 +++++++++++++++++
5 files changed, 37 insertions(+)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 99636ca56c..b7a322651f 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2204,6 +2204,11 @@ INSN_LASX(xvclz_h, xx)
INSN_LASX(xvclz_w, xx)
INSN_LASX(xvclz_d, xx)
+INSN_LASX(xvpcnt_b, xx)
+INSN_LASX(xvpcnt_h, xx)
+INSN_LASX(xvpcnt_w, xx)
+INSN_LASX(xvpcnt_d, xx)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 950a73ec6f..a434443819 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -1059,3 +1059,8 @@ DEF_HELPER_3(xvclz_b, void, env, i32, i32)
DEF_HELPER_3(xvclz_h, void, env, i32, i32)
DEF_HELPER_3(xvclz_w, void, env, i32, i32)
DEF_HELPER_3(xvclz_d, void, env, i32, i32)
+
+DEF_HELPER_3(xvpcnt_b, void, env, i32, i32)
+DEF_HELPER_3(xvpcnt_h, void, env, i32, i32)
+DEF_HELPER_3(xvpcnt_w, void, env, i32, i32)
+DEF_HELPER_3(xvpcnt_d, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index fa7dafa7f9..616d296432 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -2153,6 +2153,11 @@ TRANS(xvclz_h, gen_xx, gen_helper_xvclz_h)
TRANS(xvclz_w, gen_xx, gen_helper_xvclz_w)
TRANS(xvclz_d, gen_xx, gen_helper_xvclz_d)
+TRANS(xvpcnt_b, gen_xx, gen_helper_xvpcnt_b)
+TRANS(xvpcnt_h, gen_xx, gen_helper_xvpcnt_h)
+TRANS(xvpcnt_w, gen_xx, gen_helper_xvpcnt_w)
+TRANS(xvpcnt_d, gen_xx, gen_helper_xvpcnt_d)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 91de5a3815..7d49ddb0ea 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1802,6 +1802,11 @@ xvclz_h 0111 01101001 11000 00101 ..... ..... @xx
xvclz_w 0111 01101001 11000 00110 ..... ..... @xx
xvclz_d 0111 01101001 11000 00111 ..... ..... @xx
+xvpcnt_b 0111 01101001 11000 01000 ..... ..... @xx
+xvpcnt_h 0111 01101001 11000 01001 ..... ..... @xx
+xvpcnt_w 0111 01101001 11000 01010 ..... ..... @xx
+xvpcnt_d 0111 01101001 11000 01011 ..... ..... @xx
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index 122c460fb5..f04817984b 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -2102,3 +2102,20 @@ XDO_2OP(xvclz_b, 8, UXB, DO_CLZ_B)
XDO_2OP(xvclz_h, 16, UXH, DO_CLZ_H)
XDO_2OP(xvclz_w, 32, UXW, DO_CLZ_W)
XDO_2OP(xvclz_d, 64, UXD, DO_CLZ_D)
+
+#define XVPCNT(NAME, BIT, E, FN) \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t xd, uint32_t xj) \
+{ \
+ int i; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E(i) = FN(Xj->E(i)); \
+ } \
+}
+
+XVPCNT(xvpcnt_b, 8, UXB, ctpop8)
+XVPCNT(xvpcnt_h, 16, UXH, ctpop16)
+XVPCNT(xvpcnt_w, 32, UXW, ctpop32)
+XVPCNT(xvpcnt_d, 64, UXD, ctpop64)
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 34/46] target/loongarch: Implement xvbitclr xvbitset xvbitrev
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (32 preceding siblings ...)
2023-06-20 9:38 ` [PATCH v1 33/46] target/loongarch: Implement xvpcnt Song Gao
@ 2023-06-20 9:38 ` Song Gao
2023-06-20 9:38 ` [PATCH v1 35/46] target/loongarch: Implement xvfrstp Song Gao
` (11 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:38 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVBITCLR[I].{B/H/W/D};
- XVBITSET[I].{B/H/W/D};
- XVBITREV[I].{B/H/W/D}.
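A minimal sketch of the element-wise semantics for the .D variants (illustration only, not part of the patch); the bit index is taken from the corresponding Xk element modulo the element width, exactly as DO_BITCLR/DO_BITSET/DO_BITREV do in the helpers below:
    #include <stdint.h>
    static void xvbitclr_d_ref(uint64_t xd[4], const uint64_t xj[4],
                               const uint64_t xk[4])
    {
        for (int i = 0; i < 4; i++) {
            xd[i] = xj[i] & ~(1ull << (xk[i] % 64));   /* DO_BITCLR */
        }
    }
    static void xvbitset_d_ref(uint64_t xd[4], const uint64_t xj[4],
                               const uint64_t xk[4])
    {
        for (int i = 0; i < 4; i++) {
            xd[i] = xj[i] | (1ull << (xk[i] % 64));    /* DO_BITSET */
        }
    }
    static void xvbitrev_d_ref(uint64_t xd[4], const uint64_t xj[4],
                               const uint64_t xk[4])
    {
        for (int i = 0; i < 4; i++) {
            xd[i] = xj[i] ^ (1ull << (xk[i] % 64));    /* DO_BITREV */
        }
    }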
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 25 ++
target/loongarch/helper.h | 27 ++
target/loongarch/insn_trans/trans_lasx.c.inc | 246 +++++++++++++++++++
target/loongarch/insns.decode | 27 ++
target/loongarch/lasx_helper.c | 51 ++++
target/loongarch/lsx_helper.c | 4 -
target/loongarch/vec.h | 4 +
7 files changed, 380 insertions(+), 4 deletions(-)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index b7a322651f..60d265a9f2 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2209,6 +2209,31 @@ INSN_LASX(xvpcnt_h, xx)
INSN_LASX(xvpcnt_w, xx)
INSN_LASX(xvpcnt_d, xx)
+INSN_LASX(xvbitclr_b, xxx)
+INSN_LASX(xvbitclr_h, xxx)
+INSN_LASX(xvbitclr_w, xxx)
+INSN_LASX(xvbitclr_d, xxx)
+INSN_LASX(xvbitclri_b, xx_i)
+INSN_LASX(xvbitclri_h, xx_i)
+INSN_LASX(xvbitclri_w, xx_i)
+INSN_LASX(xvbitclri_d, xx_i)
+INSN_LASX(xvbitset_b, xxx)
+INSN_LASX(xvbitset_h, xxx)
+INSN_LASX(xvbitset_w, xxx)
+INSN_LASX(xvbitset_d, xxx)
+INSN_LASX(xvbitseti_b, xx_i)
+INSN_LASX(xvbitseti_h, xx_i)
+INSN_LASX(xvbitseti_w, xx_i)
+INSN_LASX(xvbitseti_d, xx_i)
+INSN_LASX(xvbitrev_b, xxx)
+INSN_LASX(xvbitrev_h, xxx)
+INSN_LASX(xvbitrev_w, xxx)
+INSN_LASX(xvbitrev_d, xxx)
+INSN_LASX(xvbitrevi_b, xx_i)
+INSN_LASX(xvbitrevi_h, xx_i)
+INSN_LASX(xvbitrevi_w, xx_i)
+INSN_LASX(xvbitrevi_d, xx_i)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index a434443819..294ac477fc 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -1064,3 +1064,30 @@ DEF_HELPER_3(xvpcnt_b, void, env, i32, i32)
DEF_HELPER_3(xvpcnt_h, void, env, i32, i32)
DEF_HELPER_3(xvpcnt_w, void, env, i32, i32)
DEF_HELPER_3(xvpcnt_d, void, env, i32, i32)
+
+DEF_HELPER_FLAGS_4(xvbitclr_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvbitclr_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvbitclr_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvbitclr_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvbitclri_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvbitclri_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvbitclri_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvbitclri_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(xvbitset_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvbitset_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvbitset_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvbitset_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvbitseti_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvbitseti_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvbitseti_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvbitseti_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(xvbitrev_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvbitrev_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvbitrev_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvbitrev_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(xvbitrevi_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvbitrevi_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvbitrevi_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvbitrevi_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 616d296432..e87e000478 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -2158,6 +2158,252 @@ TRANS(xvpcnt_h, gen_xx, gen_helper_xvpcnt_h)
TRANS(xvpcnt_w, gen_xx, gen_helper_xvpcnt_w)
TRANS(xvpcnt_d, gen_xx, gen_helper_xvpcnt_d)
+static void do_xvbitclr(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_shlv_vec, INDEX_op_andc_vec, 0
+ };
+ static const GVecGen3 op[4] = {
+ {
+ .fniv = gen_vbitclr,
+ .fno = gen_helper_xvbitclr_b,
+ .opt_opc = vecop_list,
+ .vece = MO_8
+ },
+ {
+ .fniv = gen_vbitclr,
+ .fno = gen_helper_xvbitclr_h,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fniv = gen_vbitclr,
+ .fno = gen_helper_xvbitclr_w,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fniv = gen_vbitclr,
+ .fno = gen_helper_xvbitclr_d,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvbitclr_b, gvec_xxx, MO_8, do_xvbitclr)
+TRANS(xvbitclr_h, gvec_xxx, MO_16, do_xvbitclr)
+TRANS(xvbitclr_w, gvec_xxx, MO_32, do_xvbitclr)
+TRANS(xvbitclr_d, gvec_xxx, MO_64, do_xvbitclr)
+
+static void do_xvbitclri(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ int64_t imm, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_shli_vec, INDEX_op_andc_vec, 0
+ };
+ static const GVecGen2i op[4] = {
+ {
+ .fniv = gen_vbitclri,
+ .fnoi = gen_helper_xvbitclri_b,
+ .opt_opc = vecop_list,
+ .vece = MO_8
+ },
+ {
+ .fniv = gen_vbitclri,
+ .fnoi = gen_helper_xvbitclri_h,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fniv = gen_vbitclri,
+ .fnoi = gen_helper_xvbitclri_w,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fniv = gen_vbitclri,
+ .fnoi = gen_helper_xvbitclri_d,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_2i(xd_ofs, xj_ofs, oprsz, maxsz, imm, &op[vece]);
+}
+
+TRANS(xvbitclri_b, gvec_xx_i, MO_8, do_xvbitclri)
+TRANS(xvbitclri_h, gvec_xx_i, MO_16, do_xvbitclri)
+TRANS(xvbitclri_w, gvec_xx_i, MO_32, do_xvbitclri)
+TRANS(xvbitclri_d, gvec_xx_i, MO_64, do_xvbitclri)
+
+static void do_xvbitset(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_shlv_vec, 0
+ };
+ static const GVecGen3 op[4] = {
+ {
+ .fniv = gen_vbitset,
+ .fno = gen_helper_xvbitset_b,
+ .opt_opc = vecop_list,
+ .vece = MO_8
+ },
+ {
+ .fniv = gen_vbitset,
+ .fno = gen_helper_xvbitset_h,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fniv = gen_vbitset,
+ .fno = gen_helper_xvbitset_w,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fniv = gen_vbitset,
+ .fno = gen_helper_xvbitset_d,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvbitset_b, gvec_xxx, MO_8, do_xvbitset)
+TRANS(xvbitset_h, gvec_xxx, MO_16, do_xvbitset)
+TRANS(xvbitset_w, gvec_xxx, MO_32, do_xvbitset)
+TRANS(xvbitset_d, gvec_xxx, MO_64, do_xvbitset)
+
+static void do_xvbitseti(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ int64_t imm, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_shli_vec, 0
+ };
+ static const GVecGen2i op[4] = {
+ {
+ .fniv = gen_vbitseti,
+ .fnoi = gen_helper_xvbitseti_b,
+ .opt_opc = vecop_list,
+ .vece = MO_8
+ },
+ {
+ .fniv = gen_vbitseti,
+ .fnoi = gen_helper_xvbitseti_h,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fniv = gen_vbitseti,
+ .fnoi = gen_helper_xvbitseti_w,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fniv = gen_vbitseti,
+ .fnoi = gen_helper_xvbitseti_d,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_2i(xd_ofs, xj_ofs, oprsz, maxsz, imm, &op[vece]);
+}
+
+TRANS(xvbitseti_b, gvec_xx_i, MO_8, do_xvbitseti)
+TRANS(xvbitseti_h, gvec_xx_i, MO_16, do_xvbitseti)
+TRANS(xvbitseti_w, gvec_xx_i, MO_32, do_xvbitseti)
+TRANS(xvbitseti_d, gvec_xx_i, MO_64, do_xvbitseti)
+
+static void do_xvbitrev(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ uint32_t xk_ofs, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_shlv_vec, 0
+ };
+ static const GVecGen3 op[4] = {
+ {
+ .fniv = gen_vbitrev,
+ .fno = gen_helper_xvbitrev_b,
+ .opt_opc = vecop_list,
+ .vece = MO_8
+ },
+ {
+ .fniv = gen_vbitrev,
+ .fno = gen_helper_xvbitrev_h,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fniv = gen_vbitrev,
+ .fno = gen_helper_xvbitrev_w,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fniv = gen_vbitrev,
+ .fno = gen_helper_xvbitrev_d,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_3(xd_ofs, xj_ofs, xk_ofs, oprsz, maxsz, &op[vece]);
+}
+
+TRANS(xvbitrev_b, gvec_xxx, MO_8, do_xvbitrev)
+TRANS(xvbitrev_h, gvec_xxx, MO_16, do_xvbitrev)
+TRANS(xvbitrev_w, gvec_xxx, MO_32, do_xvbitrev)
+TRANS(xvbitrev_d, gvec_xxx, MO_64, do_xvbitrev)
+
+static void do_xvbitrevi(unsigned vece, uint32_t xd_ofs, uint32_t xj_ofs,
+ int64_t imm, uint32_t oprsz, uint32_t maxsz)
+{
+ static const TCGOpcode vecop_list[] = {
+ INDEX_op_shli_vec, 0
+ };
+ static const GVecGen2i op[4] = {
+ {
+ .fniv = gen_vbitrevi,
+ .fnoi = gen_helper_xvbitrevi_b,
+ .opt_opc = vecop_list,
+ .vece = MO_8
+ },
+ {
+ .fniv = gen_vbitrevi,
+ .fnoi = gen_helper_xvbitrevi_h,
+ .opt_opc = vecop_list,
+ .vece = MO_16
+ },
+ {
+ .fniv = gen_vbitrevi,
+ .fnoi = gen_helper_xvbitrevi_w,
+ .opt_opc = vecop_list,
+ .vece = MO_32
+ },
+ {
+ .fniv = gen_vbitrevi,
+ .fnoi = gen_helper_xvbitrevi_d,
+ .opt_opc = vecop_list,
+ .vece = MO_64
+ },
+ };
+
+ tcg_gen_gvec_2i(xd_ofs, xj_ofs, oprsz, maxsz, imm, &op[vece]);
+}
+
+TRANS(xvbitrevi_b, gvec_xx_i, MO_8, do_xvbitrevi)
+TRANS(xvbitrevi_h, gvec_xx_i, MO_16, do_xvbitrevi)
+TRANS(xvbitrevi_w, gvec_xx_i, MO_32, do_xvbitrevi)
+TRANS(xvbitrevi_d, gvec_xx_i, MO_64, do_xvbitrevi)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 7d49ddb0ea..47374054c6 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1807,6 +1807,33 @@ xvpcnt_h 0111 01101001 11000 01001 ..... ..... @xx
xvpcnt_w 0111 01101001 11000 01010 ..... ..... @xx
xvpcnt_d 0111 01101001 11000 01011 ..... ..... @xx
+xvbitclr_b 0111 01010000 11000 ..... ..... ..... @xxx
+xvbitclr_h 0111 01010000 11001 ..... ..... ..... @xxx
+xvbitclr_w 0111 01010000 11010 ..... ..... ..... @xxx
+xvbitclr_d 0111 01010000 11011 ..... ..... ..... @xxx
+xvbitclri_b 0111 01110001 00000 01 ... ..... ..... @xx_ui3
+xvbitclri_h 0111 01110001 00000 1 .... ..... ..... @xx_ui4
+xvbitclri_w 0111 01110001 00001 ..... ..... ..... @xx_ui5
+xvbitclri_d 0111 01110001 0001 ...... ..... ..... @xx_ui6
+
+xvbitset_b 0111 01010000 11100 ..... ..... ..... @xxx
+xvbitset_h 0111 01010000 11101 ..... ..... ..... @xxx
+xvbitset_w 0111 01010000 11110 ..... ..... ..... @xxx
+xvbitset_d 0111 01010000 11111 ..... ..... ..... @xxx
+xvbitseti_b 0111 01110001 01000 01 ... ..... ..... @xx_ui3
+xvbitseti_h 0111 01110001 01000 1 .... ..... ..... @xx_ui4
+xvbitseti_w 0111 01110001 01001 ..... ..... ..... @xx_ui5
+xvbitseti_d 0111 01110001 0101 ...... ..... ..... @xx_ui6
+
+xvbitrev_b 0111 01010001 00000 ..... ..... ..... @xxx
+xvbitrev_h 0111 01010001 00001 ..... ..... ..... @xxx
+xvbitrev_w 0111 01010001 00010 ..... ..... ..... @xxx
+xvbitrev_d 0111 01010001 00011 ..... ..... ..... @xxx
+xvbitrevi_b 0111 01110001 10000 01 ... ..... ..... @xx_ui3
+xvbitrevi_h 0111 01110001 10000 1 .... ..... ..... @xx_ui4
+xvbitrevi_w 0111 01110001 10001 ..... ..... ..... @xx_ui5
+xvbitrevi_d 0111 01110001 1001 ...... ..... ..... @xx_ui6
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index f04817984b..7092835d30 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -2119,3 +2119,54 @@ XVPCNT(xvpcnt_b, 8, UXB, ctpop8)
XVPCNT(xvpcnt_h, 16, UXH, ctpop16)
XVPCNT(xvpcnt_w, 32, UXW, ctpop32)
XVPCNT(xvpcnt_d, 64, UXD, ctpop64)
+
+#define XDO_BIT(NAME, BIT, E, DO_OP) \
+void HELPER(NAME)(void *xd, void *xj, void *xk, uint32_t v) \
+{ \
+ int i; \
+ XReg *Xd = (XReg *)xd; \
+ XReg *Xj = (XReg *)xj; \
+ XReg *Xk = (XReg *)xk; \
+ \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E(i) = DO_OP(Xj->E(i), Xk->E(i) % BIT); \
+ } \
+}
+
+XDO_BIT(xvbitclr_b, 8, UXB, DO_BITCLR)
+XDO_BIT(xvbitclr_h, 16, UXH, DO_BITCLR)
+XDO_BIT(xvbitclr_w, 32, UXW, DO_BITCLR)
+XDO_BIT(xvbitclr_d, 64, UXD, DO_BITCLR)
+XDO_BIT(xvbitset_b, 8, UXB, DO_BITSET)
+XDO_BIT(xvbitset_h, 16, UXH, DO_BITSET)
+XDO_BIT(xvbitset_w, 32, UXW, DO_BITSET)
+XDO_BIT(xvbitset_d, 64, UXD, DO_BITSET)
+XDO_BIT(xvbitrev_b, 8, UXB, DO_BITREV)
+XDO_BIT(xvbitrev_h, 16, UXH, DO_BITREV)
+XDO_BIT(xvbitrev_w, 32, UXW, DO_BITREV)
+XDO_BIT(xvbitrev_d, 64, UXD, DO_BITREV)
+
+#define XDO_BITI(NAME, BIT, E, DO_OP) \
+void HELPER(NAME)(void *xd, void *xj, uint64_t imm, uint32_t v) \
+{ \
+ int i; \
+ XReg *Xd = (XReg *)xd; \
+ XReg *Xj = (XReg *)xj; \
+ \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E(i) = DO_OP(Xj->E(i), imm); \
+ } \
+}
+
+XDO_BITI(xvbitclri_b, 8, UXB, DO_BITCLR)
+XDO_BITI(xvbitclri_h, 16, UXH, DO_BITCLR)
+XDO_BITI(xvbitclri_w, 32, UXW, DO_BITCLR)
+XDO_BITI(xvbitclri_d, 64, UXD, DO_BITCLR)
+XDO_BITI(xvbitseti_b, 8, UXB, DO_BITSET)
+XDO_BITI(xvbitseti_h, 16, UXH, DO_BITSET)
+XDO_BITI(xvbitseti_w, 32, UXW, DO_BITSET)
+XDO_BITI(xvbitseti_d, 64, UXD, DO_BITSET)
+XDO_BITI(xvbitrevi_b, 8, UXB, DO_BITREV)
+XDO_BITI(xvbitrevi_h, 16, UXH, DO_BITREV)
+XDO_BITI(xvbitrevi_w, 32, UXW, DO_BITREV)
+XDO_BITI(xvbitrevi_d, 64, UXD, DO_BITREV)
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index e1b448a2e6..b9fdcd3ed7 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -1937,10 +1937,6 @@ VPCNT(vpcnt_h, 16, UH, ctpop16)
VPCNT(vpcnt_w, 32, UW, ctpop32)
VPCNT(vpcnt_d, 64, UD, ctpop64)
-#define DO_BITCLR(a, bit) (a & ~(1ull << bit))
-#define DO_BITSET(a, bit) (a | 1ull << bit)
-#define DO_BITREV(a, bit) (a ^ (1ull << bit))
-
#define DO_BIT(NAME, BIT, E, DO_OP) \
void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
{ \
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index db5704dd05..4d9c4eb85f 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -86,6 +86,10 @@
#define DO_CLZ_W(N) (clz32(N))
#define DO_CLZ_D(N) (clz64(N))
+#define DO_BITCLR(a, bit) (a & ~(1ull << bit))
+#define DO_BITSET(a, bit) (a | 1ull << bit)
+#define DO_BITREV(a, bit) (a ^ (1ull << bit))
+
uint64_t do_vmskltz_b(int64_t val);
uint64_t do_vmskltz_h(int64_t val);
uint64_t do_vmskltz_w(int64_t val);
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 35/46] target/loongarch: Implement xvfrstp
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (33 preceding siblings ...)
2023-06-20 9:38 ` [PATCH v1 34/46] target/loongarch: Implement xvbitclr xvbitset xvbitrev Song Gao
@ 2023-06-20 9:38 ` Song Gao
2023-06-20 9:38 ` [PATCH v1 36/46] target/loongarch: Implement LASX fpu arith instructions Song Gao
` (10 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:38 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVFRSTP[I].{B/H}.
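A rough C sketch of the byte variant (illustration only, not part of the patch): each 128-bit half of Xj is scanned for its first negative element, and that element's index (or the lane count if none is negative) is written into the Xd lane selected by the low bits of Xk (or of the immediate for XVFRSTPI):
    #include <stdint.h>
    static void xvfrstp_b_ref(int8_t xd[32], const int8_t xj[32],
                              const int8_t xk[32])
    {
        for (int half = 0; half < 2; half++) {
            int base = half * 16, idx = 16;    /* 16 byte lanes per 128-bit half */
            for (int i = 0; i < 16; i++) {
                if (xj[base + i] < 0) {
                    idx = i;
                    break;
                }
            }
            xd[base + (xk[base] & 0xf)] = idx;
        }
    }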
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 5 ++
target/loongarch/helper.h | 5 ++
target/loongarch/insn_trans/trans_lasx.c.inc | 5 ++
target/loongarch/insns.decode | 5 ++
target/loongarch/lasx_helper.c | 56 ++++++++++++++++++++
5 files changed, 76 insertions(+)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 60d265a9f2..5340609e6f 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2234,6 +2234,11 @@ INSN_LASX(xvbitrevi_h, xx_i)
INSN_LASX(xvbitrevi_w, xx_i)
INSN_LASX(xvbitrevi_d, xx_i)
+INSN_LASX(xvfrstp_b, xxx)
+INSN_LASX(xvfrstp_h, xxx)
+INSN_LASX(xvfrstpi_b, xx_i)
+INSN_LASX(xvfrstpi_h, xx_i)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 294ac477fc..4db0cd25d3 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -1091,3 +1091,8 @@ DEF_HELPER_FLAGS_4(xvbitrevi_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_4(xvbitrevi_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_4(xvbitrevi_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_4(xvbitrevi_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_4(xvfrstp_b, void, env, i32, i32, i32)
+DEF_HELPER_4(xvfrstp_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvfrstpi_b, void, env, i32, i32, i32)
+DEF_HELPER_4(xvfrstpi_h, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index e87e000478..beeb9b3ff8 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -2404,6 +2404,11 @@ TRANS(xvbitrevi_h, gvec_xx_i, MO_16, do_xvbitrevi)
TRANS(xvbitrevi_w, gvec_xx_i, MO_32, do_xvbitrevi)
TRANS(xvbitrevi_d, gvec_xx_i, MO_64, do_xvbitrevi)
+TRANS(xvfrstp_b, gen_xxx, gen_helper_xvfrstp_b)
+TRANS(xvfrstp_h, gen_xxx, gen_helper_xvfrstp_h)
+TRANS(xvfrstpi_b, gen_xx_i, gen_helper_xvfrstpi_b)
+TRANS(xvfrstpi_h, gen_xx_i, gen_helper_xvfrstpi_h)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 47374054c6..387c1e5776 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1834,6 +1834,11 @@ xvbitrevi_h 0111 01110001 10000 1 .... ..... ..... @xx_ui4
xvbitrevi_w 0111 01110001 10001 ..... ..... ..... @xx_ui5
xvbitrevi_d 0111 01110001 1001 ...... ..... ..... @xx_ui6
+xvfrstp_b 0111 01010010 10110 ..... ..... ..... @xxx
+xvfrstp_h 0111 01010010 10111 ..... ..... ..... @xxx
+xvfrstpi_b 0111 01101001 10100 ..... ..... ..... @xx_ui5
+xvfrstpi_h 0111 01101001 10101 ..... ..... ..... @xx_ui5
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index 7092835d30..011eab46fb 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -2170,3 +2170,59 @@ XDO_BITI(xvbitrevi_b, 8, UXB, DO_BITREV)
XDO_BITI(xvbitrevi_h, 16, UXH, DO_BITREV)
XDO_BITI(xvbitrevi_w, 32, UXW, DO_BITREV)
XDO_BITI(xvbitrevi_d, 64, UXD, DO_BITREV)
+
+#define XVFRSTP(NAME, BIT, MASK, E) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i, j, m1, m2, max; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ \
+ max = LASX_LEN / (BIT * 2); \
+ m1 = Xk->E(0) & MASK; \
+ for (i = 0; i < max; i++) { \
+ if (Xj->E(i) < 0) { \
+ break; \
+ } \
+ } \
+ Xd->E(m1) = i; \
+ for (j = 0; j < max; j++) { \
+ if (Xj->E(j + max) < 0) { \
+ break; \
+ } \
+ } \
+ m2 = Xk->E(max) & MASK; \
+ Xd->E(m2 + max) = j; \
+}
+
+XVFRSTP(xvfrstp_b, 8, 0xf, XB)
+XVFRSTP(xvfrstp_h, 16, 0x7, XH)
+
+#define XVFRSTPI(NAME, BIT, E) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t imm) \
+{ \
+ int i, j, m, max; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ max = LASX_LEN / (BIT * 2); \
+ m = imm % (max); \
+ for (i = 0; i < max; i++) { \
+ if (Xj->E(i) < 0) { \
+ break; \
+ } \
+ } \
+ Xd->E(m) = i; \
+ for (j = 0; j < max; j++) { \
+ if (Xj->E(j + max) < 0) { \
+ break; \
+ } \
+ } \
+ Xd->E(m + max) = j; \
+}
+
+XVFRSTPI(xvfrstpi_b, 8, XB)
+XVFRSTPI(xvfrstpi_h, 16, XH)
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 36/46] target/loongarch: Implement LASX fpu arith instructions
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (34 preceding siblings ...)
2023-06-20 9:38 ` [PATCH v1 35/46] target/loongarch: Implement xvfrstp Song Gao
@ 2023-06-20 9:38 ` Song Gao
2023-06-20 9:38 ` [PATCH v1 37/46] target/loongarch: Implement LASX fpu fcvt instructions Song Gao
` (9 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:38 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVF{ADD/SUB/MUL/DIV}.{S/D};
- XVF{MADD/MSUB/NMADD/NMSUB}.{S/D};
- XVF{MAX/MIN}.{S/D};
- XVF{MAXA/MINA}.{S/D};
- XVFLOGB.{S/D};
- XVFCLASS.{S/D};
- XVF{SQRT/RECIP/RSQRT}.{S/D}.
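Structurally, every helper here is a lane-wise loop over the 256-bit register. A rough sketch, using plain C 'float' only to stay self-contained (the real helpers go through QEMU's softfloat, e.g. float32_add(), and call vec_update_fcsr0() after every element so exception flags accumulate into FCSR0):
    static void xvfadd_s_ref(float xd[8], const float xj[8], const float xk[8])
    {
        for (int i = 0; i < 8; i++) {
            xd[i] = xj[i] + xk[i];     /* real code: float32_add(Xj->UXW(i), ...) */
        }
    }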
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 46 +++++++++
target/loongarch/helper.h | 41 ++++++++
target/loongarch/insn_trans/trans_lasx.c.inc | 55 +++++++++++
target/loongarch/insns.decode | 43 +++++++++
target/loongarch/lasx_helper.c | 99 ++++++++++++++++++++
target/loongarch/lsx_helper.c | 51 +++++-----
target/loongarch/vec.h | 13 +++
7 files changed, 322 insertions(+), 26 deletions(-)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 5340609e6f..0e4ec2bd03 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1708,6 +1708,11 @@ static void output_x_i(DisasContext *ctx, arg_x_i *a, const char *mnemonic)
output(ctx, mnemonic, "x%d, 0x%x", a->xd, a->imm);
}
+static void output_xxxx(DisasContext *ctx, arg_xxxx *a, const char *mnemonic)
+{
+ output(ctx, mnemonic, "x%d, x%d, x%d, x%d", a->xd, a->xj, a->xk, a->xa);
+}
+
static void output_xxx(DisasContext *ctx, arg_xxx * a, const char *mnemonic)
{
output(ctx, mnemonic, "x%d, x%d, x%d", a->xd, a->xj, a->xk);
@@ -2239,6 +2244,47 @@ INSN_LASX(xvfrstp_h, xxx)
INSN_LASX(xvfrstpi_b, xx_i)
INSN_LASX(xvfrstpi_h, xx_i)
+INSN_LASX(xvfadd_s, xxx)
+INSN_LASX(xvfadd_d, xxx)
+INSN_LASX(xvfsub_s, xxx)
+INSN_LASX(xvfsub_d, xxx)
+INSN_LASX(xvfmul_s, xxx)
+INSN_LASX(xvfmul_d, xxx)
+INSN_LASX(xvfdiv_s, xxx)
+INSN_LASX(xvfdiv_d, xxx)
+
+INSN_LASX(xvfmadd_s, xxxx)
+INSN_LASX(xvfmadd_d, xxxx)
+INSN_LASX(xvfmsub_s, xxxx)
+INSN_LASX(xvfmsub_d, xxxx)
+INSN_LASX(xvfnmadd_s, xxxx)
+INSN_LASX(xvfnmadd_d, xxxx)
+INSN_LASX(xvfnmsub_s, xxxx)
+INSN_LASX(xvfnmsub_d, xxxx)
+
+INSN_LASX(xvfmax_s, xxx)
+INSN_LASX(xvfmax_d, xxx)
+INSN_LASX(xvfmin_s, xxx)
+INSN_LASX(xvfmin_d, xxx)
+
+INSN_LASX(xvfmaxa_s, xxx)
+INSN_LASX(xvfmaxa_d, xxx)
+INSN_LASX(xvfmina_s, xxx)
+INSN_LASX(xvfmina_d, xxx)
+
+INSN_LASX(xvflogb_s, xx)
+INSN_LASX(xvflogb_d, xx)
+
+INSN_LASX(xvfclass_s, xx)
+INSN_LASX(xvfclass_d, xx)
+
+INSN_LASX(xvfsqrt_s, xx)
+INSN_LASX(xvfsqrt_d, xx)
+INSN_LASX(xvfrecip_s, xx)
+INSN_LASX(xvfrecip_d, xx)
+INSN_LASX(xvfrsqrt_s, xx)
+INSN_LASX(xvfrsqrt_d, xx)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 4db0cd25d3..2e6e3f2fd3 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -1096,3 +1096,44 @@ DEF_HELPER_4(xvfrstp_b, void, env, i32, i32, i32)
DEF_HELPER_4(xvfrstp_h, void, env, i32, i32, i32)
DEF_HELPER_4(xvfrstpi_b, void, env, i32, i32, i32)
DEF_HELPER_4(xvfrstpi_h, void, env, i32, i32, i32)
+
+DEF_HELPER_4(xvfadd_s, void, env, i32, i32, i32)
+DEF_HELPER_4(xvfadd_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvfsub_s, void, env, i32, i32, i32)
+DEF_HELPER_4(xvfsub_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvfmul_s, void, env, i32, i32, i32)
+DEF_HELPER_4(xvfmul_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvfdiv_s, void, env, i32, i32, i32)
+DEF_HELPER_4(xvfdiv_d, void, env, i32, i32, i32)
+
+DEF_HELPER_5(xvfmadd_s, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(xvfmadd_d, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(xvfmsub_s, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(xvfmsub_d, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(xvfnmadd_s, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(xvfnmadd_d, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(xvfnmsub_s, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(xvfnmsub_d, void, env, i32, i32, i32, i32)
+
+DEF_HELPER_4(xvfmax_s, void, env, i32, i32, i32)
+DEF_HELPER_4(xvfmax_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvfmin_s, void, env, i32, i32, i32)
+DEF_HELPER_4(xvfmin_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(xvfmaxa_s, void, env, i32, i32, i32)
+DEF_HELPER_4(xvfmaxa_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvfmina_s, void, env, i32, i32, i32)
+DEF_HELPER_4(xvfmina_d, void, env, i32, i32, i32)
+
+DEF_HELPER_3(xvflogb_s, void, env, i32, i32)
+DEF_HELPER_3(xvflogb_d, void, env, i32, i32)
+
+DEF_HELPER_3(xvfclass_s, void, env, i32, i32)
+DEF_HELPER_3(xvfclass_d, void, env, i32, i32)
+
+DEF_HELPER_3(xvfsqrt_s, void, env, i32, i32)
+DEF_HELPER_3(xvfsqrt_d, void, env, i32, i32)
+DEF_HELPER_3(xvfrecip_s, void, env, i32, i32)
+DEF_HELPER_3(xvfrecip_d, void, env, i32, i32)
+DEF_HELPER_3(xvfrsqrt_s, void, env, i32, i32)
+DEF_HELPER_3(xvfrsqrt_d, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index beeb9b3ff8..b9785be6c5 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -15,6 +15,20 @@
#define CHECK_ASXE
#endif
+static bool gen_xxxx(DisasContext *ctx, arg_xxxx *a,
+ void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32,
+ TCGv_i32, TCGv_i32))
+{
+ TCGv_i32 xd = tcg_constant_i32(a->xd);
+ TCGv_i32 xj = tcg_constant_i32(a->xj);
+ TCGv_i32 xk = tcg_constant_i32(a->xk);
+ TCGv_i32 xa = tcg_constant_i32(a->xa);
+
+ CHECK_ASXE;
+ func(cpu_env, xd, xj, xk, xa);
+ return true;
+}
+
static bool gen_xxx(DisasContext *ctx, arg_xxx *a,
void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32))
{
@@ -2409,6 +2423,47 @@ TRANS(xvfrstp_h, gen_xxx, gen_helper_xvfrstp_h)
TRANS(xvfrstpi_b, gen_xx_i, gen_helper_xvfrstpi_b)
TRANS(xvfrstpi_h, gen_xx_i, gen_helper_xvfrstpi_h)
+TRANS(xvfadd_s, gen_xxx, gen_helper_xvfadd_s)
+TRANS(xvfadd_d, gen_xxx, gen_helper_xvfadd_d)
+TRANS(xvfsub_s, gen_xxx, gen_helper_xvfsub_s)
+TRANS(xvfsub_d, gen_xxx, gen_helper_xvfsub_d)
+TRANS(xvfmul_s, gen_xxx, gen_helper_xvfmul_s)
+TRANS(xvfmul_d, gen_xxx, gen_helper_xvfmul_d)
+TRANS(xvfdiv_s, gen_xxx, gen_helper_xvfdiv_s)
+TRANS(xvfdiv_d, gen_xxx, gen_helper_xvfdiv_d)
+
+TRANS(xvfmadd_s, gen_xxxx, gen_helper_xvfmadd_s)
+TRANS(xvfmadd_d, gen_xxxx, gen_helper_xvfmadd_d)
+TRANS(xvfmsub_s, gen_xxxx, gen_helper_xvfmsub_s)
+TRANS(xvfmsub_d, gen_xxxx, gen_helper_xvfmsub_d)
+TRANS(xvfnmadd_s, gen_xxxx, gen_helper_xvfnmadd_s)
+TRANS(xvfnmadd_d, gen_xxxx, gen_helper_xvfnmadd_d)
+TRANS(xvfnmsub_s, gen_xxxx, gen_helper_xvfnmsub_s)
+TRANS(xvfnmsub_d, gen_xxxx, gen_helper_xvfnmsub_d)
+
+TRANS(xvfmax_s, gen_xxx, gen_helper_xvfmax_s)
+TRANS(xvfmax_d, gen_xxx, gen_helper_xvfmax_d)
+TRANS(xvfmin_s, gen_xxx, gen_helper_xvfmin_s)
+TRANS(xvfmin_d, gen_xxx, gen_helper_xvfmin_d)
+
+TRANS(xvfmaxa_s, gen_xxx, gen_helper_xvfmaxa_s)
+TRANS(xvfmaxa_d, gen_xxx, gen_helper_xvfmaxa_d)
+TRANS(xvfmina_s, gen_xxx, gen_helper_xvfmina_s)
+TRANS(xvfmina_d, gen_xxx, gen_helper_xvfmina_d)
+
+TRANS(xvflogb_s, gen_xx, gen_helper_xvflogb_s)
+TRANS(xvflogb_d, gen_xx, gen_helper_xvflogb_d)
+
+TRANS(xvfclass_s, gen_xx, gen_helper_xvfclass_s)
+TRANS(xvfclass_d, gen_xx, gen_helper_xvfclass_d)
+
+TRANS(xvfsqrt_s, gen_xx, gen_helper_xvfsqrt_s)
+TRANS(xvfsqrt_d, gen_xx, gen_helper_xvfsqrt_d)
+TRANS(xvfrecip_s, gen_xx, gen_helper_xvfrecip_s)
+TRANS(xvfrecip_d, gen_xx, gen_helper_xvfrecip_d)
+TRANS(xvfrsqrt_s, gen_xx, gen_helper_xvfrsqrt_s)
+TRANS(xvfrsqrt_d, gen_xx, gen_helper_xvfrsqrt_d)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 387c1e5776..8a5d6a8d45 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1306,6 +1306,7 @@ vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
&xr xd rj
&xx_i xd xj imm
&x_i xd imm
+&xxxx xd xj xk xa
#
# LASX Formats
@@ -1322,6 +1323,7 @@ vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
@xx_ui6 .... ........ .... imm:6 xj:5 xd:5 &xx_i
@xx_ui7 .... ........ ... imm:7 xj:5 xd:5 &xx_i
@xx_ui8 .... ........ .. imm:8 xj:5 xd:5 &xx_i
+@xxxx .... ........ xa:5 xk:5 xj:5 xd:5 &xxxx
xvadd_b 0111 01000000 10100 ..... ..... ..... @xxx
xvadd_h 0111 01000000 10101 ..... ..... ..... @xxx
@@ -1839,6 +1841,47 @@ xvfrstp_h 0111 01010010 10111 ..... ..... ..... @xxx
xvfrstpi_b 0111 01101001 10100 ..... ..... ..... @xx_ui5
xvfrstpi_h 0111 01101001 10101 ..... ..... ..... @xx_ui5
+xvfadd_s 0111 01010011 00001 ..... ..... ..... @xxx
+xvfadd_d 0111 01010011 00010 ..... ..... ..... @xxx
+xvfsub_s 0111 01010011 00101 ..... ..... ..... @xxx
+xvfsub_d 0111 01010011 00110 ..... ..... ..... @xxx
+xvfmul_s 0111 01010011 10001 ..... ..... ..... @xxx
+xvfmul_d 0111 01010011 10010 ..... ..... ..... @xxx
+xvfdiv_s 0111 01010011 10101 ..... ..... ..... @xxx
+xvfdiv_d 0111 01010011 10110 ..... ..... ..... @xxx
+
+xvfmadd_s 0000 10100001 ..... ..... ..... ..... @xxxx
+xvfmadd_d 0000 10100010 ..... ..... ..... ..... @xxxx
+xvfmsub_s 0000 10100101 ..... ..... ..... ..... @xxxx
+xvfmsub_d 0000 10100110 ..... ..... ..... ..... @xxxx
+xvfnmadd_s 0000 10101001 ..... ..... ..... ..... @xxxx
+xvfnmadd_d 0000 10101010 ..... ..... ..... ..... @xxxx
+xvfnmsub_s 0000 10101101 ..... ..... ..... ..... @xxxx
+xvfnmsub_d 0000 10101110 ..... ..... ..... ..... @xxxx
+
+xvfmax_s 0111 01010011 11001 ..... ..... ..... @xxx
+xvfmax_d 0111 01010011 11010 ..... ..... ..... @xxx
+xvfmin_s 0111 01010011 11101 ..... ..... ..... @xxx
+xvfmin_d 0111 01010011 11110 ..... ..... ..... @xxx
+
+xvfmaxa_s 0111 01010100 00001 ..... ..... ..... @xxx
+xvfmaxa_d 0111 01010100 00010 ..... ..... ..... @xxx
+xvfmina_s 0111 01010100 00101 ..... ..... ..... @xxx
+xvfmina_d 0111 01010100 00110 ..... ..... ..... @xxx
+
+xvflogb_s 0111 01101001 11001 10001 ..... ..... @xx
+xvflogb_d 0111 01101001 11001 10010 ..... ..... @xx
+
+xvfclass_s 0111 01101001 11001 10101 ..... ..... @xx
+xvfclass_d 0111 01101001 11001 10110 ..... ..... @xx
+
+xvfsqrt_s 0111 01101001 11001 11001 ..... ..... @xx
+xvfsqrt_d 0111 01101001 11001 11010 ..... ..... @xx
+xvfrecip_s 0111 01101001 11001 11101 ..... ..... @xx
+xvfrecip_d 0111 01101001 11001 11110 ..... ..... @xx
+xvfrsqrt_s 0111 01101001 11010 00001 ..... ..... @xx
+xvfrsqrt_d 0111 01101001 11010 00010 ..... ..... @xx
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index 011eab46fb..316ebd3463 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -9,6 +9,7 @@
#include "cpu.h"
#include "exec/exec-all.h"
#include "exec/helper-proto.h"
+#include "fpu/softfloat.h"
#include "internals.h"
#include "vec.h"
@@ -2226,3 +2227,101 @@ void HELPER(NAME)(CPULoongArchState *env, \
XVFRSTPI(xvfrstpi_b, 8, XB)
XVFRSTPI(xvfrstpi_h, 16, XH)
+
+#define XDO_3OP_F(NAME, BIT, E, FN) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ \
+ vec_clear_cause(env); \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E(i) = FN(Xj->E(i), Xk->E(i), &env->fp_status); \
+ vec_update_fcsr0(env, GETPC()); \
+ } \
+}
+
+XDO_3OP_F(xvfadd_s, 32, UXW, float32_add)
+XDO_3OP_F(xvfadd_d, 64, UXD, float64_add)
+XDO_3OP_F(xvfsub_s, 32, UXW, float32_sub)
+XDO_3OP_F(xvfsub_d, 64, UXD, float64_sub)
+XDO_3OP_F(xvfmul_s, 32, UXW, float32_mul)
+XDO_3OP_F(xvfmul_d, 64, UXD, float64_mul)
+XDO_3OP_F(xvfdiv_s, 32, UXW, float32_div)
+XDO_3OP_F(xvfdiv_d, 64, UXD, float64_div)
+XDO_3OP_F(xvfmax_s, 32, UXW, float32_maxnum)
+XDO_3OP_F(xvfmax_d, 64, UXD, float64_maxnum)
+XDO_3OP_F(xvfmin_s, 32, UXW, float32_minnum)
+XDO_3OP_F(xvfmin_d, 64, UXD, float64_minnum)
+XDO_3OP_F(xvfmaxa_s, 32, UXW, float32_maxnummag)
+XDO_3OP_F(xvfmaxa_d, 64, UXD, float64_maxnummag)
+XDO_3OP_F(xvfmina_s, 32, UXW, float32_minnummag)
+XDO_3OP_F(xvfmina_d, 64, UXD, float64_minnummag)
+
+#define XDO_4OP_F(NAME, BIT, E, FN, flags) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk, uint32_t xa) \
+{ \
+ int i; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ XReg *Xa = &(env->fpr[xa].xreg); \
+ \
+ vec_clear_cause(env); \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E(i) = FN(Xj->E(i), Xk->E(i), Xa->E(i), flags, &env->fp_status); \
+ vec_update_fcsr0(env, GETPC()); \
+ } \
+}
+
+XDO_4OP_F(xvfmadd_s, 32, UXW, float32_muladd, 0)
+XDO_4OP_F(xvfmadd_d, 64, UXD, float64_muladd, 0)
+XDO_4OP_F(xvfmsub_s, 32, UXW, float32_muladd, float_muladd_negate_c)
+XDO_4OP_F(xvfmsub_d, 64, UXD, float64_muladd, float_muladd_negate_c)
+XDO_4OP_F(xvfnmadd_s, 32, UXW, float32_muladd, float_muladd_negate_result)
+XDO_4OP_F(xvfnmadd_d, 64, UXD, float64_muladd, float_muladd_negate_result)
+XDO_4OP_F(xvfnmsub_s, 32, UXW, float32_muladd,
+ float_muladd_negate_c | float_muladd_negate_result)
+XDO_4OP_F(xvfnmsub_d, 64, UXD, float64_muladd,
+ float_muladd_negate_c | float_muladd_negate_result)
+
+#define XDO_2OP_F(NAME, BIT, E, FN) \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t xd, uint32_t xj) \
+{ \
+ int i; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ vec_clear_cause(env); \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E(i) = FN(env, Xj->E(i)); \
+ } \
+}
+
+#define XFCLASS(NAME, BIT, E, FN) \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t xd, uint32_t xj) \
+{ \
+ int i; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E(i) = FN(env, Xj->E(i)); \
+ } \
+}
+
+XFCLASS(xvfclass_s, 32, UXW, helper_fclass_s)
+XFCLASS(xvfclass_d, 64, UXD, helper_fclass_d)
+
+XDO_2OP_F(xvflogb_s, 32, UXW, do_flogb_32)
+XDO_2OP_F(xvflogb_d, 64, UXD, do_flogb_64)
+XDO_2OP_F(xvfsqrt_s, 32, UXW, do_fsqrt_32)
+XDO_2OP_F(xvfsqrt_d, 64, UXD, do_fsqrt_64)
+XDO_2OP_F(xvfrecip_s, 32, UXW, do_frecip_32)
+XDO_2OP_F(xvfrecip_d, 64, UXD, do_frecip_64)
+XDO_2OP_F(xvfrsqrt_s, 32, UXW, do_frsqrt_32)
+XDO_2OP_F(xvfrsqrt_d, 64, UXD, do_frsqrt_64)
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index b9fdcd3ed7..446a1bdfe3 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -2029,8 +2029,7 @@ void HELPER(NAME)(CPULoongArchState *env, \
VFRSTPI(vfrstpi_b, 8, B)
VFRSTPI(vfrstpi_h, 16, H)
-static void vec_update_fcsr0_mask(CPULoongArchState *env,
- uintptr_t pc, int mask)
+void vec_update_fcsr0_mask(CPULoongArchState *env, uintptr_t pc, int mask)
{
int flags = get_float_exception_flags(&env->fp_status);
@@ -2050,12 +2049,12 @@ static void vec_update_fcsr0_mask(CPULoongArchState *env,
}
}
-static void vec_update_fcsr0(CPULoongArchState *env, uintptr_t pc)
+void vec_update_fcsr0(CPULoongArchState *env, uintptr_t pc)
{
vec_update_fcsr0_mask(env, pc, 0);
}
-static inline void vec_clear_cause(CPULoongArchState *env)
+inline void vec_clear_cause(CPULoongArchState *env)
{
SET_FP_CAUSE(env->fcsr0, 0);
}
@@ -2134,19 +2133,19 @@ void HELPER(NAME)(CPULoongArchState *env, uint32_t vd, uint32_t vj) \
} \
}
-#define FLOGB(BIT, T) \
-static T do_flogb_## BIT(CPULoongArchState *env, T fj) \
-{ \
- T fp, fd; \
- float_status *status = &env->fp_status; \
- FloatRoundMode old_mode = get_float_rounding_mode(status); \
- \
- set_float_rounding_mode(float_round_down, status); \
- fp = float ## BIT ##_log2(fj, status); \
- fd = float ## BIT ##_round_to_int(fp, status); \
- set_float_rounding_mode(old_mode, status); \
- vec_update_fcsr0_mask(env, GETPC(), float_flag_inexact); \
- return fd; \
+#define FLOGB(BIT, T) \
+T do_flogb_## BIT(CPULoongArchState *env, T fj) \
+{ \
+ T fp, fd; \
+ float_status *status = &env->fp_status; \
+ FloatRoundMode old_mode = get_float_rounding_mode(status); \
+ \
+ set_float_rounding_mode(float_round_down, status); \
+ fp = float ## BIT ##_log2(fj, status); \
+ fd = float ## BIT ##_round_to_int(fp, status); \
+ set_float_rounding_mode(old_mode, status); \
+ vec_update_fcsr0_mask(env, GETPC(), float_flag_inexact); \
+ return fd; \
}
FLOGB(32, uint32_t)
@@ -2167,20 +2166,20 @@ void HELPER(NAME)(CPULoongArchState *env, uint32_t vd, uint32_t vj) \
FCLASS(vfclass_s, 32, UW, helper_fclass_s)
FCLASS(vfclass_d, 64, UD, helper_fclass_d)
-#define FSQRT(BIT, T) \
-static T do_fsqrt_## BIT(CPULoongArchState *env, T fj) \
-{ \
- T fd; \
- fd = float ## BIT ##_sqrt(fj, &env->fp_status); \
- vec_update_fcsr0(env, GETPC()); \
- return fd; \
+#define FSQRT(BIT, T) \
+T do_fsqrt_## BIT(CPULoongArchState *env, T fj) \
+{ \
+ T fd; \
+ fd = float ## BIT ##_sqrt(fj, &env->fp_status); \
+ vec_update_fcsr0(env, GETPC()); \
+ return fd; \
}
FSQRT(32, uint32_t)
FSQRT(64, uint64_t)
#define FRECIP(BIT, T) \
-static T do_frecip_## BIT(CPULoongArchState *env, T fj) \
+T do_frecip_## BIT(CPULoongArchState *env, T fj) \
{ \
T fd; \
fd = float ## BIT ##_div(float ## BIT ##_one, fj, &env->fp_status); \
@@ -2192,7 +2191,7 @@ FRECIP(32, uint32_t)
FRECIP(64, uint64_t)
#define FRSQRT(BIT, T) \
-static T do_frsqrt_## BIT(CPULoongArchState *env, T fj) \
+T do_frsqrt_## BIT(CPULoongArchState *env, T fj) \
{ \
T fd, fp; \
fp = float ## BIT ##_sqrt(fj, &env->fp_status); \
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index 4d9c4eb85f..583997d576 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -96,4 +96,17 @@ uint64_t do_vmskltz_w(int64_t val);
uint64_t do_vmskltz_d(int64_t val);
uint64_t do_vmskez_b(uint64_t val);
+void vec_update_fcsr0_mask(CPULoongArchState *env, uintptr_t pc, int mask);
+void vec_update_fcsr0(CPULoongArchState *env, uintptr_t pc);
+void vec_clear_cause(CPULoongArchState *env);
+
+uint32_t do_flogb_32(CPULoongArchState *env, uint32_t fj);
+uint64_t do_flogb_64(CPULoongArchState *env, uint64_t fj);
+uint32_t do_fsqrt_32(CPULoongArchState *env, uint32_t fj);
+uint64_t do_fsqrt_64(CPULoongArchState *env, uint64_t fj);
+uint32_t do_frecip_32(CPULoongArchState *env, uint32_t fj);
+uint64_t do_frecip_64(CPULoongArchState *env, uint64_t fj);
+uint32_t do_frsqrt_32(CPULoongArchState *env, uint32_t fj);
+uint64_t do_frsqrt_64(CPULoongArchState *env, uint64_t fj);
+
#endif /* LOONGARCH_VEC_H */
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 37/46] target/loongarch: Implement LASX fpu fcvt instructions
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (35 preceding siblings ...)
2023-06-20 9:38 ` [PATCH v1 36/46] target/loongarch: Implement LASX fpu arith instructions Song Gao
@ 2023-06-20 9:38 ` Song Gao
2023-06-20 9:38 ` [PATCH v1 38/46] target/loongarch: Implement xvseq xvsle xvslt Song Gao
` (8 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:38 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVFCVT{L/H}.{S.H/D.S};
- XVFCVT.{H.S/S.D};
- XVFRINT[{RNE/RZ/RP/RM}].{S/D};
- XVFTINT[{RNE/RZ/RP/RM}].{W.S/L.D};
- XVFTINT[RZ].{WU.S/LU.D};
- XVFTINT[{RNE/RZ/RP/RM}].W.D;
- XVFTINT[{RNE/RZ/RP/RM}]{L/H}.L.S;
- XVFFINT.{S.W/D.L}[U];
- XVFFINT.S.L, XVFFINT{L/H}.D.W.
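The directed-rounding conversions follow the usual save/force/restore rounding-mode pattern. A loose sketch with <fenv.h> (illustration only, not part of the patch, and without the saturation handling the real helpers get from softfloat):
    #include <fenv.h>
    #include <math.h>
    #include <stdint.h>
    static void xvftintrm_w_s_ref(int32_t xd[8], const float xj[8])
    {
        int old = fegetround();
        fesetround(FE_DOWNWARD);       /* "RM": round toward minus infinity */
        for (int i = 0; i < 8; i++) {
            xd[i] = (int32_t)rintf(xj[i]);
        }
        fesetround(old);
    }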
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 56 +++
target/loongarch/helper.h | 56 +++
target/loongarch/insn_trans/trans_lasx.c.inc | 56 +++
target/loongarch/insns.decode | 58 +++
target/loongarch/lasx_helper.c | 398 +++++++++++++++++++
5 files changed, 624 insertions(+)
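Also for reference, not part of the patch: the XDO_FTINT macro below forces the integer result to 0 when the source is a NaN and the conversion raised the invalid flag. A rough standalone C equivalent for the w.s case (assumed values, host rounding mode, no overflow handling):

#include <math.h>
#include <stdio.h>
#include <stdint.h>

static int32_t ftint_w_s(float fj)
{
    if (isnan(fj)) {
        return 0;                  /* invalid conversion: result forced to 0 */
    }
    return (int32_t)lrintf(fj);    /* rounds with the current rounding mode */
}

int main(void)
{
    printf("%d %d %d\n", ftint_w_s(2.5f), ftint_w_s(-1.5f), ftint_w_s(NAN));
    return 0;
}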
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 0e4ec2bd03..65eccc8598 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2285,6 +2285,62 @@ INSN_LASX(xvfrecip_d, xx)
INSN_LASX(xvfrsqrt_s, xx)
INSN_LASX(xvfrsqrt_d, xx)
+INSN_LASX(xvfcvtl_s_h, xx)
+INSN_LASX(xvfcvth_s_h, xx)
+INSN_LASX(xvfcvtl_d_s, xx)
+INSN_LASX(xvfcvth_d_s, xx)
+INSN_LASX(xvfcvt_h_s, xxx)
+INSN_LASX(xvfcvt_s_d, xxx)
+
+INSN_LASX(xvfrint_s, xx)
+INSN_LASX(xvfrint_d, xx)
+INSN_LASX(xvfrintrm_s, xx)
+INSN_LASX(xvfrintrm_d, xx)
+INSN_LASX(xvfrintrp_s, xx)
+INSN_LASX(xvfrintrp_d, xx)
+INSN_LASX(xvfrintrz_s, xx)
+INSN_LASX(xvfrintrz_d, xx)
+INSN_LASX(xvfrintrne_s, xx)
+INSN_LASX(xvfrintrne_d, xx)
+
+INSN_LASX(xvftint_w_s, xx)
+INSN_LASX(xvftint_l_d, xx)
+INSN_LASX(xvftintrm_w_s, xx)
+INSN_LASX(xvftintrm_l_d, xx)
+INSN_LASX(xvftintrp_w_s, xx)
+INSN_LASX(xvftintrp_l_d, xx)
+INSN_LASX(xvftintrz_w_s, xx)
+INSN_LASX(xvftintrz_l_d, xx)
+INSN_LASX(xvftintrne_w_s, xx)
+INSN_LASX(xvftintrne_l_d, xx)
+INSN_LASX(xvftint_wu_s, xx)
+INSN_LASX(xvftint_lu_d, xx)
+INSN_LASX(xvftintrz_wu_s, xx)
+INSN_LASX(xvftintrz_lu_d, xx)
+INSN_LASX(xvftint_w_d, xxx)
+INSN_LASX(xvftintrm_w_d, xxx)
+INSN_LASX(xvftintrp_w_d, xxx)
+INSN_LASX(xvftintrz_w_d, xxx)
+INSN_LASX(xvftintrne_w_d, xxx)
+INSN_LASX(xvftintl_l_s, xx)
+INSN_LASX(xvftinth_l_s, xx)
+INSN_LASX(xvftintrml_l_s, xx)
+INSN_LASX(xvftintrmh_l_s, xx)
+INSN_LASX(xvftintrpl_l_s, xx)
+INSN_LASX(xvftintrph_l_s, xx)
+INSN_LASX(xvftintrzl_l_s, xx)
+INSN_LASX(xvftintrzh_l_s, xx)
+INSN_LASX(xvftintrnel_l_s, xx)
+INSN_LASX(xvftintrneh_l_s, xx)
+
+INSN_LASX(xvffint_s_w, xx)
+INSN_LASX(xvffint_s_wu, xx)
+INSN_LASX(xvffint_d_l, xx)
+INSN_LASX(xvffint_d_lu, xx)
+INSN_LASX(xvffintl_d_w, xx)
+INSN_LASX(xvffinth_d_w, xx)
+INSN_LASX(xvffint_s_l, xxx)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 2e6e3f2fd3..d30ea7f6a4 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -1137,3 +1137,59 @@ DEF_HELPER_3(xvfrecip_s, void, env, i32, i32)
DEF_HELPER_3(xvfrecip_d, void, env, i32, i32)
DEF_HELPER_3(xvfrsqrt_s, void, env, i32, i32)
DEF_HELPER_3(xvfrsqrt_d, void, env, i32, i32)
+
+DEF_HELPER_3(xvfcvtl_s_h, void, env, i32, i32)
+DEF_HELPER_3(xvfcvth_s_h, void, env, i32, i32)
+DEF_HELPER_3(xvfcvtl_d_s, void, env, i32, i32)
+DEF_HELPER_3(xvfcvth_d_s, void, env, i32, i32)
+DEF_HELPER_4(xvfcvt_h_s, void, env, i32, i32, i32)
+DEF_HELPER_4(xvfcvt_s_d, void, env, i32, i32, i32)
+
+DEF_HELPER_3(xvfrintrne_s, void, env, i32, i32)
+DEF_HELPER_3(xvfrintrne_d, void, env, i32, i32)
+DEF_HELPER_3(xvfrintrz_s, void, env, i32, i32)
+DEF_HELPER_3(xvfrintrz_d, void, env, i32, i32)
+DEF_HELPER_3(xvfrintrp_s, void, env, i32, i32)
+DEF_HELPER_3(xvfrintrp_d, void, env, i32, i32)
+DEF_HELPER_3(xvfrintrm_s, void, env, i32, i32)
+DEF_HELPER_3(xvfrintrm_d, void, env, i32, i32)
+DEF_HELPER_3(xvfrint_s, void, env, i32, i32)
+DEF_HELPER_3(xvfrint_d, void, env, i32, i32)
+
+DEF_HELPER_3(xvftintrne_w_s, void, env, i32, i32)
+DEF_HELPER_3(xvftintrne_l_d, void, env, i32, i32)
+DEF_HELPER_3(xvftintrz_w_s, void, env, i32, i32)
+DEF_HELPER_3(xvftintrz_l_d, void, env, i32, i32)
+DEF_HELPER_3(xvftintrp_w_s, void, env, i32, i32)
+DEF_HELPER_3(xvftintrp_l_d, void, env, i32, i32)
+DEF_HELPER_3(xvftintrm_w_s, void, env, i32, i32)
+DEF_HELPER_3(xvftintrm_l_d, void, env, i32, i32)
+DEF_HELPER_3(xvftint_w_s, void, env, i32, i32)
+DEF_HELPER_3(xvftint_l_d, void, env, i32, i32)
+DEF_HELPER_3(xvftintrz_wu_s, void, env, i32, i32)
+DEF_HELPER_3(xvftintrz_lu_d, void, env, i32, i32)
+DEF_HELPER_3(xvftint_wu_s, void, env, i32, i32)
+DEF_HELPER_3(xvftint_lu_d, void, env, i32, i32)
+DEF_HELPER_4(xvftintrne_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvftintrz_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvftintrp_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvftintrm_w_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvftint_w_d, void, env, i32, i32, i32)
+DEF_HELPER_3(xvftintrnel_l_s, void, env, i32, i32)
+DEF_HELPER_3(xvftintrneh_l_s, void, env, i32, i32)
+DEF_HELPER_3(xvftintrzl_l_s, void, env, i32, i32)
+DEF_HELPER_3(xvftintrzh_l_s, void, env, i32, i32)
+DEF_HELPER_3(xvftintrpl_l_s, void, env, i32, i32)
+DEF_HELPER_3(xvftintrph_l_s, void, env, i32, i32)
+DEF_HELPER_3(xvftintrml_l_s, void, env, i32, i32)
+DEF_HELPER_3(xvftintrmh_l_s, void, env, i32, i32)
+DEF_HELPER_3(xvftintl_l_s, void, env, i32, i32)
+DEF_HELPER_3(xvftinth_l_s, void, env, i32, i32)
+
+DEF_HELPER_3(xvffint_s_w, void, env, i32, i32)
+DEF_HELPER_3(xvffint_d_l, void, env, i32, i32)
+DEF_HELPER_3(xvffint_s_wu, void, env, i32, i32)
+DEF_HELPER_3(xvffint_d_lu, void, env, i32, i32)
+DEF_HELPER_3(xvffintl_d_w, void, env, i32, i32)
+DEF_HELPER_3(xvffinth_d_w, void, env, i32, i32)
+DEF_HELPER_4(xvffint_s_l, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index b9785be6c5..998c07b358 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -2464,6 +2464,62 @@ TRANS(xvfrecip_d, gen_xx, gen_helper_xvfrecip_d)
TRANS(xvfrsqrt_s, gen_xx, gen_helper_xvfrsqrt_s)
TRANS(xvfrsqrt_d, gen_xx, gen_helper_xvfrsqrt_d)
+TRANS(xvfcvtl_s_h, gen_xx, gen_helper_xvfcvtl_s_h)
+TRANS(xvfcvth_s_h, gen_xx, gen_helper_xvfcvth_s_h)
+TRANS(xvfcvtl_d_s, gen_xx, gen_helper_xvfcvtl_d_s)
+TRANS(xvfcvth_d_s, gen_xx, gen_helper_xvfcvth_d_s)
+TRANS(xvfcvt_h_s, gen_xxx, gen_helper_xvfcvt_h_s)
+TRANS(xvfcvt_s_d, gen_xxx, gen_helper_xvfcvt_s_d)
+
+TRANS(xvfrintrne_s, gen_xx, gen_helper_xvfrintrne_s)
+TRANS(xvfrintrne_d, gen_xx, gen_helper_xvfrintrne_d)
+TRANS(xvfrintrz_s, gen_xx, gen_helper_xvfrintrz_s)
+TRANS(xvfrintrz_d, gen_xx, gen_helper_xvfrintrz_d)
+TRANS(xvfrintrp_s, gen_xx, gen_helper_xvfrintrp_s)
+TRANS(xvfrintrp_d, gen_xx, gen_helper_xvfrintrp_d)
+TRANS(xvfrintrm_s, gen_xx, gen_helper_xvfrintrm_s)
+TRANS(xvfrintrm_d, gen_xx, gen_helper_xvfrintrm_d)
+TRANS(xvfrint_s, gen_xx, gen_helper_xvfrint_s)
+TRANS(xvfrint_d, gen_xx, gen_helper_xvfrint_d)
+
+TRANS(xvftintrne_w_s, gen_xx, gen_helper_xvftintrne_w_s)
+TRANS(xvftintrne_l_d, gen_xx, gen_helper_xvftintrne_l_d)
+TRANS(xvftintrz_w_s, gen_xx, gen_helper_xvftintrz_w_s)
+TRANS(xvftintrz_l_d, gen_xx, gen_helper_xvftintrz_l_d)
+TRANS(xvftintrp_w_s, gen_xx, gen_helper_xvftintrp_w_s)
+TRANS(xvftintrp_l_d, gen_xx, gen_helper_xvftintrp_l_d)
+TRANS(xvftintrm_w_s, gen_xx, gen_helper_xvftintrm_w_s)
+TRANS(xvftintrm_l_d, gen_xx, gen_helper_xvftintrm_l_d)
+TRANS(xvftint_w_s, gen_xx, gen_helper_xvftint_w_s)
+TRANS(xvftint_l_d, gen_xx, gen_helper_xvftint_l_d)
+TRANS(xvftintrz_wu_s, gen_xx, gen_helper_xvftintrz_wu_s)
+TRANS(xvftintrz_lu_d, gen_xx, gen_helper_xvftintrz_lu_d)
+TRANS(xvftint_wu_s, gen_xx, gen_helper_xvftint_wu_s)
+TRANS(xvftint_lu_d, gen_xx, gen_helper_xvftint_lu_d)
+TRANS(xvftintrne_w_d, gen_xxx, gen_helper_xvftintrne_w_d)
+TRANS(xvftintrz_w_d, gen_xxx, gen_helper_xvftintrz_w_d)
+TRANS(xvftintrp_w_d, gen_xxx, gen_helper_xvftintrp_w_d)
+TRANS(xvftintrm_w_d, gen_xxx, gen_helper_xvftintrm_w_d)
+TRANS(xvftint_w_d, gen_xxx, gen_helper_xvftint_w_d)
+TRANS(xvftintrnel_l_s, gen_xx, gen_helper_xvftintrnel_l_s)
+TRANS(xvftintrneh_l_s, gen_xx, gen_helper_xvftintrneh_l_s)
+TRANS(xvftintrzl_l_s, gen_xx, gen_helper_xvftintrzl_l_s)
+TRANS(xvftintrzh_l_s, gen_xx, gen_helper_xvftintrzh_l_s)
+TRANS(xvftintrpl_l_s, gen_xx, gen_helper_xvftintrpl_l_s)
+TRANS(xvftintrph_l_s, gen_xx, gen_helper_xvftintrph_l_s)
+TRANS(xvftintrml_l_s, gen_xx, gen_helper_xvftintrml_l_s)
+TRANS(xvftintrmh_l_s, gen_xx, gen_helper_xvftintrmh_l_s)
+TRANS(xvftintl_l_s, gen_xx, gen_helper_xvftintl_l_s)
+TRANS(xvftinth_l_s, gen_xx, gen_helper_xvftinth_l_s)
+
+TRANS(xvffint_s_w, gen_xx, gen_helper_xvffint_s_w)
+TRANS(xvffint_d_l, gen_xx, gen_helper_xvffint_d_l)
+TRANS(xvffint_s_wu, gen_xx, gen_helper_xvffint_s_wu)
+TRANS(xvffint_d_lu, gen_xx, gen_helper_xvffint_d_lu)
+TRANS(xvffintl_d_w, gen_xx, gen_helper_xvffintl_d_w)
+TRANS(xvffinth_d_w, gen_xx, gen_helper_xvffinth_d_w)
+TRANS(xvffint_s_l, gen_xxx, gen_helper_xvffint_s_l)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 8a5d6a8d45..59b79573e5 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1882,6 +1882,64 @@ xvfrecip_d 0111 01101001 11001 11110 ..... ..... @xx
xvfrsqrt_s 0111 01101001 11010 00001 ..... ..... @xx
xvfrsqrt_d 0111 01101001 11010 00010 ..... ..... @xx
+xvfcvtl_s_h 0111 01101001 11011 11010 ..... ..... @xx
+xvfcvth_s_h 0111 01101001 11011 11011 ..... ..... @xx
+xvfcvtl_d_s 0111 01101001 11011 11100 ..... ..... @xx
+xvfcvth_d_s 0111 01101001 11011 11101 ..... ..... @xx
+xvfcvt_h_s 0111 01010100 01100 ..... ..... ..... @xxx
+xvfcvt_s_d 0111 01010100 01101 ..... ..... ..... @xxx
+
+xvfrintrne_s 0111 01101001 11010 11101 ..... ..... @xx
+xvfrintrne_d 0111 01101001 11010 11110 ..... ..... @xx
+xvfrintrz_s 0111 01101001 11010 11001 ..... ..... @xx
+xvfrintrz_d 0111 01101001 11010 11010 ..... ..... @xx
+xvfrintrp_s 0111 01101001 11010 10101 ..... ..... @xx
+xvfrintrp_d 0111 01101001 11010 10110 ..... ..... @xx
+xvfrintrm_s 0111 01101001 11010 10001 ..... ..... @xx
+xvfrintrm_d 0111 01101001 11010 10010 ..... ..... @xx
+xvfrint_s 0111 01101001 11010 01101 ..... ..... @xx
+xvfrint_d 0111 01101001 11010 01110 ..... ..... @xx
+
+xvftintrne_w_s 0111 01101001 11100 10100 ..... ..... @xx
+xvftintrne_l_d 0111 01101001 11100 10101 ..... ..... @xx
+xvftintrz_w_s 0111 01101001 11100 10010 ..... ..... @xx
+xvftintrz_l_d 0111 01101001 11100 10011 ..... ..... @xx
+xvftintrp_w_s 0111 01101001 11100 10000 ..... ..... @xx
+xvftintrp_l_d 0111 01101001 11100 10001 ..... ..... @xx
+xvftintrm_w_s 0111 01101001 11100 01110 ..... ..... @xx
+xvftintrm_l_d 0111 01101001 11100 01111 ..... ..... @xx
+xvftint_w_s 0111 01101001 11100 01100 ..... ..... @xx
+xvftint_l_d 0111 01101001 11100 01101 ..... ..... @xx
+xvftintrz_wu_s 0111 01101001 11100 11100 ..... ..... @xx
+xvftintrz_lu_d 0111 01101001 11100 11101 ..... ..... @xx
+xvftint_wu_s 0111 01101001 11100 10110 ..... ..... @xx
+xvftint_lu_d 0111 01101001 11100 10111 ..... ..... @xx
+
+xvftintrne_w_d 0111 01010100 10111 ..... ..... ..... @xxx
+xvftintrz_w_d 0111 01010100 10110 ..... ..... ..... @xxx
+xvftintrp_w_d 0111 01010100 10101 ..... ..... ..... @xxx
+xvftintrm_w_d 0111 01010100 10100 ..... ..... ..... @xxx
+xvftint_w_d 0111 01010100 10011 ..... ..... ..... @xxx
+
+xvftintrnel_l_s 0111 01101001 11101 01000 ..... ..... @xx
+xvftintrneh_l_s 0111 01101001 11101 01001 ..... ..... @xx
+xvftintrzl_l_s 0111 01101001 11101 00110 ..... ..... @xx
+xvftintrzh_l_s 0111 01101001 11101 00111 ..... ..... @xx
+xvftintrpl_l_s 0111 01101001 11101 00100 ..... ..... @xx
+xvftintrph_l_s 0111 01101001 11101 00101 ..... ..... @xx
+xvftintrml_l_s 0111 01101001 11101 00010 ..... ..... @xx
+xvftintrmh_l_s 0111 01101001 11101 00011 ..... ..... @xx
+xvftintl_l_s 0111 01101001 11101 00000 ..... ..... @xx
+xvftinth_l_s 0111 01101001 11101 00001 ..... ..... @xx
+
+xvffint_s_w 0111 01101001 11100 00000 ..... ..... @xx
+xvffint_d_l 0111 01101001 11100 00010 ..... ..... @xx
+xvffint_s_wu 0111 01101001 11100 00001 ..... ..... @xx
+xvffint_d_lu 0111 01101001 11100 00011 ..... ..... @xx
+xvffintl_d_w 0111 01101001 11100 00100 ..... ..... @xx
+xvffinth_d_w 0111 01101001 11100 00101 ..... ..... @xx
+xvffint_s_l 0111 01010100 10000 ..... ..... ..... @xxx
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index 316ebd3463..5cc917fdc3 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -2325,3 +2325,401 @@ XDO_2OP_F(xvfrecip_s, 32, UXW, do_frecip_32)
XDO_2OP_F(xvfrecip_d, 64, UXD, do_frecip_64)
XDO_2OP_F(xvfrsqrt_s, 32, UXW, do_frsqrt_32)
XDO_2OP_F(xvfrsqrt_d, 64, UXD, do_frsqrt_64)
+
+void HELPER(xvfcvtl_s_h)(CPULoongArchState *env, uint32_t xd, uint32_t xj)
+{
+ int i, max;
+ XReg temp;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ max = LASX_LEN / (32 * 2);
+ vec_clear_cause(env);
+ for (i = 0; i < max; i++) {
+ temp.UXW(i) = float16_to_float32(Xj->UXH(i), true, &env->fp_status);
+ temp.UXW(i + max) = float16_to_float32(Xj->UXH(i + max * 2),
+ true, &env->fp_status);
+ vec_update_fcsr0(env, GETPC());
+ }
+ *Xd = temp;
+}
+
+void HELPER(xvfcvtl_d_s)(CPULoongArchState *env, uint32_t xd, uint32_t xj)
+{
+ int i, max;
+ XReg temp;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ max = LASX_LEN / (64 * 2);
+ vec_clear_cause(env);
+ for (i = 0; i < max; i++) {
+ temp.UXD(i) = float32_to_float64(Xj->UXW(i), &env->fp_status);
+ temp.UXD(i + max) = float32_to_float64(Xj->UXW(i + max * 2),
+ &env->fp_status);
+ vec_update_fcsr0(env, GETPC());
+ }
+ *Xd = temp;
+}
+
+void HELPER(xvfcvth_s_h)(CPULoongArchState *env, uint32_t xd, uint32_t xj)
+{
+ int i, max;
+ XReg temp;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ max = LASX_LEN / (32 * 2);
+ vec_clear_cause(env);
+ for (i = 0; i < max; i++) {
+ temp.UXW(i) = float16_to_float32(Xj->UXH(i + max),
+ true, &env->fp_status);
+ temp.UXW(i + max) = float16_to_float32(Xj->UXH(i + max * 3),
+ true, &env->fp_status);
+ vec_update_fcsr0(env, GETPC());
+ }
+ *Xd = temp;
+}
+
+void HELPER(xvfcvth_d_s)(CPULoongArchState *env, uint32_t xd, uint32_t xj)
+{
+ int i, max;
+ XReg temp;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ max = LASX_LEN / (64 * 2);
+ vec_clear_cause(env);
+ for (i = 0; i < max; i++) {
+ temp.UXD(i) = float32_to_float64(Xj->UXW(i + max), &env->fp_status);
+ temp.UXD(i + max) = float32_to_float64(Xj->UXW(i + max * 3),
+ &env->fp_status);
+ vec_update_fcsr0(env, GETPC());
+ }
+ *Xd = temp;
+}
+
+void HELPER(xvfcvt_h_s)(CPULoongArchState *env,
+ uint32_t xd, uint32_t xj, uint32_t xk)
+{
+ int i, max;
+ XReg temp;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+ XReg *Xk = &(env->fpr[xk].xreg);
+
+ max = LASX_LEN / (32 * 2);
+ vec_clear_cause(env);
+ for (i = 0; i < max; i++) {
+ temp.UXH(i + max) = float32_to_float16(Xj->UXW(i),
+ true, &env->fp_status);
+ temp.UXH(i) = float32_to_float16(Xk->UXW(i), true, &env->fp_status);
+ temp.UXH(i + max * 3) = float32_to_float16(Xj->UXW(i + max),
+ true, &env->fp_status);
+ temp.UXH(i + max * 2) = float32_to_float16(Xk->UXW(i + max),
+ true, &env->fp_status);
+ vec_update_fcsr0(env, GETPC());
+ }
+ *Xd = temp;
+}
+
+void HELPER(xvfcvt_s_d)(CPULoongArchState *env,
+ uint32_t xd, uint32_t xj, uint32_t xk)
+{
+ int i, max;
+ XReg temp;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+ XReg *Xk = &(env->fpr[xk].xreg);
+
+ max = LASX_LEN / (64 * 2);
+ vec_clear_cause(env);
+ for (i = 0; i < max; i++) {
+ temp.UXW(i + max) = float64_to_float32(Xj->UXD(i), &env->fp_status);
+ temp.UXW(i) = float64_to_float32(Xk->UXD(i), &env->fp_status);
+ temp.UXW(i + max * 3) = float64_to_float32(Xj->UXD(i + max),
+ &env->fp_status);
+ temp.UXW(i + max * 2) = float64_to_float32(Xk->UXD(i + max),
+ &env->fp_status);
+ vec_update_fcsr0(env, GETPC());
+ }
+ *Xd = temp;
+}
+
+void HELPER(xvfrint_s)(CPULoongArchState *env, uint32_t xd, uint32_t xj)
+{
+ int i;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ vec_clear_cause(env);
+ for (i = 0; i < LASX_LEN / 32; i++) {
+ Xd->XW(i) = float32_round_to_int(Xj->UXW(i), &env->fp_status);
+ vec_update_fcsr0(env, GETPC());
+ }
+}
+
+void HELPER(xvfrint_d)(CPULoongArchState *env, uint32_t xd, uint32_t xj)
+{
+ int i;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ vec_clear_cause(env);
+ for (i = 0; i < LASX_LEN / 64; i++) {
+ Xd->XD(i) = float64_round_to_int(Xj->UXD(i), &env->fp_status);
+ vec_update_fcsr0(env, GETPC());
+ }
+}
+
+#define XFCVT_2OP(NAME, BIT, E, MODE) \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t xd, uint32_t xj) \
+{ \
+ int i; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ vec_clear_cause(env); \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ FloatRoundMode old_mode = get_float_rounding_mode(&env->fp_status); \
+ set_float_rounding_mode(MODE, &env->fp_status); \
+ Xd->E(i) = float## BIT ## _round_to_int(Xj->E(i), &env->fp_status); \
+ set_float_rounding_mode(old_mode, &env->fp_status); \
+ vec_update_fcsr0(env, GETPC()); \
+ } \
+}
+
+XFCVT_2OP(xvfrintrne_s, 32, UXW, float_round_nearest_even)
+XFCVT_2OP(xvfrintrne_d, 64, UXD, float_round_nearest_even)
+XFCVT_2OP(xvfrintrz_s, 32, UXW, float_round_to_zero)
+XFCVT_2OP(xvfrintrz_d, 64, UXD, float_round_to_zero)
+XFCVT_2OP(xvfrintrp_s, 32, UXW, float_round_up)
+XFCVT_2OP(xvfrintrp_d, 64, UXD, float_round_up)
+XFCVT_2OP(xvfrintrm_s, 32, UXW, float_round_down)
+XFCVT_2OP(xvfrintrm_d, 64, UXD, float_round_down)
+
+#define XFTINT(NAME, FMT1, FMT2, T1, T2, MODE) \
+static T2 do_xftint ## NAME(CPULoongArchState *env, T1 fj) \
+{ \
+ T2 fd; \
+ FloatRoundMode old_mode = get_float_rounding_mode(&env->fp_status); \
+ \
+ set_float_rounding_mode(MODE, &env->fp_status); \
+ fd = do_## FMT1 ##_to_## FMT2(env, fj); \
+ set_float_rounding_mode(old_mode, &env->fp_status); \
+ return fd; \
+}
+
+#define XDO_FTINT(FMT1, FMT2, T1, T2) \
+static T2 do_## FMT1 ##_to_## FMT2(CPULoongArchState *env, T1 fj) \
+{ \
+ T2 fd; \
+ \
+ fd = FMT1 ##_to_## FMT2(fj, &env->fp_status); \
+ if (get_float_exception_flags(&env->fp_status) & (float_flag_invalid)) { \
+ if (FMT1 ##_is_any_nan(fj)) { \
+ fd = 0; \
+ } \
+ } \
+ vec_update_fcsr0(env, GETPC()); \
+ return fd; \
+}
+
+XDO_FTINT(float32, int32, uint32_t, uint32_t)
+XDO_FTINT(float64, int64, uint64_t, uint64_t)
+XDO_FTINT(float32, uint32, uint32_t, uint32_t)
+XDO_FTINT(float64, uint64, uint64_t, uint64_t)
+XDO_FTINT(float64, int32, uint64_t, uint32_t)
+XDO_FTINT(float32, int64, uint32_t, uint64_t)
+
+XFTINT(rne_w_s, float32, int32, uint32_t, uint32_t, float_round_nearest_even)
+XFTINT(rne_l_d, float64, int64, uint64_t, uint64_t, float_round_nearest_even)
+XFTINT(rp_w_s, float32, int32, uint32_t, uint32_t, float_round_up)
+XFTINT(rp_l_d, float64, int64, uint64_t, uint64_t, float_round_up)
+XFTINT(rz_w_s, float32, int32, uint32_t, uint32_t, float_round_to_zero)
+XFTINT(rz_l_d, float64, int64, uint64_t, uint64_t, float_round_to_zero)
+XFTINT(rm_w_s, float32, int32, uint32_t, uint32_t, float_round_down)
+XFTINT(rm_l_d, float64, int64, uint64_t, uint64_t, float_round_down)
+
+XDO_2OP_F(xvftintrne_w_s, 32, UXW, do_xftintrne_w_s)
+XDO_2OP_F(xvftintrne_l_d, 64, UXD, do_xftintrne_l_d)
+XDO_2OP_F(xvftintrp_w_s, 32, UXW, do_xftintrp_w_s)
+XDO_2OP_F(xvftintrp_l_d, 64, UXD, do_xftintrp_l_d)
+XDO_2OP_F(xvftintrz_w_s, 32, UXW, do_xftintrz_w_s)
+XDO_2OP_F(xvftintrz_l_d, 64, UXD, do_xftintrz_l_d)
+XDO_2OP_F(xvftintrm_w_s, 32, UXW, do_xftintrm_w_s)
+XDO_2OP_F(xvftintrm_l_d, 64, UXD, do_xftintrm_l_d)
+XDO_2OP_F(xvftint_w_s, 32, UXW, do_float32_to_int32)
+XDO_2OP_F(xvftint_l_d, 64, UXD, do_float64_to_int64)
+
+XFTINT(rz_wu_s, float32, uint32, uint32_t, uint32_t, float_round_to_zero)
+XFTINT(rz_lu_d, float64, uint64, uint64_t, uint64_t, float_round_to_zero)
+
+XDO_2OP_F(xvftintrz_wu_s, 32, UXW, do_xftintrz_wu_s)
+XDO_2OP_F(xvftintrz_lu_d, 64, UXD, do_xftintrz_lu_d)
+XDO_2OP_F(xvftint_wu_s, 32, UXW, do_float32_to_uint32)
+XDO_2OP_F(xvftint_lu_d, 64, UXD, do_float64_to_uint64)
+
+XFTINT(rm_w_d, float64, int32, uint64_t, uint32_t, float_round_down)
+XFTINT(rp_w_d, float64, int32, uint64_t, uint32_t, float_round_up)
+XFTINT(rz_w_d, float64, int32, uint64_t, uint32_t, float_round_to_zero)
+XFTINT(rne_w_d, float64, int32, uint64_t, uint32_t, float_round_nearest_even)
+
+#define XFTINT_W_D(NAME, FN) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i, max; \
+ XReg temp; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ \
+ max = LASX_LEN / (64 * 2); \
+ vec_clear_cause(env); \
+ for (i = 0; i < max; i++) { \
+ temp.XW(i + max) = FN(env, Xj->UXD(i)); \
+ temp.XW(i) = FN(env, Xk->UXD(i)); \
+ temp.XW(i + max * 3) = FN(env, Xj->UXD(i + max)); \
+ temp.XW(i + max * 2) = FN(env, Xk->UXD(i + max)); \
+ } \
+ *Xd = temp; \
+}
+
+XFTINT_W_D(xvftint_w_d, do_float64_to_int32)
+XFTINT_W_D(xvftintrm_w_d, do_xftintrm_w_d)
+XFTINT_W_D(xvftintrp_w_d, do_xftintrp_w_d)
+XFTINT_W_D(xvftintrz_w_d, do_xftintrz_w_d)
+XFTINT_W_D(xvftintrne_w_d, do_xftintrne_w_d)
+
+XFTINT(rml_l_s, float32, int64, uint32_t, uint64_t, float_round_down)
+XFTINT(rpl_l_s, float32, int64, uint32_t, uint64_t, float_round_up)
+XFTINT(rzl_l_s, float32, int64, uint32_t, uint64_t, float_round_to_zero)
+XFTINT(rnel_l_s, float32, int64, uint32_t, uint64_t, float_round_nearest_even)
+XFTINT(rmh_l_s, float32, int64, uint32_t, uint64_t, float_round_down)
+XFTINT(rph_l_s, float32, int64, uint32_t, uint64_t, float_round_up)
+XFTINT(rzh_l_s, float32, int64, uint32_t, uint64_t, float_round_to_zero)
+XFTINT(rneh_l_s, float32, int64, uint32_t, uint64_t, float_round_nearest_even)
+
+#define XFTINTL_L_S(NAME, FN) \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t xd, uint32_t xj) \
+{ \
+ int i, max; \
+ XReg temp; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ max = LASX_LEN / (64 * 2); \
+ vec_clear_cause(env); \
+ for (i = 0; i < max; i++) { \
+ temp.XD(i) = FN(env, Xj->UXW(i)); \
+ temp.XD(i + max) = FN(env, Xj->UXW(i + max * 2)); \
+ } \
+ *Xd = temp; \
+}
+
+XFTINTL_L_S(xvftintl_l_s, do_float32_to_int64)
+XFTINTL_L_S(xvftintrml_l_s, do_xftintrml_l_s)
+XFTINTL_L_S(xvftintrpl_l_s, do_xftintrpl_l_s)
+XFTINTL_L_S(xvftintrzl_l_s, do_xftintrzl_l_s)
+XFTINTL_L_S(xvftintrnel_l_s, do_xftintrnel_l_s)
+
+#define XFTINTH_L_S(NAME, FN) \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t xd, uint32_t xj) \
+{ \
+ int i, max; \
+ XReg temp; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ max = LASX_LEN / (64 * 2); \
+ vec_clear_cause(env); \
+ for (i = 0; i < max; i++) { \
+ temp.XD(i) = FN(env, Xj->UXW(i + max)); \
+ temp.XD(i + max) = FN(env, Xj->UXW(i + max * 3)); \
+ } \
+ *Xd = temp; \
+}
+
+XFTINTH_L_S(xvftinth_l_s, do_float32_to_int64)
+XFTINTH_L_S(xvftintrmh_l_s, do_xftintrmh_l_s)
+XFTINTH_L_S(xvftintrph_l_s, do_xftintrph_l_s)
+XFTINTH_L_S(xvftintrzh_l_s, do_xftintrzh_l_s)
+XFTINTH_L_S(xvftintrneh_l_s, do_xftintrneh_l_s)
+
+#define XFFINT(NAME, FMT1, FMT2, T1, T2) \
+static T2 do_xffint_ ## NAME(CPULoongArchState *env, T1 fj) \
+{ \
+ T2 fd; \
+ \
+ fd = FMT1 ##_to_## FMT2(fj, &env->fp_status); \
+ vec_update_fcsr0(env, GETPC()); \
+ return fd; \
+}
+
+XFFINT(s_w, int32, float32, int32_t, uint32_t)
+XFFINT(d_l, int64, float64, int64_t, uint64_t)
+XFFINT(s_wu, uint32, float32, uint32_t, uint32_t)
+XFFINT(d_lu, uint64, float64, uint64_t, uint64_t)
+
+XDO_2OP_F(xvffint_s_w, 32, XW, do_xffint_s_w)
+XDO_2OP_F(xvffint_d_l, 64, XD, do_xffint_d_l)
+XDO_2OP_F(xvffint_s_wu, 32, UXW, do_xffint_s_wu)
+XDO_2OP_F(xvffint_d_lu, 64, UXD, do_xffint_d_lu)
+
+void HELPER(xvffintl_d_w)(CPULoongArchState *env, uint32_t xd, uint32_t xj)
+{
+ int i, max;
+ XReg temp;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ max = LASX_LEN / (64 * 2);
+ vec_clear_cause(env);
+ for (i = 0; i < max; i++) {
+ temp.XD(i) = int32_to_float64(Xj->XW(i), &env->fp_status);
+ temp.XD(i + max) = int32_to_float64(Xj->XW(i + max * 2),
+ &env->fp_status);
+ vec_update_fcsr0(env, GETPC());
+ }
+ *Xd = temp;
+}
+
+void HELPER(xvffinth_d_w)(CPULoongArchState *env, uint32_t xd, uint32_t xj)
+{
+ int i, max;
+ XReg temp;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ max = LASX_LEN / (64 * 2);
+ vec_clear_cause(env);
+ for (i = 0; i < max; i++) {
+ temp.XD(i) = int32_to_float64(Xj->XW(i + max), &env->fp_status);
+ temp.XD(i + max) = int32_to_float64(Xj->XW(i + max * 3),
+ &env->fp_status);
+ vec_update_fcsr0(env, GETPC());
+ }
+ *Xd = temp;
+}
+
+void HELPER(xvffint_s_l)(CPULoongArchState *env,
+ uint32_t xd, uint32_t xj, uint32_t xk)
+{
+ int i, max;
+ XReg temp;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+ XReg *Xk = &(env->fpr[xk].xreg);
+
+ max = LASX_LEN / (64 * 2);
+ vec_clear_cause(env);
+ for (i = 0; i < max; i++) {
+ temp.XW(i + max) = int64_to_float32(Xj->XD(i), &env->fp_status);
+ temp.XW(i) = int64_to_float32(Xk->XD(i), &env->fp_status);
+ temp.XW(i + max * 3) = int64_to_float32(Xj->XD(i + max), &env->fp_status);
+ temp.XW(i + max * 2) = int64_to_float32(Xk->XD(i + max), &env->fp_status);
+ vec_update_fcsr0(env, GETPC());
+ }
+ *Xd = temp;
+}
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 38/46] target/loongarch: Implement xvseq xvsle xvslt
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (36 preceding siblings ...)
2023-06-20 9:38 ` [PATCH v1 37/46] target/loongarch: Implement LASX fpu fcvt instructions Song Gao
@ 2023-06-20 9:38 ` Song Gao
2023-06-20 9:38 ` [PATCH v1 39/46] target/loongarch: Implement xvfcmp Song Gao
` (7 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:38 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVSEQ[I].{B/H/W/D};
- XVSLE[I].{B/H/W/D}[U];
- XVSLT[I].{B/H/W/D}[U].
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
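Not part of the patch: these compares produce per-element masks (all ones on true, zero on false), which is what the VSEQ/VSLE/VSLT macros moved into vec.h encode. A small standalone sketch of xvslti.b on a few byte lanes (hypothetical values):

#include <stdio.h>
#include <stdint.h>

#define VSEQ(a, b) ((a) == (b) ? -1 : 0)
#define VSLE(a, b) ((a) <= (b) ? -1 : 0)
#define VSLT(a, b) ((a) < (b) ? -1 : 0)

int main(void)
{
    int8_t xj[4] = {-3, 0, 5, 5};
    int8_t imm = 5;                    /* xvslti.b-style 5-bit immediate */
    int8_t xd[4];
    int i;

    for (i = 0; i < 4; i++) {
        xd[i] = VSLT(xj[i], imm);      /* 0xff where xj[i] < 5, else 0x00 */
    }
    for (i = 0; i < 4; i++) {
        printf("xd[%d] = 0x%02x\n", i, (uint8_t)xd[i]);
    }
    return 0;
}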
target/loongarch/disas.c | 43 ++++++
target/loongarch/helper.h | 23 +++
target/loongarch/insn_trans/trans_lasx.c.inc | 154 +++++++++++++++++++
target/loongarch/insns.decode | 43 ++++++
target/loongarch/lasx_helper.c | 34 ++++
target/loongarch/lsx_helper.c | 4 -
target/loongarch/vec.h | 4 +
7 files changed, 301 insertions(+), 4 deletions(-)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 65eccc8598..5d3904402d 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2341,6 +2341,49 @@ INSN_LASX(xvffintl_d_w, xx)
INSN_LASX(xvffinth_d_w, xx)
INSN_LASX(xvffint_s_l, xxx)
+INSN_LASX(xvseq_b, xxx)
+INSN_LASX(xvseq_h, xxx)
+INSN_LASX(xvseq_w, xxx)
+INSN_LASX(xvseq_d, xxx)
+INSN_LASX(xvseqi_b, xx_i)
+INSN_LASX(xvseqi_h, xx_i)
+INSN_LASX(xvseqi_w, xx_i)
+INSN_LASX(xvseqi_d, xx_i)
+
+INSN_LASX(xvsle_b, xxx)
+INSN_LASX(xvsle_h, xxx)
+INSN_LASX(xvsle_w, xxx)
+INSN_LASX(xvsle_d, xxx)
+INSN_LASX(xvslei_b, xx_i)
+INSN_LASX(xvslei_h, xx_i)
+INSN_LASX(xvslei_w, xx_i)
+INSN_LASX(xvslei_d, xx_i)
+INSN_LASX(xvsle_bu, xxx)
+INSN_LASX(xvsle_hu, xxx)
+INSN_LASX(xvsle_wu, xxx)
+INSN_LASX(xvsle_du, xxx)
+INSN_LASX(xvslei_bu, xx_i)
+INSN_LASX(xvslei_hu, xx_i)
+INSN_LASX(xvslei_wu, xx_i)
+INSN_LASX(xvslei_du, xx_i)
+
+INSN_LASX(xvslt_b, xxx)
+INSN_LASX(xvslt_h, xxx)
+INSN_LASX(xvslt_w, xxx)
+INSN_LASX(xvslt_d, xxx)
+INSN_LASX(xvslti_b, xx_i)
+INSN_LASX(xvslti_h, xx_i)
+INSN_LASX(xvslti_w, xx_i)
+INSN_LASX(xvslti_d, xx_i)
+INSN_LASX(xvslt_bu, xxx)
+INSN_LASX(xvslt_hu, xxx)
+INSN_LASX(xvslt_wu, xxx)
+INSN_LASX(xvslt_du, xxx)
+INSN_LASX(xvslti_bu, xx_i)
+INSN_LASX(xvslti_hu, xx_i)
+INSN_LASX(xvslti_wu, xx_i)
+INSN_LASX(xvslti_du, xx_i)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index d30ea7f6a4..fbfd15d711 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -1193,3 +1193,26 @@ DEF_HELPER_3(xvffint_d_lu, void, env, i32, i32)
DEF_HELPER_3(xvffintl_d_w, void, env, i32, i32)
DEF_HELPER_3(xvffinth_d_w, void, env, i32, i32)
DEF_HELPER_4(xvffint_s_l, void, env, i32, i32, i32)
+
+DEF_HELPER_FLAGS_4(xvseqi_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvseqi_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvseqi_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvseqi_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(xvslei_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvslei_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvslei_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvslei_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvslei_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvslei_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvslei_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvslei_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(xvslti_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvslti_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvslti_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvslti_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvslti_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvslti_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvslti_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvslti_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 998c07b358..cc1b4fd42a 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -2520,6 +2520,160 @@ TRANS(xvffintl_d_w, gen_xx, gen_helper_xvffintl_d_w)
TRANS(xvffinth_d_w, gen_xx, gen_helper_xvffinth_d_w)
TRANS(xvffint_s_l, gen_xxx, gen_helper_xvffint_s_l)
+static bool do_xcmp(DisasContext *ctx, arg_xxx * a, MemOp mop, TCGCond cond)
+{
+ uint32_t xd_ofs, xj_ofs, xk_ofs;
+
+ CHECK_ASXE;
+
+ xd_ofs = vec_full_offset(a->xd);
+ xj_ofs = vec_full_offset(a->xj);
+ xk_ofs = vec_full_offset(a->xk);
+
+ tcg_gen_gvec_cmp(cond, mop, xd_ofs, xj_ofs, xk_ofs, 32, ctx->vl / 8);
+ return true;
+}
+
+#define DO_XCMPI_S(NAME) \
+static bool do_x## NAME ##_s(DisasContext *ctx, arg_xx_i * a, MemOp mop) \
+{ \
+ uint32_t xd_ofs, xj_ofs; \
+ \
+ CHECK_ASXE; \
+ \
+ static const TCGOpcode vecop_list[] = { \
+ INDEX_op_cmp_vec, 0 \
+ }; \
+ static const GVecGen2i op[4] = { \
+ { \
+ .fniv = gen_## NAME ##_s_vec, \
+ .fnoi = gen_helper_x## NAME ##_b, \
+ .opt_opc = vecop_list, \
+ .vece = MO_8 \
+ }, \
+ { \
+ .fniv = gen_## NAME ##_s_vec, \
+ .fnoi = gen_helper_x## NAME ##_h, \
+ .opt_opc = vecop_list, \
+ .vece = MO_16 \
+ }, \
+ { \
+ .fniv = gen_## NAME ##_s_vec, \
+ .fnoi = gen_helper_x## NAME ##_w, \
+ .opt_opc = vecop_list, \
+ .vece = MO_32 \
+ }, \
+ { \
+ .fniv = gen_## NAME ##_s_vec, \
+ .fnoi = gen_helper_x## NAME ##_d, \
+ .opt_opc = vecop_list, \
+ .vece = MO_64 \
+ } \
+ }; \
+ \
+ xd_ofs = vec_full_offset(a->xd); \
+ xj_ofs = vec_full_offset(a->xj); \
+ \
+ tcg_gen_gvec_2i(xd_ofs, xj_ofs, 32, ctx->vl / 8, a->imm, &op[mop]); \
+ \
+ return true; \
+}
+
+DO_XCMPI_S(vseqi)
+DO_XCMPI_S(vslei)
+DO_XCMPI_S(vslti)
+
+#define DO_XCMPI_U(NAME) \
+static bool do_x## NAME ##_u(DisasContext *ctx, arg_xx_i * a, MemOp mop) \
+{ \
+ uint32_t xd_ofs, xj_ofs; \
+ \
+ CHECK_ASXE; \
+ \
+ static const TCGOpcode vecop_list[] = { \
+ INDEX_op_cmp_vec, 0 \
+ }; \
+ static const GVecGen2i op[4] = { \
+ { \
+ .fniv = gen_## NAME ##_u_vec, \
+ .fnoi = gen_helper_x## NAME ##_bu, \
+ .opt_opc = vecop_list, \
+ .vece = MO_8 \
+ }, \
+ { \
+ .fniv = gen_## NAME ##_u_vec, \
+ .fnoi = gen_helper_x## NAME ##_hu, \
+ .opt_opc = vecop_list, \
+ .vece = MO_16 \
+ }, \
+ { \
+ .fniv = gen_## NAME ##_u_vec, \
+ .fnoi = gen_helper_x## NAME ##_wu, \
+ .opt_opc = vecop_list, \
+ .vece = MO_32 \
+ }, \
+ { \
+ .fniv = gen_## NAME ##_u_vec, \
+ .fnoi = gen_helper_x## NAME ##_du, \
+ .opt_opc = vecop_list, \
+ .vece = MO_64 \
+ } \
+ }; \
+ \
+ xd_ofs = vec_full_offset(a->xd); \
+ xj_ofs = vec_full_offset(a->xj); \
+ \
+ tcg_gen_gvec_2i(xd_ofs, xj_ofs, 32, ctx->vl / 8, a->imm, &op[mop]); \
+ \
+ return true; \
+}
+
+DO_XCMPI_U(vslei)
+DO_XCMPI_U(vslti)
+
+TRANS(xvseq_b, do_xcmp, MO_8, TCG_COND_EQ)
+TRANS(xvseq_h, do_xcmp, MO_16, TCG_COND_EQ)
+TRANS(xvseq_w, do_xcmp, MO_32, TCG_COND_EQ)
+TRANS(xvseq_d, do_xcmp, MO_64, TCG_COND_EQ)
+TRANS(xvseqi_b, do_xvseqi_s, MO_8)
+TRANS(xvseqi_h, do_xvseqi_s, MO_16)
+TRANS(xvseqi_w, do_xvseqi_s, MO_32)
+TRANS(xvseqi_d, do_xvseqi_s, MO_64)
+
+TRANS(xvsle_b, do_xcmp, MO_8, TCG_COND_LE)
+TRANS(xvsle_h, do_xcmp, MO_16, TCG_COND_LE)
+TRANS(xvsle_w, do_xcmp, MO_32, TCG_COND_LE)
+TRANS(xvsle_d, do_xcmp, MO_64, TCG_COND_LE)
+TRANS(xvslei_b, do_xvslei_s, MO_8)
+TRANS(xvslei_h, do_xvslei_s, MO_16)
+TRANS(xvslei_w, do_xvslei_s, MO_32)
+TRANS(xvslei_d, do_xvslei_s, MO_64)
+TRANS(xvsle_bu, do_xcmp, MO_8, TCG_COND_LEU)
+TRANS(xvsle_hu, do_xcmp, MO_16, TCG_COND_LEU)
+TRANS(xvsle_wu, do_xcmp, MO_32, TCG_COND_LEU)
+TRANS(xvsle_du, do_xcmp, MO_64, TCG_COND_LEU)
+TRANS(xvslei_bu, do_xvslei_u, MO_8)
+TRANS(xvslei_hu, do_xvslei_u, MO_16)
+TRANS(xvslei_wu, do_xvslei_u, MO_32)
+TRANS(xvslei_du, do_xvslei_u, MO_64)
+
+TRANS(xvslt_b, do_xcmp, MO_8, TCG_COND_LT)
+TRANS(xvslt_h, do_xcmp, MO_16, TCG_COND_LT)
+TRANS(xvslt_w, do_xcmp, MO_32, TCG_COND_LT)
+TRANS(xvslt_d, do_xcmp, MO_64, TCG_COND_LT)
+TRANS(xvslti_b, do_xvslti_s, MO_8)
+TRANS(xvslti_h, do_xvslti_s, MO_16)
+TRANS(xvslti_w, do_xvslti_s, MO_32)
+TRANS(xvslti_d, do_xvslti_s, MO_64)
+TRANS(xvslt_bu, do_xcmp, MO_8, TCG_COND_LTU)
+TRANS(xvslt_hu, do_xcmp, MO_16, TCG_COND_LTU)
+TRANS(xvslt_wu, do_xcmp, MO_32, TCG_COND_LTU)
+TRANS(xvslt_du, do_xcmp, MO_64, TCG_COND_LTU)
+TRANS(xvslti_bu, do_xvslti_u, MO_8)
+TRANS(xvslti_hu, do_xvslti_u, MO_16)
+TRANS(xvslti_wu, do_xvslti_u, MO_32)
+TRANS(xvslti_du, do_xvslti_u, MO_64)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 59b79573e5..4e1f0b30a0 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1940,6 +1940,49 @@ xvffintl_d_w 0111 01101001 11100 00100 ..... ..... @xx
xvffinth_d_w 0111 01101001 11100 00101 ..... ..... @xx
xvffint_s_l 0111 01010100 10000 ..... ..... ..... @xxx
+xvseq_b 0111 01000000 00000 ..... ..... ..... @xxx
+xvseq_h 0111 01000000 00001 ..... ..... ..... @xxx
+xvseq_w 0111 01000000 00010 ..... ..... ..... @xxx
+xvseq_d 0111 01000000 00011 ..... ..... ..... @xxx
+xvseqi_b 0111 01101000 00000 ..... ..... ..... @xx_i5
+xvseqi_h 0111 01101000 00001 ..... ..... ..... @xx_i5
+xvseqi_w 0111 01101000 00010 ..... ..... ..... @xx_i5
+xvseqi_d 0111 01101000 00011 ..... ..... ..... @xx_i5
+
+xvsle_b 0111 01000000 00100 ..... ..... ..... @xxx
+xvsle_h 0111 01000000 00101 ..... ..... ..... @xxx
+xvsle_w 0111 01000000 00110 ..... ..... ..... @xxx
+xvsle_d 0111 01000000 00111 ..... ..... ..... @xxx
+xvslei_b 0111 01101000 00100 ..... ..... ..... @xx_i5
+xvslei_h 0111 01101000 00101 ..... ..... ..... @xx_i5
+xvslei_w 0111 01101000 00110 ..... ..... ..... @xx_i5
+xvslei_d 0111 01101000 00111 ..... ..... ..... @xx_i5
+xvsle_bu 0111 01000000 01000 ..... ..... ..... @xxx
+xvsle_hu 0111 01000000 01001 ..... ..... ..... @xxx
+xvsle_wu 0111 01000000 01010 ..... ..... ..... @xxx
+xvsle_du 0111 01000000 01011 ..... ..... ..... @xxx
+xvslei_bu 0111 01101000 01000 ..... ..... ..... @xx_ui5
+xvslei_hu 0111 01101000 01001 ..... ..... ..... @xx_ui5
+xvslei_wu 0111 01101000 01010 ..... ..... ..... @xx_ui5
+xvslei_du 0111 01101000 01011 ..... ..... ..... @xx_ui5
+
+xvslt_b 0111 01000000 01100 ..... ..... ..... @xxx
+xvslt_h 0111 01000000 01101 ..... ..... ..... @xxx
+xvslt_w 0111 01000000 01110 ..... ..... ..... @xxx
+xvslt_d 0111 01000000 01111 ..... ..... ..... @xxx
+xvslti_b 0111 01101000 01100 ..... ..... ..... @xx_i5
+xvslti_h 0111 01101000 01101 ..... ..... ..... @xx_i5
+xvslti_w 0111 01101000 01110 ..... ..... ..... @xx_i5
+xvslti_d 0111 01101000 01111 ..... ..... ..... @xx_i5
+xvslt_bu 0111 01000000 10000 ..... ..... ..... @xxx
+xvslt_hu 0111 01000000 10001 ..... ..... ..... @xxx
+xvslt_wu 0111 01000000 10010 ..... ..... ..... @xxx
+xvslt_du 0111 01000000 10011 ..... ..... ..... @xxx
+xvslti_bu 0111 01101000 10000 ..... ..... ..... @xx_ui5
+xvslti_hu 0111 01101000 10001 ..... ..... ..... @xx_ui5
+xvslti_wu 0111 01101000 10010 ..... ..... ..... @xx_ui5
+xvslti_du 0111 01101000 10011 ..... ..... ..... @xx_ui5
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index 5cc917fdc3..d0bc02de72 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -2723,3 +2723,37 @@ void HELPER(xvffint_s_l)(CPULoongArchState *env,
}
*Xd = temp;
}
+
+#define XVCMPI(NAME, BIT, E, DO_OP) \
+void HELPER(NAME)(void *xd, void *xj, uint64_t imm, uint32_t v) \
+{ \
+ int i; \
+ XReg *Xd = (XReg *)xd; \
+ XReg *Xj = (XReg *)xj; \
+ typedef __typeof(Xd->E(0)) TD; \
+ \
+ for (i = 0; i < LASX_LEN / BIT; i++) { \
+ Xd->E(i) = DO_OP(Xj->E(i), (TD)imm); \
+ } \
+}
+
+XVCMPI(xvseqi_b, 8, XB, VSEQ)
+XVCMPI(xvseqi_h, 16, XH, VSEQ)
+XVCMPI(xvseqi_w, 32, XW, VSEQ)
+XVCMPI(xvseqi_d, 64, XD, VSEQ)
+XVCMPI(xvslei_b, 8, XB, VSLE)
+XVCMPI(xvslei_h, 16, XH, VSLE)
+XVCMPI(xvslei_w, 32, XW, VSLE)
+XVCMPI(xvslei_d, 64, XD, VSLE)
+XVCMPI(xvslei_bu, 8, UXB, VSLE)
+XVCMPI(xvslei_hu, 16, UXH, VSLE)
+XVCMPI(xvslei_wu, 32, UXW, VSLE)
+XVCMPI(xvslei_du, 64, UXD, VSLE)
+XVCMPI(xvslti_b, 8, XB, VSLT)
+XVCMPI(xvslti_h, 16, XH, VSLT)
+XVCMPI(xvslti_w, 32, XW, VSLT)
+XVCMPI(xvslti_d, 64, XD, VSLT)
+XVCMPI(xvslti_bu, 8, UXB, VSLT)
+XVCMPI(xvslti_hu, 16, UXH, VSLT)
+XVCMPI(xvslti_wu, 32, UXW, VSLT)
+XVCMPI(xvslti_du, 64, UXD, VSLT)
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 446a1bdfe3..22d71cb39e 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -2588,10 +2588,6 @@ void HELPER(vffint_s_l)(CPULoongArchState *env,
*Vd = temp;
}
-#define VSEQ(a, b) (a == b ? -1 : 0)
-#define VSLE(a, b) (a <= b ? -1 : 0)
-#define VSLT(a, b) (a < b ? -1 : 0)
-
#define VCMPI(NAME, BIT, E, DO_OP) \
void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t v) \
{ \
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index 583997d576..54fd2689f3 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -90,6 +90,10 @@
#define DO_BITSET(a, bit) (a | 1ull << bit)
#define DO_BITREV(a, bit) (a ^ (1ull << bit))
+#define VSEQ(a, b) (a == b ? -1 : 0)
+#define VSLE(a, b) (a <= b ? -1 : 0)
+#define VSLT(a, b) (a < b ? -1 : 0)
+
uint64_t do_vmskltz_b(int64_t val);
uint64_t do_vmskltz_h(int64_t val);
uint64_t do_vmskltz_w(int64_t val);
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 39/46] target/loongarch: Implement xvfcmp
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (37 preceding siblings ...)
2023-06-20 9:38 ` [PATCH v1 38/46] target/loongarch: Implement xvseq xvsle xvslt Song Gao
@ 2023-06-20 9:38 ` Song Gao
2023-06-20 9:38 ` [PATCH v1 40/46] target/loongarch: Implement xvbitsel xvset Song Gao
` (6 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:38 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVFCMP.cond.{S/D}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
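Not part of the patch: in the translator below, bit 0 of the fcond field selects the signaling helper and the remaining bits index the condition via get_fcmp_flags(). A standalone sketch of that split on a few encodings (hypothetical driver, not QEMU code):

#include <stdio.h>

int main(void)
{
    unsigned examples[] = { 0x0, 0x1, 0x4, 0x5, 0x11, 0x19 };
    int i;

    for (i = 0; i < 6; i++) {
        unsigned fcond = examples[i];
        printf("fcond=0x%02x -> %s compare, condition index %u\n",
               fcond,
               (fcond & 1) ? "signaling" : "quiet",  /* *_s_* vs *_c_* helper */
               fcond >> 1);                          /* fed to get_fcmp_flags() */
    }
    return 0;
}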
target/loongarch/disas.c | 94 ++++++++++++++++++++
target/loongarch/helper.h | 5 ++
target/loongarch/insn_trans/trans_lasx.c.inc | 32 +++++++
target/loongarch/insns.decode | 5 ++
target/loongarch/lasx_helper.c | 25 ++++++
target/loongarch/lsx_helper.c | 4 +-
target/loongarch/vec.h | 5 ++
7 files changed, 168 insertions(+), 2 deletions(-)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 5d3904402d..c3bcb9d84a 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2384,6 +2384,100 @@ INSN_LASX(xvslti_hu, xx_i)
INSN_LASX(xvslti_wu, xx_i)
INSN_LASX(xvslti_du, xx_i)
+#define output_xvfcmp(C, PREFIX, SUFFIX) \
+{ \
+ (C)->info->fprintf_func((C)->info->stream, "%08x %s%s\tx%d, x%d, x%d", \
+ (C)->insn, PREFIX, SUFFIX, a->xd, \
+ a->xj, a->xk); \
+}
+
+static bool output_xxx_fcond(DisasContext *ctx, arg_xxx_fcond * a,
+ const char *suffix)
+{
+ bool ret = true;
+ switch (a->fcond) {
+ case 0x0:
+ output_xvfcmp(ctx, "xvfcmp_caf_", suffix);
+ break;
+ case 0x1:
+ output_xvfcmp(ctx, "xvfcmp_saf_", suffix);
+ break;
+ case 0x2:
+ output_xvfcmp(ctx, "xvfcmp_clt_", suffix);
+ break;
+ case 0x3:
+ output_xvfcmp(ctx, "xvfcmp_slt_", suffix);
+ break;
+ case 0x4:
+ output_xvfcmp(ctx, "xvfcmp_ceq_", suffix);
+ break;
+ case 0x5:
+ output_xvfcmp(ctx, "xvfcmp_seq_", suffix);
+ break;
+ case 0x6:
+ output_xvfcmp(ctx, "xvfcmp_cle_", suffix);
+ break;
+ case 0x7:
+ output_xvfcmp(ctx, "xvfcmp_sle_", suffix);
+ break;
+ case 0x8:
+ output_xvfcmp(ctx, "xvfcmp_cun_", suffix);
+ break;
+ case 0x9:
+ output_xvfcmp(ctx, "xvfcmp_sun_", suffix);
+ break;
+ case 0xA:
+ output_xvfcmp(ctx, "xvfcmp_cult_", suffix);
+ break;
+ case 0xB:
+ output_xvfcmp(ctx, "xvfcmp_sult_", suffix);
+ break;
+ case 0xC:
+ output_xvfcmp(ctx, "xvfcmp_cueq_", suffix);
+ break;
+ case 0xD:
+ output_xvfcmp(ctx, "xvfcmp_sueq_", suffix);
+ break;
+ case 0xE:
+ output_xvfcmp(ctx, "xvfcmp_cule_", suffix);
+ break;
+ case 0xF:
+ output_xvfcmp(ctx, "xvfcmp_sule_", suffix);
+ break;
+ case 0x10:
+ output_xvfcmp(ctx, "xvfcmp_cne_", suffix);
+ break;
+ case 0x11:
+ output_xvfcmp(ctx, "xvfcmp_sne_", suffix);
+ break;
+ case 0x14:
+ output_xvfcmp(ctx, "xvfcmp_cor_", suffix);
+ break;
+ case 0x15:
+ output_xvfcmp(ctx, "xvfcmp_sor_", suffix);
+ break;
+ case 0x18:
+ output_xvfcmp(ctx, "xvfcmp_cune_", suffix);
+ break;
+ case 0x19:
+ output_xvfcmp(ctx, "xvfcmp_sune_", suffix);
+ break;
+ default:
+ ret = false;
+ }
+ return ret;
+}
+
+#define LASX_FCMP_INSN(suffix) \
+static bool trans_xvfcmp_cond_##suffix(DisasContext *ctx, \
+ arg_xxx_fcond * a) \
+{ \
+ return output_xxx_fcond(ctx, a, #suffix); \
+}
+
+LASX_FCMP_INSN(s)
+LASX_FCMP_INSN(d)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index fbfd15d711..665bcb812a 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -1216,3 +1216,8 @@ DEF_HELPER_FLAGS_4(xvslti_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_4(xvslti_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_4(xvslti_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_4(xvslti_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_5(xvfcmp_c_s, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(xvfcmp_s_s, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(xvfcmp_c_d, void, env, i32, i32, i32, i32)
+DEF_HELPER_5(xvfcmp_s_d, void, env, i32, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index cc1b4fd42a..cdcd4a279a 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -2674,6 +2674,38 @@ TRANS(xvslti_hu, do_xvslti_u, MO_16)
TRANS(xvslti_wu, do_xvslti_u, MO_32)
TRANS(xvslti_du, do_xvslti_u, MO_64)
+static bool trans_xvfcmp_cond_s(DisasContext *ctx, arg_xxx_fcond * a)
+{
+ uint32_t flags;
+ void (*fn)(TCGv_env, TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i32);
+ TCGv_i32 xd = tcg_constant_i32(a->xd);
+ TCGv_i32 xj = tcg_constant_i32(a->xj);
+ TCGv_i32 xk = tcg_constant_i32(a->xk);
+
+ CHECK_ASXE;
+
+ fn = (a->fcond & 1 ? gen_helper_xvfcmp_s_s : gen_helper_xvfcmp_c_s);
+ flags = get_fcmp_flags(a->fcond >> 1);
+ fn(cpu_env, xd, xj, xk, tcg_constant_i32(flags));
+
+ return true;
+}
+
+static bool trans_xvfcmp_cond_d(DisasContext *ctx, arg_xxx_fcond *a)
+{
+ uint32_t flags;
+ void (*fn)(TCGv_env, TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i32);
+ TCGv_i32 xd = tcg_constant_i32(a->xd);
+ TCGv_i32 xj = tcg_constant_i32(a->xj);
+ TCGv_i32 xk = tcg_constant_i32(a->xk);
+
+ CHECK_ASXE;
+
+ fn = (a->fcond & 1 ? gen_helper_xvfcmp_s_d : gen_helper_xvfcmp_c_d);
+ flags = get_fcmp_flags(a->fcond >> 1);
+ fn(cpu_env, xd, xj, xk, tcg_constant_i32(flags));
+
+ return true;
+}
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 4e1f0b30a0..df45dc3d76 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1307,6 +1307,7 @@ vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
&xx_i xd xj imm
&x_i xd imm
&xxxx xd xj xk xa
+&xxx_fcond xd xj xk fcond
#
# LASX Formats
@@ -1324,6 +1325,7 @@ vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
@xx_ui7 .... ........ ... imm:7 xj:5 xd:5 &xx_i
@xx_ui8 .... ........ .. imm:8 xj:5 xd:5 &xx_i
@xxxx .... ........ xa:5 xk:5 xj:5 xd:5 &xxxx
+@xxx_fcond .... ........ fcond:5 xk:5 xj:5 xd:5 &xxx_fcond
xvadd_b 0111 01000000 10100 ..... ..... ..... @xxx
xvadd_h 0111 01000000 10101 ..... ..... ..... @xxx
@@ -1983,6 +1985,9 @@ xvslti_hu 0111 01101000 10001 ..... ..... ..... @xx_ui5
xvslti_wu 0111 01101000 10010 ..... ..... ..... @xx_ui5
xvslti_du 0111 01101000 10011 ..... ..... ..... @xx_ui5
+xvfcmp_cond_s 0000 11001001 ..... ..... ..... ..... @xxx_fcond
+xvfcmp_cond_d 0000 11001010 ..... ..... ..... ..... @xxx_fcond
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index d0bc02de72..1d56fe7b22 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -2757,3 +2757,28 @@ XVCMPI(xvslti_bu, 8, UXB, VSLT)
XVCMPI(xvslti_hu, 16, UXH, VSLT)
XVCMPI(xvslti_wu, 32, UXW, VSLT)
XVCMPI(xvslti_du, 64, UXD, VSLT)
+
+#define XVFCMP(NAME, BIT, E, FN) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk, uint32_t flags) \
+{ \
+ int i; \
+ XReg t; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ \
+ vec_clear_cause(env); \
+ for (i = 0; i < LASX_LEN / BIT ; i++) { \
+ FloatRelation cmp; \
+ cmp = FN(Xj->E(i), Xk->E(i), &env->fp_status); \
+ t.E(i) = vfcmp_common(env, cmp, flags); \
+ vec_update_fcsr0(env, GETPC()); \
+ } \
+ *Xd = t; \
+}
+
+XVFCMP(xvfcmp_c_s, 32, UXW, float32_compare_quiet)
+XVFCMP(xvfcmp_s_s, 32, UXW, float32_compare)
+XVFCMP(xvfcmp_c_d, 64, UXD, float64_compare_quiet)
+XVFCMP(xvfcmp_s_d, 64, UXD, float64_compare)
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 22d71cb39e..4a5c1a47a1 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -2622,8 +2622,8 @@ VCMPI(vslti_hu, 16, UH, VSLT)
VCMPI(vslti_wu, 32, UW, VSLT)
VCMPI(vslti_du, 64, UD, VSLT)
-static uint64_t vfcmp_common(CPULoongArchState *env,
- FloatRelation cmp, uint32_t flags)
+uint64_t vfcmp_common(CPULoongArchState *env,
+ FloatRelation cmp, uint32_t flags)
{
uint64_t ret = 0;
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index 54fd2689f3..134dd265bf 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -8,6 +8,8 @@
#ifndef LOONGARCH_VEC_H
#define LOONGARCH_VEC_H
+#include "fpu/softfloat.h"
+
#if HOST_BIG_ENDIAN
#define B(x) B[15 - (x)]
#define H(x) H[7 - (x)]
@@ -113,4 +115,7 @@ uint64_t do_frecip_64(CPULoongArchState *env, uint64_t fj);
uint32_t do_frsqrt_32(CPULoongArchState *env, uint32_t fj);
uint64_t do_frsqrt_64(CPULoongArchState *env, uint64_t fj);
+uint64_t vfcmp_common(CPULoongArchState *env,
+ FloatRelation cmp, uint32_t flags);
+
#endif /* LOONGARCH_VEC_H */
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 40/46] target/loongarch: Implement xvbitsel xvset
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (38 preceding siblings ...)
2023-06-20 9:38 ` [PATCH v1 39/46] target/loongarch: Implement xvfcmp Song Gao
@ 2023-06-20 9:38 ` Song Gao
2023-06-20 9:38 ` [PATCH v1 41/46] target/loongarch: Implement xvinsgr2vr xvpickve2gr Song Gao
` (5 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:38 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVBITSEL.V;
- XVBITSELI.B;
- XVSET{EQZ/NEZ}.V;
- XVSETANYEQZ.{B/H/W/D};
- XVSETALLNEZ.{B/H/W/D}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
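Not part of the patch: XVBITSEL.V picks, bit by bit, xk where the control bit in xa is set and xj where it is clear. A standalone sketch on one 64-bit chunk (hypothetical values):

#include <stdio.h>
#include <stdint.h>

static uint64_t bitsel(uint64_t ctrl, uint64_t a, uint64_t b)
{
    return (ctrl & a) | (~ctrl & b);      /* a where ctrl=1, b where ctrl=0 */
}

int main(void)
{
    uint64_t xa = 0x00000000ffffffffULL;  /* control mask */
    uint64_t xk = 0x1111111111111111ULL;  /* taken where mask bits are 1 */
    uint64_t xj = 0x2222222222222222ULL;  /* taken where mask bits are 0 */

    printf("result = 0x%016llx\n",
           (unsigned long long)bitsel(xa, xk, xj));
    return 0;
}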
target/loongarch/disas.c | 19 +++++
target/loongarch/helper.h | 11 +++
target/loongarch/insn_trans/trans_lasx.c.inc | 76 ++++++++++++++++++++
target/loongarch/insns.decode | 17 +++++
target/loongarch/lasx_helper.c | 37 ++++++++++
target/loongarch/lsx_helper.c | 2 +-
target/loongarch/vec.h | 2 +
7 files changed, 163 insertions(+), 1 deletion(-)
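A further reference note, not part of the patch: XVSETANYEQZ.B sets the chosen condition-flag register when any byte element of xj is zero; the helper below does this with the SVE-derived do_match2() bit trick. A naive standalone equivalent:

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

static bool any_byte_zero(const uint8_t *xj, int nbytes)
{
    int i;

    for (i = 0; i < nbytes; i++) {
        if (xj[i] == 0) {
            return true;
        }
    }
    return false;
}

int main(void)
{
    uint8_t xj[32] = {1, 2, 3};   /* bytes 3..31 are zero */
    printf("cd = %d\n", any_byte_zero(xj, 32));
    return 0;
}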
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index c3bcb9d84a..5c2a81ee80 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1703,6 +1703,11 @@ static bool trans_##insn(DisasContext *ctx, arg_##type * a) \
return true; \
}
+static void output_cx(DisasContext *ctx, arg_cx *a, const char *mnemonic)
+{
+ output(ctx, mnemonic, "fcc%d, x%d", a->cd, a->xj);
+}
+
static void output_x_i(DisasContext *ctx, arg_x_i *a, const char *mnemonic)
{
output(ctx, mnemonic, "x%d, 0x%x", a->xd, a->imm);
@@ -2478,6 +2483,20 @@ static bool trans_xvfcmp_cond_##suffix(DisasContext *ctx, \
LASX_FCMP_INSN(s)
LASX_FCMP_INSN(d)
+INSN_LASX(xvbitsel_v, xxxx)
+INSN_LASX(xvbitseli_b, xx_i)
+
+INSN_LASX(xvseteqz_v, cx)
+INSN_LASX(xvsetnez_v, cx)
+INSN_LASX(xvsetanyeqz_b, cx)
+INSN_LASX(xvsetanyeqz_h, cx)
+INSN_LASX(xvsetanyeqz_w, cx)
+INSN_LASX(xvsetanyeqz_d, cx)
+INSN_LASX(xvsetallnez_b, cx)
+INSN_LASX(xvsetallnez_h, cx)
+INSN_LASX(xvsetallnez_w, cx)
+INSN_LASX(xvsetallnez_d, cx)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 665bcb812a..f6d64bfde5 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -1221,3 +1221,14 @@ DEF_HELPER_5(xvfcmp_c_s, void, env, i32, i32, i32, i32)
DEF_HELPER_5(xvfcmp_s_s, void, env, i32, i32, i32, i32)
DEF_HELPER_5(xvfcmp_c_d, void, env, i32, i32, i32, i32)
DEF_HELPER_5(xvfcmp_s_d, void, env, i32, i32, i32, i32)
+
+DEF_HELPER_FLAGS_4(xvbitseli_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_3(xvsetanyeqz_b, void, env, i32, i32)
+DEF_HELPER_3(xvsetanyeqz_h, void, env, i32, i32)
+DEF_HELPER_3(xvsetanyeqz_w, void, env, i32, i32)
+DEF_HELPER_3(xvsetanyeqz_d, void, env, i32, i32)
+DEF_HELPER_3(xvsetallnez_b, void, env, i32, i32)
+DEF_HELPER_3(xvsetallnez_h, void, env, i32, i32)
+DEF_HELPER_3(xvsetallnez_w, void, env, i32, i32)
+DEF_HELPER_3(xvsetallnez_d, void, env, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index cdcd4a279a..cefb6a4973 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -65,6 +65,17 @@ static bool gen_xx_i(DisasContext *ctx, arg_xx_i *a,
return true;
}
+static bool gen_cx(DisasContext *ctx, arg_cx *a,
+ void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32))
+{
+ TCGv_i32 xj = tcg_constant_i32(a->xj);
+ TCGv_i32 cd = tcg_constant_i32(a->cd);
+
+ CHECK_ASXE;
+ func(cpu_env, cd, xj);
+ return true;
+}
+
static bool gvec_xxx(DisasContext *ctx, arg_xxx *a, MemOp mop,
void (*func)(unsigned, uint32_t, uint32_t,
uint32_t, uint32_t, uint32_t))
@@ -2706,6 +2717,71 @@ static bool trans_xvfcmp_cond_d(DisasContext *ctx, arg_xxx_fcond *a)
return true;
}
+static bool trans_xvbitsel_v(DisasContext *ctx, arg_xxxx *a)
+{
+ CHECK_ASXE;
+
+ tcg_gen_gvec_bitsel(MO_64, vec_full_offset(a->xd), vec_full_offset(a->xa),
+ vec_full_offset(a->xk), vec_full_offset(a->xj),
+ 32, ctx->vl / 8);
+ return true;
+}
+
+static bool trans_xvbitseli_b(DisasContext *ctx, arg_xx_i *a)
+{
+ static const GVecGen2i op = {
+ .fniv = gen_vbitseli,
+ .fnoi = gen_helper_xvbitseli_b,
+ .vece = MO_8,
+ .load_dest = true
+ };
+
+ CHECK_ASXE;
+
+ tcg_gen_gvec_2i(vec_full_offset(a->xd), vec_full_offset(a->xj),
+ 32, ctx->vl / 8, a->imm, &op);
+ return true;
+}
+
+#define XVSET(NAME, COND) \
+static bool trans_## NAME(DisasContext *ctx, arg_cx * a) \
+{ \
+ TCGv_i64 t1, t2, d[4]; \
+ \
+ d[0] = tcg_temp_new_i64(); \
+ d[1] = tcg_temp_new_i64(); \
+ d[2] = tcg_temp_new_i64(); \
+ d[3] = tcg_temp_new_i64(); \
+ t1 = tcg_temp_new_i64(); \
+ t2 = tcg_temp_new_i64(); \
+ \
+ get_xreg64(d[0], a->xj, 0); \
+ get_xreg64(d[1], a->xj, 1); \
+ get_xreg64(d[2], a->xj, 2); \
+ get_xreg64(d[3], a->xj, 3); \
+ \
+ CHECK_ASXE; \
+ tcg_gen_or_i64(t1, d[0], d[1]); \
+ tcg_gen_or_i64(t2, d[2], d[3]); \
+ tcg_gen_or_i64(t1, t2, t1); \
+ tcg_gen_setcondi_i64(COND, t1, t1, 0); \
+ tcg_gen_st8_tl(t1, cpu_env, offsetof(CPULoongArchState, cf[a->cd & 0x7])); \
+ \
+ return true; \
+}
+
+XVSET(xvseteqz_v, TCG_COND_EQ)
+XVSET(xvsetnez_v, TCG_COND_NE)
+
+TRANS(xvsetanyeqz_b, gen_cx, gen_helper_xvsetanyeqz_b)
+TRANS(xvsetanyeqz_h, gen_cx, gen_helper_xvsetanyeqz_h)
+TRANS(xvsetanyeqz_w, gen_cx, gen_helper_xvsetanyeqz_w)
+TRANS(xvsetanyeqz_d, gen_cx, gen_helper_xvsetanyeqz_d)
+TRANS(xvsetallnez_b, gen_cx, gen_helper_xvsetallnez_b)
+TRANS(xvsetallnez_h, gen_cx, gen_helper_xvsetallnez_h)
+TRANS(xvsetallnez_w, gen_cx, gen_helper_xvsetallnez_w)
+TRANS(xvsetallnez_d, gen_cx, gen_helper_xvsetallnez_d)
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index df45dc3d76..b696d99577 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1308,6 +1308,7 @@ vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
&x_i xd imm
&xxxx xd xj xk xa
&xxx_fcond xd xj xk fcond
+&cx cd xj
#
# LASX Formats
@@ -1326,6 +1327,7 @@ vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
@xx_ui8 .... ........ .. imm:8 xj:5 xd:5 &xx_i
@xxxx .... ........ xa:5 xk:5 xj:5 xd:5 &xxxx
@xxx_fcond .... ........ fcond:5 xk:5 xj:5 xd:5 &xxx_fcond
+@cx .... ........ ..... ..... xj:5 .. cd:3 &cx
xvadd_b 0111 01000000 10100 ..... ..... ..... @xxx
xvadd_h 0111 01000000 10101 ..... ..... ..... @xxx
@@ -1988,6 +1990,21 @@ xvslti_du 0111 01101000 10011 ..... ..... ..... @xx_ui5
xvfcmp_cond_s 0000 11001001 ..... ..... ..... ..... @xxx_fcond
xvfcmp_cond_d 0000 11001010 ..... ..... ..... ..... @xxx_fcond
+xvbitsel_v 0000 11010010 ..... ..... ..... ..... @xxxx
+
+xvbitseli_b 0111 01111100 01 ........ ..... ..... @xx_ui8
+
+xvseteqz_v 0111 01101001 11001 00110 ..... 00 ... @cx
+xvsetnez_v 0111 01101001 11001 00111 ..... 00 ... @cx
+xvsetanyeqz_b 0111 01101001 11001 01000 ..... 00 ... @cx
+xvsetanyeqz_h 0111 01101001 11001 01001 ..... 00 ... @cx
+xvsetanyeqz_w 0111 01101001 11001 01010 ..... 00 ... @cx
+xvsetanyeqz_d 0111 01101001 11001 01011 ..... 00 ... @cx
+xvsetallnez_b 0111 01101001 11001 01100 ..... 00 ... @cx
+xvsetallnez_h 0111 01101001 11001 01101 ..... 00 ... @cx
+xvsetallnez_w 0111 01101001 11001 01110 ..... 00 ... @cx
+xvsetallnez_d 0111 01101001 11001 01111 ..... 00 ... @cx
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index 1d56fe7b22..56dfe10a0d 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -2782,3 +2782,40 @@ XVFCMP(xvfcmp_c_s, 32, UXW, float32_compare_quiet)
XVFCMP(xvfcmp_s_s, 32, UXW, float32_compare)
XVFCMP(xvfcmp_c_d, 64, UXD, float64_compare_quiet)
XVFCMP(xvfcmp_s_d, 64, UXD, float64_compare)
+
+void HELPER(xvbitseli_b)(void *xd, void *xj, uint64_t imm, uint32_t v)
+{
+ int i;
+ XReg *Xd = (XReg *)xd;
+ XReg *Xj = (XReg *)xj;
+
+ for (i = 0; i < LASX_LEN / 8; i++) {
+ Xd->XB(i) = (~Xd->XB(i) & Xj->XB(i)) | (Xd->XB(i) & imm);
+ }
+}
+
+#define XSETANYEQZ(NAME, MO) \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t cd, uint32_t xj) \
+{ \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ env->cf[cd & 0x7] = do_match2(0, Xj->XD(0), Xj->XD(1), MO) || \
+ do_match2(0, Xj->XD(2), Xj->XD(3), MO); \
+}
+XSETANYEQZ(xvsetanyeqz_b, MO_8)
+XSETANYEQZ(xvsetanyeqz_h, MO_16)
+XSETANYEQZ(xvsetanyeqz_w, MO_32)
+XSETANYEQZ(xvsetanyeqz_d, MO_64)
+
+#define XSETALLNEZ(NAME, MO) \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t cd, uint32_t xj) \
+{ \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ env->cf[cd & 0x7] = !do_match2(0, Xj->XD(0), Xj->XD(1), MO) && \
+ !do_match2(0, Xj->XD(2), Xj->XD(3), MO); \
+}
+XSETALLNEZ(xvsetallnez_b, MO_8)
+XSETALLNEZ(xvsetallnez_h, MO_16)
+XSETALLNEZ(xvsetallnez_w, MO_32)
+XSETALLNEZ(xvsetallnez_d, MO_64)
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 4a5c1a47a1..00c9835948 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -2688,7 +2688,7 @@ void HELPER(vbitseli_b)(void *vd, void *vj, uint64_t imm, uint32_t v)
}
/* Copy from target/arm/tcg/sve_helper.c */
-static inline bool do_match2(uint64_t n, uint64_t m0, uint64_t m1, int esz)
+bool do_match2(uint64_t n, uint64_t m0, uint64_t m1, int esz)
{
uint64_t bits = 8 << esz;
uint64_t ones = dup_const(esz, 1);
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index 134dd265bf..cfac1c0e1c 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -118,4 +118,6 @@ uint64_t do_frsqrt_64(CPULoongArchState *env, uint64_t fj);
uint64_t vfcmp_common(CPULoongArchState *env,
FloatRelation cmp, uint32_t flags);
+bool do_match2(uint64_t n, uint64_t m0, uint64_t m1, int esz);
+
#endif /* LOONGARCH_VEC_H */
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 41/46] target/loongarch: Implement xvinsgr2vr xvpickve2gr
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (39 preceding siblings ...)
2023-06-20 9:38 ` [PATCH v1 40/46] target/loongarch: Implement xvbitsel xvset Song Gao
@ 2023-06-20 9:38 ` Song Gao
2023-06-20 9:38 ` [PATCH v1 42/46] target/loongarch: Implement xvreplve xvinsve0 xvpickve xvb{sll/srl}v Song Gao
` (4 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:38 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVINSGR2VR.{W/D};
- XVPICKVE2GR.{W/D}[U].
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 17 ++++++
target/loongarch/insn_trans/trans_lasx.c.inc | 54 ++++++++++++++++++++
target/loongarch/insns.decode | 13 +++++
3 files changed, 84 insertions(+)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 5c2a81ee80..fd7d459921 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1738,6 +1738,16 @@ static void output_xr(DisasContext *ctx, arg_xr *a, const char *mnemonic)
output(ctx, mnemonic, "x%d, r%d", a->xd, a->rj);
}
+static void output_xr_i(DisasContext *ctx, arg_xr_i *a, const char *mnemonic)
+{
+ output(ctx, mnemonic, "x%d, r%d, 0x%x", a->xd, a->rj, a->imm);
+}
+
+static void output_rx_i(DisasContext *ctx, arg_rx_i *a, const char *mnemonic)
+{
+ output(ctx, mnemonic, "r%d, x%d, 0x%x", a->rd, a->xj, a->imm);
+}
+
INSN_LASX(xvadd_b, xxx)
INSN_LASX(xvadd_h, xxx)
INSN_LASX(xvadd_w, xxx)
@@ -2497,6 +2507,13 @@ INSN_LASX(xvsetallnez_h, cx)
INSN_LASX(xvsetallnez_w, cx)
INSN_LASX(xvsetallnez_d, cx)
+INSN_LASX(xvinsgr2vr_w, xr_i)
+INSN_LASX(xvinsgr2vr_d, xr_i)
+INSN_LASX(xvpickve2gr_w, rx_i)
+INSN_LASX(xvpickve2gr_d, rx_i)
+INSN_LASX(xvpickve2gr_wu, rx_i)
+INSN_LASX(xvpickve2gr_du, rx_i)
+
INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index cefb6a4973..0fc26023d1 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -2782,6 +2782,60 @@ TRANS(xvsetallnez_h, gen_cx, gen_helper_xvsetallnez_h)
TRANS(xvsetallnez_w, gen_cx, gen_helper_xvsetallnez_w)
TRANS(xvsetallnez_d, gen_cx, gen_helper_xvsetallnez_d)
+static bool trans_xvinsgr2vr_w(DisasContext *ctx, arg_xr_i *a)
+{
+ TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
+ CHECK_ASXE;
+ tcg_gen_st32_i64(src, cpu_env,
+ offsetof(CPULoongArchState, fpr[a->xd].xreg.XW(a->imm)));
+ return true;
+}
+
+static bool trans_xvinsgr2vr_d(DisasContext *ctx, arg_xr_i *a)
+{
+ TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
+ CHECK_ASXE;
+ tcg_gen_st_i64(src, cpu_env,
+ offsetof(CPULoongArchState, fpr[a->xd].xreg.XD(a->imm)));
+ return true;
+}
+
+static bool trans_xvpickve2gr_w(DisasContext *ctx, arg_rx_i *a)
+{
+ TCGv dst = gpr_dst(ctx, a->rd, EXT_NONE);
+ CHECK_ASXE;
+ tcg_gen_ld32s_i64(dst, cpu_env,
+ offsetof(CPULoongArchState, fpr[a->xj].xreg.XW(a->imm)));
+ return true;
+}
+
+static bool trans_xvpickve2gr_d(DisasContext *ctx, arg_rx_i *a)
+{
+ TCGv dst = gpr_dst(ctx, a->rd, EXT_NONE);
+ CHECK_ASXE;
+ tcg_gen_ld_i64(dst, cpu_env,
+ offsetof(CPULoongArchState, fpr[a->xj].xreg.XD(a->imm)));
+ return true;
+}
+
+static bool trans_xvpickve2gr_wu(DisasContext *ctx, arg_rx_i *a)
+{
+ TCGv dst = gpr_dst(ctx, a->rd, EXT_NONE);
+ CHECK_ASXE;
+ tcg_gen_ld32u_i64(dst, cpu_env,
+ offsetof(CPULoongArchState, fpr[a->xj].xreg.XW(a->imm)));
+ return true;
+}
+
+static bool trans_xvpickve2gr_du(DisasContext *ctx, arg_rx_i *a)
+{
+ TCGv dst = gpr_dst(ctx, a->rd, EXT_NONE);
+ CHECK_ASXE;
+ tcg_gen_ld_i64(dst, cpu_env,
+ offsetof(CPULoongArchState, fpr[a->xj].xreg.XD(a->imm)));
+ return true;
+}
+
static bool gvec_dupx(DisasContext *ctx, arg_xr *a, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index b696d99577..8c87b3f840 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1309,6 +1309,8 @@ vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
&xxxx xd xj xk xa
&xxx_fcond xd xj xk fcond
&cx cd xj
+&xr_i xd rj imm
+&rx_i rd xj imm
#
# LASX Formats
@@ -1328,6 +1330,10 @@ vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
@xxxx .... ........ xa:5 xk:5 xj:5 xd:5 &xxxx
@xxx_fcond .... ........ fcond:5 xk:5 xj:5 xd:5 &xxx_fcond
@cx .... ........ ..... ..... xj:5 .. cd:3 &cx
+@xr_ui3 .... ........ ..... .. imm:3 rj:5 xd:5 &xr_i
+@xr_ui2 .... ........ ..... ... imm:2 rj:5 xd:5 &xr_i
+@rx_ui3 .... ........ ..... .. imm:3 xj:5 rd:5 &rx_i
+@rx_ui2 .... ........ ..... ... imm:2 xj:5 rd:5 &rx_i
xvadd_b 0111 01000000 10100 ..... ..... ..... @xxx
xvadd_h 0111 01000000 10101 ..... ..... ..... @xxx
@@ -2005,6 +2011,13 @@ xvsetallnez_h 0111 01101001 11001 01101 ..... 00 ... @cx
xvsetallnez_w 0111 01101001 11001 01110 ..... 00 ... @cx
xvsetallnez_d 0111 01101001 11001 01111 ..... 00 ... @cx
+xvinsgr2vr_w 0111 01101110 10111 10 ... ..... ..... @xr_ui3
+xvinsgr2vr_d 0111 01101110 10111 110 .. ..... ..... @xr_ui2
+xvpickve2gr_w 0111 01101110 11111 10 ... ..... ..... @rx_ui3
+xvpickve2gr_d 0111 01101110 11111 110 .. ..... ..... @rx_ui2
+xvpickve2gr_wu 0111 01101111 00111 10 ... ..... ..... @rx_ui3
+xvpickve2gr_du 0111 01101111 00111 110 .. ..... ..... @rx_ui2
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 42/46] target/loongarch: Implement xvreplve xvinsve0 xvpickve xvb{sll/srl}v
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (40 preceding siblings ...)
2023-06-20 9:38 ` [PATCH v1 41/46] target/loongarch: Implement xvinsgr2vr xvpickve2gr Song Gao
@ 2023-06-20 9:38 ` Song Gao
2023-06-20 9:38 ` [PATCH v1 43/46] target/loongarch: Implement xvpack xvpick xvilv{l/h} Song Gao
` (3 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:38 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVREPLVE.{B/H/W/D};
- XVREPL128VEI.{B/H/W/D};
- XVREPLVE0.{B/H/W/D/Q};
- XVINSVE0.{W/D};
- XVPICKVE.{W/D};
- XVBSLL.V, XVBSRL.V.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 29 +++
target/loongarch/helper.h | 5 +
target/loongarch/insn_trans/trans_lasx.c.inc | 205 +++++++++++++++++++
target/loongarch/insns.decode | 29 +++
target/loongarch/lasx_helper.c | 29 +++
5 files changed, 297 insertions(+)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index fd7d459921..3b89a5df87 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1748,6 +1748,11 @@ static void output_rx_i(DisasContext *ctx, arg_rx_i *a, const char *mnemonic)
output(ctx, mnemonic, "r%d, x%d, 0x%x", a->rd, a->xj, a->imm);
}
+static void output_xxr(DisasContext *ctx, arg_xxr *a, const char *mnemonic)
+{
+ output(ctx, mnemonic, "x%d, x%d, r%d", a->xd, a->xj, a->rk);
+}
+
INSN_LASX(xvadd_b, xxx)
INSN_LASX(xvadd_h, xxx)
INSN_LASX(xvadd_w, xxx)
@@ -2518,3 +2523,27 @@ INSN_LASX(xvreplgr2vr_b, xr)
INSN_LASX(xvreplgr2vr_h, xr)
INSN_LASX(xvreplgr2vr_w, xr)
INSN_LASX(xvreplgr2vr_d, xr)
+
+INSN_LASX(xvreplve_b, xxr)
+INSN_LASX(xvreplve_h, xxr)
+INSN_LASX(xvreplve_w, xxr)
+INSN_LASX(xvreplve_d, xxr)
+INSN_LASX(xvrepl128vei_b, xx_i)
+INSN_LASX(xvrepl128vei_h, xx_i)
+INSN_LASX(xvrepl128vei_w, xx_i)
+INSN_LASX(xvrepl128vei_d, xx_i)
+
+INSN_LASX(xvreplve0_b, xx)
+INSN_LASX(xvreplve0_h, xx)
+INSN_LASX(xvreplve0_w, xx)
+INSN_LASX(xvreplve0_d, xx)
+INSN_LASX(xvreplve0_q, xx)
+
+INSN_LASX(xvinsve0_w, xx_i)
+INSN_LASX(xvinsve0_d, xx_i)
+
+INSN_LASX(xvpickve_w, xx_i)
+INSN_LASX(xvpickve_d, xx_i)
+
+INSN_LASX(xvbsll_v, xx_i)
+INSN_LASX(xvbsrl_v, xx_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index f6d64bfde5..6c4525a413 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -1232,3 +1232,8 @@ DEF_HELPER_3(xvsetallnez_b, void, env, i32, i32)
DEF_HELPER_3(xvsetallnez_h, void, env, i32, i32)
DEF_HELPER_3(xvsetallnez_w, void, env, i32, i32)
DEF_HELPER_3(xvsetallnez_d, void, env, i32, i32)
+
+DEF_HELPER_4(xvinsve0_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvinsve0_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvpickve_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvpickve_d, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 0fc26023d1..e63b1c67c9 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -2851,3 +2851,208 @@ TRANS(xvreplgr2vr_b, gvec_dupx, MO_8)
TRANS(xvreplgr2vr_h, gvec_dupx, MO_16)
TRANS(xvreplgr2vr_w, gvec_dupx, MO_32)
TRANS(xvreplgr2vr_d, gvec_dupx, MO_64)
+
+static bool gen_xvreplve(DisasContext *ctx, arg_xxr *a, int vece, int bit,
+ void (*func)(TCGv_i64, TCGv_ptr, tcg_target_long))
+{
+ TCGv_i64 t0 = tcg_temp_new_i64();
+ TCGv_ptr t1 = tcg_temp_new_ptr();
+ TCGv_i64 t2 = tcg_temp_new_i64();
+
+ CHECK_ASXE;
+
+ tcg_gen_andi_i64(t0, gpr_src(ctx, a->rk, EXT_NONE), (LSX_LEN / bit) - 1);
+ tcg_gen_shli_i64(t0, t0, vece);
+ if (HOST_BIG_ENDIAN) {
+ tcg_gen_xori_i64(t0, t0, vece << ((LSX_LEN / bit) - 1));
+ }
+
+ tcg_gen_trunc_i64_ptr(t1, t0);
+ tcg_gen_add_ptr(t1, t1, cpu_env);
+ func(t2, t1, vec_full_offset(a->xj));
+ tcg_gen_gvec_dup_i64(vece, vec_full_offset(a->xd), 16, 16, t2);
+ func(t2, t1, offsetof(CPULoongArchState, fpr[a->xj].xreg.XQ(1)));
+ tcg_gen_gvec_dup_i64(vece,
+ offsetof(CPULoongArchState, fpr[a->xd].xreg.XQ(1)),
+ 16, 16, t2);
+ return true;
+}
+
+TRANS(xvreplve_b, gen_xvreplve, MO_8, 8, tcg_gen_ld8u_i64)
+TRANS(xvreplve_h, gen_xvreplve, MO_16, 16, tcg_gen_ld16u_i64)
+TRANS(xvreplve_w, gen_xvreplve, MO_32, 32, tcg_gen_ld32u_i64)
+TRANS(xvreplve_d, gen_xvreplve, MO_64, 64, tcg_gen_ld_i64)
+
+static bool trans_xvrepl128vei_b(DisasContext *ctx, arg_xx_i * a)
+{
+ CHECK_ASXE;
+
+ tcg_gen_gvec_dup_mem(MO_8,
+ offsetof(CPULoongArchState, fpr[a->xd].xreg.XB(0)),
+ offsetof(CPULoongArchState,
+ fpr[a->xj].xreg.XB((a->imm))),
+ 16, 16);
+ tcg_gen_gvec_dup_mem(MO_8,
+ offsetof(CPULoongArchState, fpr[a->xd].xreg.XB(16)),
+ offsetof(CPULoongArchState,
+ fpr[a->xj].xreg.XB((a->imm + 16))),
+ 16, 16);
+ return true;
+}
+
+static bool trans_xvrepl128vei_h(DisasContext *ctx, arg_xx_i *a)
+{
+ CHECK_ASXE;
+
+ tcg_gen_gvec_dup_mem(MO_16,
+ offsetof(CPULoongArchState, fpr[a->xd].xreg.XH(0)),
+ offsetof(CPULoongArchState,
+ fpr[a->xj].xreg.XH((a->imm))),
+ 16, 16);
+ tcg_gen_gvec_dup_mem(MO_16,
+ offsetof(CPULoongArchState, fpr[a->xd].xreg.XH(8)),
+ offsetof(CPULoongArchState,
+ fpr[a->xj].xreg.XH((a->imm + 8))),
+ 16, 16);
+ return true;
+}
+
+static bool trans_xvrepl128vei_w(DisasContext *ctx, arg_xx_i *a)
+{
+ CHECK_ASXE;
+
+ tcg_gen_gvec_dup_mem(MO_32,
+ offsetof(CPULoongArchState, fpr[a->xd].xreg.XW(0)),
+ offsetof(CPULoongArchState,
+ fpr[a->xj].xreg.XW((a->imm))),
+ 16, 16);
+ tcg_gen_gvec_dup_mem(MO_32,
+ offsetof(CPULoongArchState, fpr[a->xd].xreg.XW(4)),
+ offsetof(CPULoongArchState,
+ fpr[a->xj].xreg.XW((a->imm + 4))),
+ 16, 16);
+ return true;
+}
+
+static bool trans_xvrepl128vei_d(DisasContext *ctx, arg_xx_i *a)
+{
+ CHECK_ASXE;
+
+ tcg_gen_gvec_dup_mem(MO_64,
+ offsetof(CPULoongArchState, fpr[a->xd].xreg.XD(0)),
+ offsetof(CPULoongArchState,
+ fpr[a->xj].xreg.XD((a->imm))),
+ 16, 16);
+ tcg_gen_gvec_dup_mem(MO_64,
+ offsetof(CPULoongArchState, fpr[a->xd].xreg.XD(2)),
+ offsetof(CPULoongArchState,
+ fpr[a->xj].xreg.XD((a->imm + 2))),
+ 16, 16);
+ return true;
+}
+
+#define XVREPLVE0(NAME, MOP) \
+static bool trans_## NAME(DisasContext *ctx, arg_xx * a) \
+{ \
+ CHECK_ASXE; \
+ \
+ tcg_gen_gvec_dup_mem(MOP, vec_full_offset(a->xd), vec_full_offset(a->xj), \
+ 32, 32); \
+ return true; \
+}
+
+XVREPLVE0(xvreplve0_b, MO_8)
+XVREPLVE0(xvreplve0_h, MO_16)
+XVREPLVE0(xvreplve0_w, MO_32)
+XVREPLVE0(xvreplve0_d, MO_64)
+XVREPLVE0(xvreplve0_q, MO_128)
+
+TRANS(xvinsve0_w, gen_xx_i, gen_helper_xvinsve0_w)
+TRANS(xvinsve0_d, gen_xx_i, gen_helper_xvinsve0_d)
+
+TRANS(xvpickve_w, gen_xx_i, gen_helper_xvpickve_w)
+TRANS(xvpickve_d, gen_xx_i, gen_helper_xvpickve_d)
+
+static bool trans_xvbsll_v(DisasContext *ctx, arg_xx_i *a)
+{
+ int ofs;
+ TCGv_i64 desthigh[2], destlow[2], high[2], low[2];
+
+ CHECK_ASXE;
+
+ desthigh[0] = tcg_temp_new_i64();
+ desthigh[1] = tcg_temp_new_i64();
+ destlow[0] = tcg_temp_new_i64();
+ destlow[1] = tcg_temp_new_i64();
+ high[0] = tcg_temp_new_i64();
+ high[1] = tcg_temp_new_i64();
+ low[0] = tcg_temp_new_i64();
+ low[1] = tcg_temp_new_i64();
+
+ get_xreg64(low[0], a->xj, 0);
+ get_xreg64(low[1], a->xj, 2);
+
+ ofs = ((a->imm) & 0xf) * 8;
+ if (ofs < 64) {
+ get_xreg64(high[0], a->xj, 1);
+ get_xreg64(high[1], a->xj, 3);
+ tcg_gen_extract2_i64(desthigh[0], low[0], high[0], 64 - ofs);
+ tcg_gen_extract2_i64(desthigh[1], low[1], high[1], 64 - ofs);
+ tcg_gen_shli_i64(destlow[0], low[0], ofs);
+ tcg_gen_shli_i64(destlow[1], low[1], ofs);
+ } else {
+ tcg_gen_shli_i64(desthigh[0], low[0], ofs - 64);
+ tcg_gen_shli_i64(desthigh[1], low[1], ofs - 64);
+ destlow[0] = tcg_constant_i64(0);
+ destlow[1] = tcg_constant_i64(0);
+ }
+
+ set_xreg64(desthigh[0], a->xd, 1);
+ set_xreg64(destlow[0], a->xd, 0);
+ set_xreg64(desthigh[1], a->xd, 3);
+ set_xreg64(destlow[1], a->xd, 2);
+
+ return true;
+}
+
+static bool trans_xvbsrl_v(DisasContext *ctx, arg_xx_i *a)
+{
+ TCGv_i64 desthigh[2], destlow[2], high[2], low[2];
+ int ofs;
+
+ CHECK_ASXE;
+
+ desthigh[0] = tcg_temp_new_i64();
+ desthigh[1] = tcg_temp_new_i64();
+ destlow[0] = tcg_temp_new_i64();
+ destlow[1] = tcg_temp_new_i64();
+ high[0] = tcg_temp_new_i64();
+ high[1] = tcg_temp_new_i64();
+ low[0] = tcg_temp_new_i64();
+ low[1] = tcg_temp_new_i64();
+
+ get_xreg64(high[0], a->xj, 1);
+ get_xreg64(high[1], a->xj, 3);
+
+ ofs = ((a->imm) & 0xf) * 8;
+ if (ofs < 64) {
+ get_xreg64(low[0], a->xj, 0);
+ get_xreg64(low[1], a->xj, 2);
+ tcg_gen_extract2_i64(destlow[0], low[0], high[0], ofs);
+ tcg_gen_extract2_i64(destlow[1], low[1], high[1], ofs);
+ tcg_gen_shri_i64(desthigh[0], high[0], ofs);
+ tcg_gen_shri_i64(desthigh[1], high[1], ofs);
+ } else {
+ tcg_gen_shri_i64(destlow[0], high[0], ofs - 64);
+ tcg_gen_shri_i64(destlow[1], high[1], ofs - 64);
+ desthigh[0] = tcg_constant_i64(0);
+ desthigh[1] = tcg_constant_i64(0);
+ }
+
+ set_xreg64(desthigh[0], a->xd, 1);
+ set_xreg64(destlow[0], a->xd, 0);
+ set_xreg64(desthigh[1], a->xd, 3);
+ set_xreg64(destlow[1], a->xd, 2);
+
+ return true;
+}
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 8c87b3f840..697087e6ef 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1311,6 +1311,7 @@ vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
&cx cd xj
&xr_i xd rj imm
&rx_i rd xj imm
+&xxr xd xj rk
#
# LASX Formats
@@ -1321,6 +1322,8 @@ vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
@xxx .... ........ ..... xk:5 xj:5 xd:5 &xxx
@xr .... ........ ..... ..... rj:5 xd:5 &xr
@xx_i5 .... ........ ..... imm:s5 xj:5 xd:5 &xx_i
+@xx_ui1 .... ........ ..... .... imm:1 xj:5 xd:5 &xx_i
+@xx_ui2 .... ........ ..... ... imm:2 xj:5 xd:5 &xx_i
@xx_ui3 .... ........ ..... .. imm:3 xj:5 xd:5 &xx_i
@xx_ui4 .... ........ ..... . imm:4 xj:5 xd:5 &xx_i
@xx_ui5 .... ........ ..... imm:5 xj:5 xd:5 &xx_i
@@ -1334,6 +1337,7 @@ vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
@xr_ui2 .... ........ ..... ... imm:2 rj:5 xd:5 &xr_i
@rx_ui3 .... ........ ..... .. imm:3 xj:5 rd:5 &rx_i
@rx_ui2 .... ........ ..... ... imm:2 xj:5 rd:5 &rx_i
+@xxr .... ........ ..... rk:5 xj:5 xd:5 &xxr
xvadd_b 0111 01000000 10100 ..... ..... ..... @xxx
xvadd_h 0111 01000000 10101 ..... ..... ..... @xxx
@@ -2022,3 +2026,28 @@ xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @xr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @xr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @xr
xvreplgr2vr_d 0111 01101001 11110 00011 ..... ..... @xr
+
+xvreplve_b 0111 01010010 00100 ..... ..... ..... @xxr
+xvreplve_h 0111 01010010 00101 ..... ..... ..... @xxr
+xvreplve_w 0111 01010010 00110 ..... ..... ..... @xxr
+xvreplve_d 0111 01010010 00111 ..... ..... ..... @xxr
+
+xvrepl128vei_b 0111 01101111 01111 0 .... ..... ..... @xx_ui4
+xvrepl128vei_h 0111 01101111 01111 10 ... ..... ..... @xx_ui3
+xvrepl128vei_w 0111 01101111 01111 110 .. ..... ..... @xx_ui2
+xvrepl128vei_d 0111 01101111 01111 1110 . ..... ..... @xx_ui1
+
+xvreplve0_b 0111 01110000 01110 00000 ..... ..... @xx
+xvreplve0_h 0111 01110000 01111 00000 ..... ..... @xx
+xvreplve0_w 0111 01110000 01111 10000 ..... ..... @xx
+xvreplve0_d 0111 01110000 01111 11000 ..... ..... @xx
+xvreplve0_q 0111 01110000 01111 11100 ..... ..... @xx
+
+xvinsve0_w 0111 01101111 11111 10 ... ..... ..... @xx_ui3
+xvinsve0_d 0111 01101111 11111 110 .. ..... ..... @xx_ui2
+
+xvpickve_w 0111 01110000 00111 10 ... ..... ..... @xx_ui3
+xvpickve_d 0111 01110000 00111 110 .. ..... ..... @xx_ui2
+
+xvbsll_v 0111 01101000 11100 ..... ..... ..... @xx_ui5
+xvbsrl_v 0111 01101000 11101 ..... ..... ..... @xx_ui5
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index 56dfe10a0d..4422c1292e 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -2819,3 +2819,32 @@ XSETALLNEZ(xvsetallnez_b, MO_8)
XSETALLNEZ(xvsetallnez_h, MO_16)
XSETALLNEZ(xvsetallnez_w, MO_32)
XSETALLNEZ(xvsetallnez_d, MO_64)
+
+#define XVINSVE0(NAME, E, MASK) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t imm) \
+{ \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ Xd->E(imm & MASK) = Xj->E(0); \
+}
+
+XVINSVE0(xvinsve0_w, XW, 0x7)
+XVINSVE0(xvinsve0_d, XD, 0x3)
+
+#define XVPICKVE(NAME, E, BIT, MASK) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t imm) \
+{ \
+ int i; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ Xd->E(0) = Xj->E(imm & MASK); \
+ for (i = 1; i < LASX_LEN / BIT; i++) { \
+ Xd->E(i) = 0; \
+ } \
+}
+
+XVPICKVE(xvpickve_w, XW, 32, 0x7)
+XVPICKVE(xvpickve_d, XD, 64, 0x3)
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 43/46] target/loongarch: Implement xvpack xvpick xvilv{l/h}
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (41 preceding siblings ...)
2023-06-20 9:38 ` [PATCH v1 42/46] target/loongarch: Implement xvreplve xvinsve0 xvpickve xvb{sll/srl}v Song Gao
@ 2023-06-20 9:38 ` Song Gao
2023-06-20 9:38 ` [PATCH v1 44/46] target/loongarch: Implement xvshuf xvperm{i} xvshuf4i xvextrins Song Gao
` (2 subsequent siblings)
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:38 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVPACK{EV/OD}.{B/H/W/D};
- XVPICK{EV/OD}.{B/H/W/D};
- XVILV{L/H}.{B/H/W/D}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 27 ++++
target/loongarch/helper.h | 27 ++++
target/loongarch/insn_trans/trans_lasx.c.inc | 27 ++++
target/loongarch/insns.decode | 27 ++++
target/loongarch/lasx_helper.c | 144 +++++++++++++++++++
5 files changed, 252 insertions(+)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 3b89a5df87..4b815c86b8 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2547,3 +2547,30 @@ INSN_LASX(xvpickve_d, xx_i)
INSN_LASX(xvbsll_v, xx_i)
INSN_LASX(xvbsrl_v, xx_i)
+
+INSN_LASX(xvpackev_b, xxx)
+INSN_LASX(xvpackev_h, xxx)
+INSN_LASX(xvpackev_w, xxx)
+INSN_LASX(xvpackev_d, xxx)
+INSN_LASX(xvpackod_b, xxx)
+INSN_LASX(xvpackod_h, xxx)
+INSN_LASX(xvpackod_w, xxx)
+INSN_LASX(xvpackod_d, xxx)
+
+INSN_LASX(xvpickev_b, xxx)
+INSN_LASX(xvpickev_h, xxx)
+INSN_LASX(xvpickev_w, xxx)
+INSN_LASX(xvpickev_d, xxx)
+INSN_LASX(xvpickod_b, xxx)
+INSN_LASX(xvpickod_h, xxx)
+INSN_LASX(xvpickod_w, xxx)
+INSN_LASX(xvpickod_d, xxx)
+
+INSN_LASX(xvilvl_b, xxx)
+INSN_LASX(xvilvl_h, xxx)
+INSN_LASX(xvilvl_w, xxx)
+INSN_LASX(xvilvl_d, xxx)
+INSN_LASX(xvilvh_b, xxx)
+INSN_LASX(xvilvh_h, xxx)
+INSN_LASX(xvilvh_w, xxx)
+INSN_LASX(xvilvh_d, xxx)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 6c4525a413..dc5ab59f8e 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -1237,3 +1237,30 @@ DEF_HELPER_4(xvinsve0_w, void, env, i32, i32, i32)
DEF_HELPER_4(xvinsve0_d, void, env, i32, i32, i32)
DEF_HELPER_4(xvpickve_w, void, env, i32, i32, i32)
DEF_HELPER_4(xvpickve_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(xvpackev_b, void, env, i32, i32, i32)
+DEF_HELPER_4(xvpackev_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvpackev_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvpackev_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvpackod_b, void, env, i32, i32, i32)
+DEF_HELPER_4(xvpackod_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvpackod_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvpackod_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(xvpickev_b, void, env, i32, i32, i32)
+DEF_HELPER_4(xvpickev_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvpickev_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvpickev_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvpickod_b, void, env, i32, i32, i32)
+DEF_HELPER_4(xvpickod_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvpickod_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvpickod_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(xvilvl_b, void, env, i32, i32, i32)
+DEF_HELPER_4(xvilvl_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvilvl_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvilvl_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvilvh_b, void, env, i32, i32, i32)
+DEF_HELPER_4(xvilvh_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvilvh_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvilvh_d, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index e63b1c67c9..75ac0ae1f1 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -3056,3 +3056,30 @@ static bool trans_xvbsrl_v(DisasContext *ctx, arg_xx_i *a)
return true;
}
+
+TRANS(xvpackev_b, gen_xxx, gen_helper_xvpackev_b)
+TRANS(xvpackev_h, gen_xxx, gen_helper_xvpackev_h)
+TRANS(xvpackev_w, gen_xxx, gen_helper_xvpackev_w)
+TRANS(xvpackev_d, gen_xxx, gen_helper_xvpackev_d)
+TRANS(xvpackod_b, gen_xxx, gen_helper_xvpackod_b)
+TRANS(xvpackod_h, gen_xxx, gen_helper_xvpackod_h)
+TRANS(xvpackod_w, gen_xxx, gen_helper_xvpackod_w)
+TRANS(xvpackod_d, gen_xxx, gen_helper_xvpackod_d)
+
+TRANS(xvpickev_b, gen_xxx, gen_helper_xvpickev_b)
+TRANS(xvpickev_h, gen_xxx, gen_helper_xvpickev_h)
+TRANS(xvpickev_w, gen_xxx, gen_helper_xvpickev_w)
+TRANS(xvpickev_d, gen_xxx, gen_helper_xvpickev_d)
+TRANS(xvpickod_b, gen_xxx, gen_helper_xvpickod_b)
+TRANS(xvpickod_h, gen_xxx, gen_helper_xvpickod_h)
+TRANS(xvpickod_w, gen_xxx, gen_helper_xvpickod_w)
+TRANS(xvpickod_d, gen_xxx, gen_helper_xvpickod_d)
+
+TRANS(xvilvl_b, gen_xxx, gen_helper_xvilvl_b)
+TRANS(xvilvl_h, gen_xxx, gen_helper_xvilvl_h)
+TRANS(xvilvl_w, gen_xxx, gen_helper_xvilvl_w)
+TRANS(xvilvl_d, gen_xxx, gen_helper_xvilvl_d)
+TRANS(xvilvh_b, gen_xxx, gen_helper_xvilvh_b)
+TRANS(xvilvh_h, gen_xxx, gen_helper_xvilvh_h)
+TRANS(xvilvh_w, gen_xxx, gen_helper_xvilvh_w)
+TRANS(xvilvh_d, gen_xxx, gen_helper_xvilvh_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 697087e6ef..5c3a18fbe2 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -2051,3 +2051,30 @@ xvpickve_d 0111 01110000 00111 110 .. ..... ..... @xx_ui2
xvbsll_v 0111 01101000 11100 ..... ..... ..... @xx_ui5
xvbsrl_v 0111 01101000 11101 ..... ..... ..... @xx_ui5
+
+xvpackev_b 0111 01010001 01100 ..... ..... ..... @xxx
+xvpackev_h 0111 01010001 01101 ..... ..... ..... @xxx
+xvpackev_w 0111 01010001 01110 ..... ..... ..... @xxx
+xvpackev_d 0111 01010001 01111 ..... ..... ..... @xxx
+xvpackod_b 0111 01010001 10000 ..... ..... ..... @xxx
+xvpackod_h 0111 01010001 10001 ..... ..... ..... @xxx
+xvpackod_w 0111 01010001 10010 ..... ..... ..... @xxx
+xvpackod_d 0111 01010001 10011 ..... ..... ..... @xxx
+
+xvpickev_b 0111 01010001 11100 ..... ..... ..... @xxx
+xvpickev_h 0111 01010001 11101 ..... ..... ..... @xxx
+xvpickev_w 0111 01010001 11110 ..... ..... ..... @xxx
+xvpickev_d 0111 01010001 11111 ..... ..... ..... @xxx
+xvpickod_b 0111 01010010 00000 ..... ..... ..... @xxx
+xvpickod_h 0111 01010010 00001 ..... ..... ..... @xxx
+xvpickod_w 0111 01010010 00010 ..... ..... ..... @xxx
+xvpickod_d 0111 01010010 00011 ..... ..... ..... @xxx
+
+xvilvl_b 0111 01010001 10100 ..... ..... ..... @xxx
+xvilvl_h 0111 01010001 10101 ..... ..... ..... @xxx
+xvilvl_w 0111 01010001 10110 ..... ..... ..... @xxx
+xvilvl_d 0111 01010001 10111 ..... ..... ..... @xxx
+xvilvh_b 0111 01010001 11000 ..... ..... ..... @xxx
+xvilvh_h 0111 01010001 11001 ..... ..... ..... @xxx
+xvilvh_w 0111 01010001 11010 ..... ..... ..... @xxx
+xvilvh_d 0111 01010001 11011 ..... ..... ..... @xxx
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index 4422c1292e..50991998bf 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -2848,3 +2848,147 @@ void HELPER(NAME)(CPULoongArchState *env, \
XVPICKVE(xvpickve_w, XW, 32, 0x7)
XVPICKVE(xvpickve_d, XD, 64, 0x3)
+
+#define XVPACKEV(NAME, BIT, E) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i; \
+ XReg temp; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ \
+ for (i = 0; i < LASX_LEN / (BIT * 2); i++) { \
+ temp.E(2 * i + 1) = Xj->E(2 * i); \
+ temp.E(2 * i) = Xk->E(2 * i); \
+ } \
+ *Xd = temp; \
+}
+
+XVPACKEV(xvpackev_b, 8, XB)
+XVPACKEV(xvpackev_h, 16, XH)
+XVPACKEV(xvpackev_w, 32, XW)
+XVPACKEV(xvpackev_d, 64, XD)
+
+#define XVPACKOD(NAME, BIT, E) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i; \
+ XReg temp; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ \
+ for (i = 0; i < LASX_LEN / (BIT * 2); i++) { \
+ temp.E(2 * i + 1) = Xj->E(2 * i + 1); \
+ temp.E(2 * i) = Xk->E(2 * i + 1); \
+ } \
+ *Xd = temp; \
+}
+
+XVPACKOD(xvpackod_b, 8, XB)
+XVPACKOD(xvpackod_h, 16, XH)
+XVPACKOD(xvpackod_w, 32, XW)
+XVPACKOD(xvpackod_d, 64, XD)
+
+#define XVPICKEV(NAME, BIT, E) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i, max; \
+ XReg temp; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ \
+ max = LASX_LEN / (BIT * 4); \
+ for (i = 0; i < max; i++) { \
+ temp.E(i + max) = Xj->E(2 * i); \
+ temp.E(i) = Xk->E(2 * i); \
+ temp.E(i + max * 3) = Xj->E(2 * i + max * 2); \
+ temp.E(i + max * 2) = Xk->E(2 * i + max * 2); \
+ } \
+ *Xd = temp; \
+}
+
+XVPICKEV(xvpickev_b, 8, XB)
+XVPICKEV(xvpickev_h, 16, XH)
+XVPICKEV(xvpickev_w, 32, XW)
+XVPICKEV(xvpickev_d, 64, XD)
+
+#define XVPICKOD(NAME, BIT, E) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i, max; \
+ XReg temp; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ \
+ max = LASX_LEN / (BIT * 4); \
+ for (i = 0; i < max; i++) { \
+ temp.E(i + max) = Xj->E(2 * i + 1); \
+ temp.E(i) = Xk->E(2 * i + 1); \
+ temp.E(i + max * 3) = Xj->E(2 * i + 1 + max * 2); \
+ temp.E(i + max * 2) = Xk->E(2 * i + 1 + max * 2); \
+ } \
+ *Xd = temp; \
+}
+
+XVPICKOD(xvpickod_b, 8, XB)
+XVPICKOD(xvpickod_h, 16, XH)
+XVPICKOD(xvpickod_w, 32, XW)
+XVPICKOD(xvpickod_d, 64, XD)
+
+#define XVILVL(NAME, BIT, E) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i, max; \
+ XReg temp; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ \
+ max = LASX_LEN / (BIT * 4); \
+ for (i = 0; i < max; i++) { \
+ temp.E(2 * i + 1) = Xj->E(i); \
+ temp.E(2 * i) = Xk->E(i); \
+ temp.E(2 * i + 1 + max * 2) = Xj->E(i + max * 2); \
+ temp.E(2 * i + max * 2) = Xk->E(i + max * 2); \
+ } \
+ *Xd = temp; \
+}
+
+XVILVL(xvilvl_b, 8, XB)
+XVILVL(xvilvl_h, 16, XH)
+XVILVL(xvilvl_w, 32, XW)
+XVILVL(xvilvl_d, 64, XD)
+
+#define XVILVH(NAME, BIT, E) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i, max; \
+ XReg temp; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ \
+ max = LASX_LEN / (BIT * 4); \
+ for (i = 0; i < max; i++) { \
+ temp.E(2 * i + 1) = Xj->E(i + max); \
+ temp.E(2 * i) = Xk->E(i + max); \
+ temp.E(2 * i + 1 + max * 2) = Xj->E(i + max * 3); \
+ temp.E(2 * i + max * 2) = Xk->E(i + max * 3); \
+ } \
+ *Xd = temp; \
+}
+
+XVILVH(xvilvh_b, 8, XB)
+XVILVH(xvilvh_h, 16, XH)
+XVILVH(xvilvh_w, 32, XW)
+XVILVH(xvilvh_d, 64, XD)
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 44/46] target/loongarch: Implement xvshuf xvperm{i} xvshuf4i xvextrins
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (42 preceding siblings ...)
2023-06-20 9:38 ` [PATCH v1 43/46] target/loongarch: Implement xvpack xvpick xvilv{l/h} Song Gao
@ 2023-06-20 9:38 ` Song Gao
2023-06-20 9:38 ` [PATCH v1 45/46] target/loongarch: Implement xvld xvst Song Gao
2023-06-20 9:38 ` [PATCH v1 46/46] target/loongarch: CPUCFG support LASX Song Gao
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:38 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVSHUF.{B/H/W/D};
- XVPERM.W;
- XVSHUF4I.{B/H/W/D};
- XVPERMI.{W/D/Q};
- XVEXTRINS.{B/H/W/D}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 21 +++
target/loongarch/helper.h | 21 +++
target/loongarch/insn_trans/trans_lasx.c.inc | 21 +++
target/loongarch/insns.decode | 21 +++
target/loongarch/lasx_helper.c | 168 +++++++++++++++++++
target/loongarch/lsx_helper.c | 3 +-
target/loongarch/vec.h | 2 +
7 files changed, 255 insertions(+), 2 deletions(-)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 4b815c86b8..9af1c95641 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2574,3 +2574,24 @@ INSN_LASX(xvilvh_b, xxx)
INSN_LASX(xvilvh_h, xxx)
INSN_LASX(xvilvh_w, xxx)
INSN_LASX(xvilvh_d, xxx)
+
+INSN_LASX(xvshuf_b, xxxx)
+INSN_LASX(xvshuf_h, xxx)
+INSN_LASX(xvshuf_w, xxx)
+INSN_LASX(xvshuf_d, xxx)
+
+INSN_LASX(xvperm_w, xxx)
+
+INSN_LASX(xvshuf4i_b, xx_i)
+INSN_LASX(xvshuf4i_h, xx_i)
+INSN_LASX(xvshuf4i_w, xx_i)
+INSN_LASX(xvshuf4i_d, xx_i)
+
+INSN_LASX(xvpermi_w, xx_i)
+INSN_LASX(xvpermi_d, xx_i)
+INSN_LASX(xvpermi_q, xx_i)
+
+INSN_LASX(xvextrins_d, xx_i)
+INSN_LASX(xvextrins_w, xx_i)
+INSN_LASX(xvextrins_h, xx_i)
+INSN_LASX(xvextrins_b, xx_i)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index dc5ab59f8e..1058a7de75 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -1264,3 +1264,24 @@ DEF_HELPER_4(xvilvh_b, void, env, i32, i32, i32)
DEF_HELPER_4(xvilvh_h, void, env, i32, i32, i32)
DEF_HELPER_4(xvilvh_w, void, env, i32, i32, i32)
DEF_HELPER_4(xvilvh_d, void, env, i32, i32, i32)
+
+DEF_HELPER_5(xvshuf_b, void, env, i32, i32, i32, i32)
+DEF_HELPER_4(xvshuf_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvshuf_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvshuf_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(xvperm_w, void, env, i32, i32, i32)
+
+DEF_HELPER_4(xvshuf4i_b, void, env, i32, i32, i32)
+DEF_HELPER_4(xvshuf4i_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvshuf4i_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvshuf4i_d, void, env, i32, i32, i32)
+
+DEF_HELPER_4(xvpermi_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvpermi_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvpermi_q, void, env, i32, i32, i32)
+
+DEF_HELPER_4(xvextrins_d, void, env, i32, i32, i32)
+DEF_HELPER_4(xvextrins_w, void, env, i32, i32, i32)
+DEF_HELPER_4(xvextrins_h, void, env, i32, i32, i32)
+DEF_HELPER_4(xvextrins_b, void, env, i32, i32, i32)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 75ac0ae1f1..1344f75113 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -3083,3 +3083,24 @@ TRANS(xvilvh_b, gen_xxx, gen_helper_xvilvh_b)
TRANS(xvilvh_h, gen_xxx, gen_helper_xvilvh_h)
TRANS(xvilvh_w, gen_xxx, gen_helper_xvilvh_w)
TRANS(xvilvh_d, gen_xxx, gen_helper_xvilvh_d)
+
+TRANS(xvshuf_b, gen_xxxx, gen_helper_xvshuf_b)
+TRANS(xvshuf_h, gen_xxx, gen_helper_xvshuf_h)
+TRANS(xvshuf_w, gen_xxx, gen_helper_xvshuf_w)
+TRANS(xvshuf_d, gen_xxx, gen_helper_xvshuf_d)
+
+TRANS(xvperm_w, gen_xxx, gen_helper_xvperm_w)
+
+TRANS(xvshuf4i_b, gen_xx_i, gen_helper_xvshuf4i_b)
+TRANS(xvshuf4i_h, gen_xx_i, gen_helper_xvshuf4i_h)
+TRANS(xvshuf4i_w, gen_xx_i, gen_helper_xvshuf4i_w)
+TRANS(xvshuf4i_d, gen_xx_i, gen_helper_xvshuf4i_d)
+
+TRANS(xvpermi_w, gen_xx_i, gen_helper_xvpermi_w)
+TRANS(xvpermi_d, gen_xx_i, gen_helper_xvpermi_d)
+TRANS(xvpermi_q, gen_xx_i, gen_helper_xvpermi_q)
+
+TRANS(xvextrins_b, gen_xx_i, gen_helper_xvextrins_b)
+TRANS(xvextrins_h, gen_xx_i, gen_helper_xvextrins_h)
+TRANS(xvextrins_w, gen_xx_i, gen_helper_xvextrins_w)
+TRANS(xvextrins_d, gen_xx_i, gen_helper_xvextrins_d)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 5c3a18fbe2..9c6a6037e9 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -2078,3 +2078,24 @@ xvilvh_b 0111 01010001 11000 ..... ..... ..... @xxx
xvilvh_h 0111 01010001 11001 ..... ..... ..... @xxx
xvilvh_w 0111 01010001 11010 ..... ..... ..... @xxx
xvilvh_d 0111 01010001 11011 ..... ..... ..... @xxx
+
+xvshuf_b 0000 11010110 ..... ..... ..... ..... @xxxx
+xvshuf_h 0111 01010111 10101 ..... ..... ..... @xxx
+xvshuf_w 0111 01010111 10110 ..... ..... ..... @xxx
+xvshuf_d 0111 01010111 10111 ..... ..... ..... @xxx
+
+xvperm_w 0111 01010111 11010 ..... ..... ..... @xxx
+
+xvshuf4i_b 0111 01111001 00 ........ ..... ..... @xx_ui8
+xvshuf4i_h 0111 01111001 01 ........ ..... ..... @xx_ui8
+xvshuf4i_w 0111 01111001 10 ........ ..... ..... @xx_ui8
+xvshuf4i_d 0111 01111001 11 ........ ..... ..... @xx_ui8
+
+xvpermi_w 0111 01111110 01 ........ ..... ..... @xx_ui8
+xvpermi_d 0111 01111110 10 ........ ..... ..... @xx_ui8
+xvpermi_q 0111 01111110 11 ........ ..... ..... @xx_ui8
+
+xvextrins_d 0111 01111000 00 ........ ..... ..... @xx_ui8
+xvextrins_w 0111 01111000 01 ........ ..... ..... @xx_ui8
+xvextrins_h 0111 01111000 10 ........ ..... ..... @xx_ui8
+xvextrins_b 0111 01111000 11 ........ ..... ..... @xx_ui8
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index 50991998bf..a0338dfa6d 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -2992,3 +2992,171 @@ XVILVH(xvilvh_b, 8, XB)
XVILVH(xvilvh_h, 16, XH)
XVILVH(xvilvh_w, 32, XW)
XVILVH(xvilvh_d, 64, XD)
+
+void HELPER(xvshuf_b)(CPULoongArchState *env,
+ uint32_t xd, uint32_t xj, uint32_t xk, uint32_t xa)
+{
+ int i, m;
+ XReg temp;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+ XReg *Xk = &(env->fpr[xk].xreg);
+ XReg *Xa = &(env->fpr[xa].xreg);
+
+ m = LASX_LEN / (8 * 2);
+ for (i = 0; i < 2 * m ; i++) {
+ uint64_t k = (uint8_t)Xa->XB(i) % (2 * m);
+ if (i < m) {
+ temp.XB(i) = k < m ? Xk->XB(k) : Xj->XB(k - m);
+ } else {
+ temp.XB(i) = k < m ? Xk->XB(k + m) : Xj->XB(k);
+ }
+ }
+ *Xd = temp;
+}
+
+#define XVSHUF(NAME, BIT, E) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t xk) \
+{ \
+ int i, m; \
+ XReg temp; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ XReg *Xk = &(env->fpr[xk].xreg); \
+ \
+ m = LASX_LEN / (BIT * 2); \
+ for (i = 0; i < m * 2; i++) { \
+ uint64_t k = (uint8_t)Xd->E(i) % (2 * m); \
+ if (i < m) { \
+ temp.E(i) = k < m ? Xk->E(k) : Xj->E(k - m); \
+ } else { \
+ temp.E(i) = k < m ? Xk->E(k + m) : Xj->E(k); \
+ } \
+ } \
+ *Xd = temp; \
+}
+
+XVSHUF(xvshuf_h, 16, XH)
+XVSHUF(xvshuf_w, 32, XW)
+XVSHUF(xvshuf_d, 64, XD)
+
+void HELPER(xvperm_w)(CPULoongArchState *env,
+ uint32_t xd, uint32_t xj, uint32_t xk)
+{
+ int i, m;
+ XReg temp;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+ XReg *Xk = &(env->fpr[xk].xreg);
+
+ m = LASX_LEN / 32;
+ for (i = 0; i < m ; i++) {
+ uint64_t k = (uint8_t)Xk->XW(i) % 8;
+ temp.XW(i) = Xj->XW(k);
+ }
+ *Xd = temp;
+}
+
+#define XVSHUF4I(NAME, BIT, E) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t imm) \
+{ \
+ int i, m; \
+ XReg temp; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ m = LASX_LEN / BIT; \
+ for (i = 0; i < m; i++) { \
+ if (i < (m / 2)) { \
+ temp.E(i) = Xj->E(SHF_POS(i, imm)); \
+ } else { \
+ temp.E(i) = Xj->E(SHF_POS(i - (m / 2), imm) + (m / 2)); \
+ } \
+ } \
+ *Xd = temp; \
+}
+
+XVSHUF4I(xvshuf4i_b, 8, XB)
+XVSHUF4I(xvshuf4i_h, 16, XH)
+XVSHUF4I(xvshuf4i_w, 32, XW)
+
+void HELPER(xvshuf4i_d)(CPULoongArchState *env,
+ uint32_t xd, uint32_t xj, uint32_t imm)
+{
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ XReg temp;
+ temp.XD(0) = (imm & 2 ? Xj : Xd)->XD(imm & 1);
+ temp.XD(1) = (imm & 8 ? Xj : Xd)->XD((imm >> 2) & 1);
+ temp.XD(2) = (imm & 2 ? Xj : Xd)->XD((imm & 1) + 2);
+ temp.XD(3) = (imm & 8 ? Xj : Xd)->XD(((imm >> 2) & 1) + 2);
+ *Xd = temp;
+}
+
+void HELPER(xvpermi_w)(CPULoongArchState *env,
+ uint32_t xd, uint32_t xj, uint32_t imm)
+{
+ XReg temp;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ temp.XW(0) = Xj->XW(imm & 0x3);
+ temp.XW(1) = Xj->XW((imm >> 2) & 0x3);
+ temp.XW(2) = Xd->XW((imm >> 4) & 0x3);
+ temp.XW(3) = Xd->XW((imm >> 6) & 0x3);
+ temp.XW(4) = Xj->XW((imm & 0x3) + 4);
+ temp.XW(5) = Xj->XW(((imm >> 2) & 0x3) + 4);
+ temp.XW(6) = Xd->XW(((imm >> 4) & 0x3) + 4);
+ temp.XW(7) = Xd->XW(((imm >> 6) & 0x3) + 4);
+ *Xd = temp;
+}
+
+void HELPER(xvpermi_d)(CPULoongArchState *env,
+ uint32_t xd, uint32_t xj, uint32_t imm)
+{
+ XReg temp;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ temp.XD(0) = Xj->XD(imm & 0x3);
+ temp.XD(1) = Xj->XD((imm >> 2) & 0x3);
+ temp.XD(2) = Xj->XD((imm >> 4) & 0x3);
+ temp.XD(3) = Xj->XD((imm >> 6) & 0x3);
+ *Xd = temp;
+}
+
+void HELPER(xvpermi_q)(CPULoongArchState *env,
+ uint32_t xd, uint32_t xj, uint32_t imm)
+{
+ XReg temp;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ XReg *Xj = &(env->fpr[xj].xreg);
+
+ temp.XQ(0) = (imm & 0x3) > 1 ? Xd->XQ((imm & 0x3) - 2) : Xj->XQ(imm & 0x3);
+ temp.XQ(1) = ((imm >> 4) & 0x3) > 1 ? Xd->XQ(((imm >> 4) & 0x3) - 2) :
+ Xj->XQ((imm >> 4) & 0x3);
+ *Xd = temp;
+}
+
+#define XVEXTRINS(NAME, BIT, E, MASK) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t xd, uint32_t xj, uint32_t imm) \
+{ \
+ int ins, extr, m; \
+ XReg *Xd = &(env->fpr[xd].xreg); \
+ XReg *Xj = &(env->fpr[xj].xreg); \
+ \
+ m = LASX_LEN / (BIT * 2); \
+ ins = (imm >> 4) & MASK; \
+ extr = imm & MASK; \
+ Xd->E(ins) = Xj->E(extr); \
+ Xd->E(ins + m) = Xj->E(extr + m); \
+}
+
+XVEXTRINS(xvextrins_b, 8, XB, 0xf)
+XVEXTRINS(xvextrins_h, 16, XH, 0x7)
+XVEXTRINS(xvextrins_w, 32, XW, 0x3)
+XVEXTRINS(xvextrins_d, 64, XD, 0x1)
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 00c9835948..c40e0d65ca 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -2909,8 +2909,7 @@ void HELPER(NAME)(CPULoongArchState *env, \
VReg *Vj = &(env->fpr[vj].vreg); \
\
for (i = 0; i < LSX_LEN/BIT; i++) { \
- temp.E(i) = Vj->E(((i) & 0xfc) + (((imm) >> \
- (2 * ((i) & 0x03))) & 0x03)); \
+ temp.E(i) = Vj->E(SHF_POS(i, imm)); \
} \
*Vd = temp; \
}
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index cfac1c0e1c..09d070a865 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -96,6 +96,8 @@
#define VSLE(a, b) (a <= b ? -1 : 0)
#define VSLT(a, b) (a < b ? -1 : 0)
+#define SHF_POS(i, imm) (((i) & 0xfc) + (((imm) >> (2 * ((i) & 0x03))) & 0x03))
+
uint64_t do_vmskltz_b(int64_t val);
uint64_t do_vmskltz_h(int64_t val);
uint64_t do_vmskltz_w(int64_t val);
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 45/46] target/loongarch: Implement xvld xvst
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (43 preceding siblings ...)
2023-06-20 9:38 ` [PATCH v1 44/46] target/loongarch: Implement xvshuf xvperm{i} xvshuf4i xvextrins Song Gao
@ 2023-06-20 9:38 ` Song Gao
2023-06-20 9:38 ` [PATCH v1 46/46] target/loongarch: CPUCFG support LASX Song Gao
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:38 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVLD[X], XVST[X];
- XVLDREPL.{B/H/W/D};
- XVSTELM.{B/H/W/D}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/disas.c | 24 +++++
target/loongarch/helper.h | 3 +
target/loongarch/insn_trans/trans_lasx.c.inc | 97 ++++++++++++++++++++
target/loongarch/insns.decode | 25 +++++
target/loongarch/lasx_helper.c | 59 ++++++++++++
5 files changed, 208 insertions(+)
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 9af1c95641..4403669047 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1753,6 +1753,16 @@ static void output_xxr(DisasContext *ctx, arg_xxr *a, const char *mnemonic)
output(ctx, mnemonic, "x%d, x%d, r%d", a->xd, a->xj, a->rk);
}
+static void output_xrr(DisasContext *ctx, arg_xrr *a, const char *mnemonic)
+{
+ output(ctx, mnemonic, "x%d, r%d, r%d", a->xd, a->rj, a->rk);
+}
+
+static void output_xr_ii(DisasContext *ctx, arg_xr_ii *a, const char *mnemonic)
+{
+ output(ctx, mnemonic, "x%d, r%d, 0x%x, 0x%x", a->xd, a->rj, a->imm, a->imm2);
+}
+
INSN_LASX(xvadd_b, xxx)
INSN_LASX(xvadd_h, xxx)
INSN_LASX(xvadd_w, xxx)
@@ -2595,3 +2605,17 @@ INSN_LASX(xvextrins_d, xx_i)
INSN_LASX(xvextrins_w, xx_i)
INSN_LASX(xvextrins_h, xx_i)
INSN_LASX(xvextrins_b, xx_i)
+
+INSN_LASX(xvld, xr_i)
+INSN_LASX(xvst, xr_i)
+INSN_LASX(xvldx, xrr)
+INSN_LASX(xvstx, xrr)
+
+INSN_LASX(xvldrepl_d, xr_i)
+INSN_LASX(xvldrepl_w, xr_i)
+INSN_LASX(xvldrepl_h, xr_i)
+INSN_LASX(xvldrepl_b, xr_i)
+INSN_LASX(xvstelm_d, xr_ii)
+INSN_LASX(xvstelm_w, xr_ii)
+INSN_LASX(xvstelm_h, xr_ii)
+INSN_LASX(xvstelm_b, xr_ii)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 1058a7de75..adeb181407 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -1285,3 +1285,6 @@ DEF_HELPER_4(xvextrins_d, void, env, i32, i32, i32)
DEF_HELPER_4(xvextrins_w, void, env, i32, i32, i32)
DEF_HELPER_4(xvextrins_h, void, env, i32, i32, i32)
DEF_HELPER_4(xvextrins_b, void, env, i32, i32, i32)
+
+DEF_HELPER_3(xvld_b, void, env, i32, tl)
+DEF_HELPER_3(xvst_b, void, env, i32, tl)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 1344f75113..761f227c76 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -3104,3 +3104,100 @@ TRANS(xvextrins_b, gen_xx_i, gen_helper_xvextrins_b)
TRANS(xvextrins_h, gen_xx_i, gen_helper_xvextrins_h)
TRANS(xvextrins_w, gen_xx_i, gen_helper_xvextrins_w)
TRANS(xvextrins_d, gen_xx_i, gen_helper_xvextrins_d)
+
+static bool gen_lasx_memory(DisasContext *ctx, arg_xr_i * a,
+ void (*func)(TCGv_ptr, TCGv_i32, TCGv))
+{
+ TCGv_i32 xd = tcg_constant_i32(a->xd);
+ TCGv addr = gpr_src(ctx, a->rj, EXT_NONE);
+ TCGv temp = NULL;
+
+ CHECK_ASXE;
+
+ if (a->imm) {
+ temp = tcg_temp_new();
+ tcg_gen_addi_tl(temp, addr, a->imm);
+ addr = temp;
+ }
+
+ func(cpu_env, xd, addr);
+ return true;
+}
+
+TRANS(xvld, gen_lasx_memory, gen_helper_xvld_b)
+TRANS(xvst, gen_lasx_memory, gen_helper_xvst_b)
+
+static bool gen_lasx_memoryx(DisasContext *ctx, arg_xrr *a,
+ void (*func)(TCGv_ptr, TCGv_i32, TCGv))
+{
+ TCGv_i32 xd = tcg_constant_i32(a->xd);
+ TCGv src1 = gpr_src(ctx, a->rj, EXT_NONE);
+ TCGv src2 = gpr_src(ctx, a->rk, EXT_NONE);
+ TCGv addr = tcg_temp_new();
+
+ CHECK_ASXE;
+
+ tcg_gen_add_tl(addr, src1, src2);
+ func(cpu_env, xd, addr);
+ return true;
+}
+
+TRANS(xvldx, gen_lasx_memoryx, gen_helper_xvld_b)
+TRANS(xvstx, gen_lasx_memoryx, gen_helper_xvst_b)
+
+#define XVLDREPL(NAME, MO) \
+static bool trans_## NAME(DisasContext *ctx, arg_xr_i * a) \
+{ \
+ TCGv addr, temp; \
+ TCGv_i64 val; \
+ \
+ CHECK_ASXE; \
+ \
+ addr = gpr_src(ctx, a->rj, EXT_NONE); \
+ val = tcg_temp_new_i64(); \
+ \
+ if (a->imm) { \
+ temp = tcg_temp_new(); \
+ tcg_gen_addi_tl(temp, addr, a->imm); \
+ addr = temp; \
+ } \
+ \
+ tcg_gen_qemu_ld_i64(val, addr, ctx->mem_idx, MO); \
+ tcg_gen_gvec_dup_i64(MO, vec_full_offset(a->xd), 32, ctx->vl / 8, val); \
+ \
+ return true; \
+}
+
+XVLDREPL(xvldrepl_b, MO_8)
+XVLDREPL(xvldrepl_h, MO_16)
+XVLDREPL(xvldrepl_w, MO_32)
+XVLDREPL(xvldrepl_d, MO_64)
+
+#define XVSTELM(NAME, MO, E) \
+static bool trans_## NAME(DisasContext *ctx, arg_xr_ii * a) \
+{ \
+ TCGv addr, temp; \
+ TCGv_i64 val; \
+ \
+ CHECK_ASXE; \
+ \
+ addr = gpr_src(ctx, a->rj, EXT_NONE); \
+ val = tcg_temp_new_i64(); \
+ \
+ if (a->imm) { \
+ temp = tcg_temp_new(); \
+ tcg_gen_addi_tl(temp, addr, a->imm); \
+ addr = temp; \
+ } \
+ \
+ tcg_gen_ld_i64(val, cpu_env, \
+ offsetof(CPULoongArchState, fpr[a->xd].xreg.E(a->imm2))); \
+ tcg_gen_qemu_st_i64(val, addr, ctx->mem_idx, MO); \
+ \
+ return true; \
+}
+
+XVSTELM(xvstelm_b, MO_8, XB)
+XVSTELM(xvstelm_h, MO_16, XH)
+XVSTELM(xvstelm_w, MO_32, XW)
+XVSTELM(xvstelm_d, MO_64, XD)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 9c6a6037e9..b7940e4c23 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1312,6 +1312,8 @@ vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
&xr_i xd rj imm
&rx_i rd xj imm
&xxr xd xj rk
+&xrr xd rj rk
+&xr_ii xd rj imm imm2
#
# LASX Formats
@@ -1338,6 +1340,15 @@ vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
@rx_ui3 .... ........ ..... .. imm:3 xj:5 rd:5 &rx_i
@rx_ui2 .... ........ ..... ... imm:2 xj:5 rd:5 &rx_i
@xxr .... ........ ..... rk:5 xj:5 xd:5 &xxr
+@xr_i9 .... ........ . ......... rj:5 xd:5 &xr_i imm=%i9s3
+@xr_i10 .... ........ .......... rj:5 xd:5 &xr_i imm=%i10s2
+@xr_i11 .... ....... ........... rj:5 xd:5 &xr_i imm=%i11s1
+@xr_i12 .... ...... imm:s12 rj:5 xd:5 &xr_i
+@xr_i8i2 .... ........ imm2:2 ........ rj:5 xd:5 &xr_ii imm=%i8s3
+@xr_i8i3 .... ....... imm2:3 ........ rj:5 xd:5 &xr_ii imm=%i8s2
+@xr_i8i4 .... ...... imm2:4 ........ rj:5 xd:5 &xr_ii imm=%i8s1
+@xr_i8i5 .... ..... imm2:5 imm:s8 rj:5 xd:5 &xr_ii
+@xrr .... ........ ..... rk:5 rj:5 xd:5 &xrr
xvadd_b 0111 01000000 10100 ..... ..... ..... @xxx
xvadd_h 0111 01000000 10101 ..... ..... ..... @xxx
@@ -2099,3 +2110,17 @@ xvextrins_d 0111 01111000 00 ........ ..... ..... @xx_ui8
xvextrins_w 0111 01111000 01 ........ ..... ..... @xx_ui8
xvextrins_h 0111 01111000 10 ........ ..... ..... @xx_ui8
xvextrins_b 0111 01111000 11 ........ ..... ..... @xx_ui8
+
+xvld 0010 110010 ............ ..... ..... @xr_i12
+xvst 0010 110011 ............ ..... ..... @xr_i12
+xvldx 0011 10000100 10000 ..... ..... ..... @xrr
+xvstx 0011 10000100 11000 ..... ..... ..... @xrr
+
+xvldrepl_d 0011 00100001 0 ......... ..... ..... @xr_i9
+xvldrepl_w 0011 00100010 .......... ..... ..... @xr_i10
+xvldrepl_h 0011 0010010 ........... ..... ..... @xr_i11
+xvldrepl_b 0011 001010 ............ ..... ..... @xr_i12
+xvstelm_d 0011 00110001 .. ........ ..... ..... @xr_i8i2
+xvstelm_w 0011 0011001 ... ........ ..... ..... @xr_i8i3
+xvstelm_h 0011 001101 .... ........ ..... ..... @xr_i8i4
+xvstelm_b 0011 00111 ..... ........ ..... ..... @xr_i8i5
diff --git a/target/loongarch/lasx_helper.c b/target/loongarch/lasx_helper.c
index a0338dfa6d..16346f218c 100644
--- a/target/loongarch/lasx_helper.c
+++ b/target/loongarch/lasx_helper.c
@@ -12,6 +12,9 @@
#include "fpu/softfloat.h"
#include "internals.h"
#include "vec.h"
+#include "tcg/tcg.h"
+#include "exec/cpu_ldst.h"
+#include "tcg/tcg-ldst.h"
#define XDO_ODD_EVEN(NAME, BIT, E1, E2, DO_OP) \
void HELPER(NAME)(CPULoongArchState *env, \
@@ -3160,3 +3163,59 @@ XVEXTRINS(xvextrins_b, 8, XB, 0xf)
XVEXTRINS(xvextrins_h, 16, XH, 0x7)
XVEXTRINS(xvextrins_w, 32, XW, 0x3)
XVEXTRINS(xvextrins_d, 64, XD, 0x1)
+
+void helper_xvld_b(CPULoongArchState *env, uint32_t xd, target_ulong addr)
+{
+ int i;
+ XReg *Xd = &(env->fpr[xd].xreg);
+#if !defined(CONFIG_USER_ONLY)
+ MemOpIdx oi = make_memop_idx(MO_TE | MO_UNALN, cpu_mmu_index(env, false));
+
+ for (i = 0; i < LASX_LEN / 8; i++) {
+ Xd->XB(i) = helper_ldub_mmu(env, addr + i, oi, GETPC());
+ }
+#else
+ for (i = 0; i < LASX_LEN / 8; i++) {
+ Xd->XB(i) = cpu_ldub_data(env, addr + i);
+ }
+#endif
+}
+
+#define LASX_PAGESPAN(x) \
+ ((((x) & ~TARGET_PAGE_MASK) + (LASX_LEN / 8) - 1) >= TARGET_PAGE_SIZE)
+
+static inline void ensure_lasx_writable_pages(CPULoongArchState *env,
+ target_ulong addr,
+ int mmu_idx,
+ uintptr_t retaddr)
+{
+#ifndef CONFIG_USER_ONLY
+ /* FIXME: Probe the actual accesses (pass and use a size) */
+ if (unlikely(LASX_PAGESPAN(addr))) {
+ /* first page */
+ probe_write(env, addr, 0, mmu_idx, retaddr);
+ /* second page */
+ addr = (addr & TARGET_PAGE_MASK) + TARGET_PAGE_SIZE;
+ probe_write(env, addr, 0, mmu_idx, retaddr);
+ }
+#endif
+}
+
+void helper_xvst_b(CPULoongArchState *env, uint32_t xd, target_ulong addr)
+{
+ int i;
+ XReg *Xd = &(env->fpr[xd].xreg);
+ int mmu_idx = cpu_mmu_index(env, false);
+
+ ensure_lasx_writable_pages(env, addr, mmu_idx, GETPC());
+#if !defined(CONFIG_USER_ONLY)
+ MemOpIdx oi = make_memop_idx(MO_TE | MO_UNALN, mmu_idx);
+ for (i = 0; i < LASX_LEN / 8; i++) {
+ helper_stb_mmu(env, addr + i, Xd->XB(i), oi, GETPC());
+ }
+#else
+ for (i = 0; i < LASX_LEN / 8; i++) {
+ cpu_stb_data(env, addr + i, Xd->XB(i));
+ }
+#endif
+}
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* [PATCH v1 46/46] target/loongarch: CPUCFG support LASX
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
` (44 preceding siblings ...)
2023-06-20 9:38 ` [PATCH v1 45/46] target/loongarch: Implement xvld xvst Song Gao
@ 2023-06-20 9:38 ` Song Gao
45 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-20 9:38 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/cpu.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index c9f9cbb19d..aeccbb42e6 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -392,6 +392,7 @@ static void loongarch_la464_initfn(Object *obj)
data = FIELD_DP32(data, CPUCFG2, FP_DP, 1);
data = FIELD_DP32(data, CPUCFG2, FP_VER, 1);
data = FIELD_DP32(data, CPUCFG2, LSX, 1),
+ data = FIELD_DP32(data, CPUCFG2, LASX, 1),
data = FIELD_DP32(data, CPUCFG2, LLFTP, 1);
data = FIELD_DP32(data, CPUCFG2, LLFTP_VER, 1);
data = FIELD_DP32(data, CPUCFG2, LAM, 1);
--
2.39.1
^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: [PATCH v1 01/46] target/loongarch: Add LASX data type XReg
2023-06-20 9:37 ` [PATCH v1 01/46] target/loongarch: Add LASX data type XReg Song Gao
@ 2023-06-20 12:09 ` Richard Henderson
2023-06-21 9:19 ` Song Gao
0 siblings, 1 reply; 54+ messages in thread
From: Richard Henderson @ 2023-06-20 12:09 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 6/20/23 11:37, Song Gao wrote:
> diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
> index b23f38c3d5..347950b4d0 100644
> --- a/target/loongarch/cpu.h
> +++ b/target/loongarch/cpu.h
> @@ -259,9 +259,23 @@ typedef union VReg {
> Int128 Q[LSX_LEN / 128];
> }VReg;
>
> +#define LASX_LEN (256)
> +typedef union XReg {
> + int8_t XB[LASX_LEN / 8];
> + int16_t XH[LASX_LEN / 16];
> + int32_t XW[LASX_LEN / 32];
> + int64_t XD[LASX_LEN / 64];
> + uint8_t UXB[LASX_LEN / 8];
> + uint16_t UXH[LASX_LEN / 16];
> + uint32_t UXW[LASX_LEN / 32];
> + uint64_t UXD[LASX_LEN / 64];
> + Int128 XQ[LASX_LEN / 128];
> +} XReg;
This is following the example of target/i386, and I think it is a bad example.
For Arm, we have one ARMVectorReg which covers AdvSIMD (128-bit) and SVE (2048-bit).
I would prefer if you just expand the definition of VReg to be 256 bits.
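A minimal sketch of what that expansion might look like (illustrative only; the B/H/W/D/UB/... members here mirror the XB/XH/... set from this patch without the X prefix, which I assume matches the existing VReg fields), so LSX code keeps addressing the low 128 bits:

/* Int128 comes from "qemu/int128.h", as for the current VReg. */
#define LSX_LEN   128
#define LASX_LEN  256
typedef union VReg {
    int8_t   B[LASX_LEN / 8];
    int16_t  H[LASX_LEN / 16];
    int32_t  W[LASX_LEN / 32];
    int64_t  D[LASX_LEN / 64];
    uint8_t  UB[LASX_LEN / 8];
    uint16_t UH[LASX_LEN / 16];
    uint32_t UW[LASX_LEN / 32];
    uint64_t UD[LASX_LEN / 64];
    Int128   Q[LASX_LEN / 128];
} VReg;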
r~
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v1 03/46] target/loongarch: Add CHECK_ASXE maccro for check LASX enable
2023-06-20 9:37 ` [PATCH v1 03/46] target/loongarch: Add CHECK_ASXE maccro for check LASX enable Song Gao
@ 2023-06-20 12:10 ` Richard Henderson
0 siblings, 0 replies; 54+ messages in thread
From: Richard Henderson @ 2023-06-20 12:10 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 6/20/23 11:37, Song Gao wrote:
> Signed-off-by: Song Gao<gaosong@loongson.cn>
> ---
> target/loongarch/cpu.c | 2 ++
> target/loongarch/cpu.h | 2 ++
> target/loongarch/insn_trans/trans_lasx.c.inc | 10 ++++++++++
> 3 files changed, 14 insertions(+)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v1 04/46] target/loongarch: Implement xvadd/xvsub
2023-06-20 9:37 ` [PATCH v1 04/46] target/loongarch: Implement xvadd/xvsub Song Gao
@ 2023-06-20 12:25 ` Richard Henderson
2023-06-21 9:19 ` Song Gao
0 siblings, 1 reply; 54+ messages in thread
From: Richard Henderson @ 2023-06-20 12:25 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 6/20/23 11:37, Song Gao wrote:
> +static bool gvec_xxx(DisasContext *ctx, arg_xxx *a, MemOp mop,
> + void (*func)(unsigned, uint32_t, uint32_t,
> + uint32_t, uint32_t, uint32_t))
> +{
> + uint32_t xd_ofs, xj_ofs, xk_ofs;
> +
> + CHECK_ASXE;
> +
> + xd_ofs = vec_full_offset(a->xd);
> + xj_ofs = vec_full_offset(a->xj);
> + xk_ofs = vec_full_offset(a->xk);
> +
> + func(mop, xd_ofs, xj_ofs, xk_ofs, 32, ctx->vl / 8);
> + return true;
> +}
Comparing gvec_xxx vs gvec_vvv for LSX,
> func(mop, vd_ofs, vj_ofs, vk_ofs, 16, ctx->vl/8);
gvec_vvv will write 16 bytes of output, followed by 16 bytes of zero to satisfy vl / 8.
I presume this is the intended behaviour of mixing LSX with LASX, that the high 128-bits
that are not considered by the LSX instruction are zeroed on write?
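For reference, that zeroing is just the gvec oprsz/maxsz contract; as a sketch, using a plain add as a stand-in for whatever func is passed:

    /* oprsz = 16: bytes [0,16) of vd receive the result;
     * maxsz = ctx->vl / 8 = 32: bytes [16,32) of vd are cleared. */
    tcg_gen_gvec_add(mop, vd_ofs, vj_ofs, vk_ofs, 16, 32);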
Which means that your macros from patch 1,
> +#if HOST_BIG_ENDIAN
...
> +#define XB(x) XB[31 - (x)]
> +#define XH(x) XH[15 - (x)]
are incorrect. We need big-endian within the Int128, but little-endian ordering of the
two Int128. This can be done with
#define XB(x) XB[(x) ^ 15]
#define XH(x) XH[(x) ^ 7]
etc.
It would be nice to share more code with trans_lsx.c, if possible.
r~
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: [PATCH v1 01/46] target/loongarch: Add LASX data type XReg
2023-06-20 12:09 ` Richard Henderson
@ 2023-06-21 9:19 ` Song Gao
0 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-21 9:19 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
Hi, Richard
On 2023/6/20 8:09 PM, Richard Henderson wrote:
> On 6/20/23 11:37, Song Gao wrote:
>> diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
>> index b23f38c3d5..347950b4d0 100644
>> --- a/target/loongarch/cpu.h
>> +++ b/target/loongarch/cpu.h
>> @@ -259,9 +259,23 @@ typedef union VReg {
>> Int128 Q[LSX_LEN / 128];
>> }VReg;
>> +#define LASX_LEN (256)
>> +typedef union XReg {
>> + int8_t XB[LASX_LEN / 8];
>> + int16_t XH[LASX_LEN / 16];
>> + int32_t XW[LASX_LEN / 32];
>> + int64_t XD[LASX_LEN / 64];
>> + uint8_t UXB[LASX_LEN / 8];
>> + uint16_t UXH[LASX_LEN / 16];
>> + uint32_t UXW[LASX_LEN / 32];
>> + uint64_t UXD[LASX_LEN / 64];
>> + Int128 XQ[LASX_LEN / 128];
>> +} XReg;
>
> This is following the example of target/i386, and I think it is a bad
> example.
>
[....]
> For Arm, we have one ARMVectorReg which covers AdvSIMD (128-bit) and
> SVE (2048-bit).
> I would prefer if you just expand the definition of VReg to be 256 bits.
>
OK, I will correct it in v2.
Thanks.
Song Gao
* Re: [PATCH v1 04/46] target/loongarch: Implement xvadd/xvsub
2023-06-20 12:25 ` Richard Henderson
@ 2023-06-21 9:19 ` Song Gao
2023-06-21 9:27 ` Richard Henderson
0 siblings, 1 reply; 54+ messages in thread
From: Song Gao @ 2023-06-21 9:19 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
On 2023/6/20 at 8:25 PM, Richard Henderson wrote:
> On 6/20/23 11:37, Song Gao wrote:
>> +static bool gvec_xxx(DisasContext *ctx, arg_xxx *a, MemOp mop,
>> + void (*func)(unsigned, uint32_t, uint32_t,
>> + uint32_t, uint32_t, uint32_t))
>> +{
>> + uint32_t xd_ofs, xj_ofs, xk_ofs;
>> +
>> + CHECK_ASXE;
>> +
>> + xd_ofs = vec_full_offset(a->xd);
>> + xj_ofs = vec_full_offset(a->xj);
>> + xk_ofs = vec_full_offset(a->xk);
>> +
>> + func(mop, xd_ofs, xj_ofs, xk_ofs, 32, ctx->vl / 8);
>> + return true;
>> +}
>
> Comparing gvec_xxx vs gvec_vvv for LSX,
>
>> func(mop, vd_ofs, vj_ofs, vk_ofs, 16, ctx->vl/8);
>
> gvec_vvv will write 16 bytes of output, followed by 16 bytes of zero
> to satisfy vl / 8.
>
>
> I presume this is the intended behaviour of mixing LSX with LASX, that
> the high 128-bits that are not considered by the LSX instruction are
> zeroed on write?
>
Yes, the LSX instructions can ignore the high 128 bits.
> Which means that your macros from patch 1,
>
>> +#if HOST_BIG_ENDIAN
> ...
>> +#define XB(x) XB[31 - (x)]
>> +#define XH(x) XH[15 - (x)]
>
> are incorrect. We need big-endian within the Int128, but
> little-endian ordering of the two Int128. This can be done with
>
> #define XB(x) XB[(x) ^ 15]
> #define XH(x) XH[(x) ^ 7]
>
> etc.
>
Ok, I will correct it.
> It would be nice to share more code with trans_lsx.c, if possible.
>
Some functions can be merged, e.g. gvec_vvv and gvec_xxx.
Many of the later patches are similar to their LSX counterparts, so more
code can probably be merged.
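One possible shape, as a rough sketch (gvec_vvv_oprsz is a made-up name; the CHECK_SXE/CHECK_ASXE checks would stay in thin wrappers):

    static bool gvec_vvv_oprsz(DisasContext *ctx, int vd, int vj, int vk,
                               uint32_t oprsz, MemOp mop,
                               void (*func)(unsigned, uint32_t, uint32_t,
                                            uint32_t, uint32_t, uint32_t))
    {
        /* oprsz is 16 for LSX and 32 for LASX; maxsz is always vl / 8. */
        func(mop, vec_full_offset(vd), vec_full_offset(vj),
             vec_full_offset(vk), oprsz, ctx->vl / 8);
        return true;
    }

    static bool gvec_xxx(DisasContext *ctx, arg_xxx *a, MemOp mop,
                         void (*func)(unsigned, uint32_t, uint32_t,
                                      uint32_t, uint32_t, uint32_t))
    {
        CHECK_ASXE;
        return gvec_vvv_oprsz(ctx, a->xd, a->xj, a->xk, 32, mop, func);
    }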
Thanks.
Song Gao
* Re: [PATCH v1 04/46] target/loongarch: Implement xvadd/xvsub
2023-06-21 9:19 ` Song Gao
@ 2023-06-21 9:27 ` Richard Henderson
2023-06-21 9:56 ` Song Gao
0 siblings, 1 reply; 54+ messages in thread
From: Richard Henderson @ 2023-06-21 9:27 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 6/21/23 11:19, Song Gao wrote:
>
>
> On 2023/6/20 at 8:25 PM, Richard Henderson wrote:
>> On 6/20/23 11:37, Song Gao wrote:
>>> +static bool gvec_xxx(DisasContext *ctx, arg_xxx *a, MemOp mop,
>>> + void (*func)(unsigned, uint32_t, uint32_t,
>>> + uint32_t, uint32_t, uint32_t))
>>> +{
>>> + uint32_t xd_ofs, xj_ofs, xk_ofs;
>>> +
>>> + CHECK_ASXE;
>>> +
>>> + xd_ofs = vec_full_offset(a->xd);
>>> + xj_ofs = vec_full_offset(a->xj);
>>> + xk_ofs = vec_full_offset(a->xk);
>>> +
>>> + func(mop, xd_ofs, xj_ofs, xk_ofs, 32, ctx->vl / 8);
>>> + return true;
>>> +}
>>
>> Comparing gvec_xxx vs gvec_vvv for LSX,
>>
>>> func(mop, vd_ofs, vj_ofs, vk_ofs, 16, ctx->vl/8);
>>
>> gvec_vvv will write 16 bytes of output, followed by 16 bytes of zero to satisfy vl / 8.
>>
>>
>> I presume this is the intended behaviour of mixing LSX with LASX, that the high 128-bits
>> that are not considered by the LSX instruction are zeroed on write?
>>
> Yes, the LSX instruction can ignore the high 128-bits.
Ignore != write zeros on output. What is the behaviour?
r~
* Re: [PATCH v1 04/46] target/loongarch: Implement xvadd/xvsub
2023-06-21 9:27 ` Richard Henderson
@ 2023-06-21 9:56 ` Song Gao
0 siblings, 0 replies; 54+ messages in thread
From: Song Gao @ 2023-06-21 9:56 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
On 2023/6/21 at 5:27 PM, Richard Henderson wrote:
> On 6/21/23 11:19, Song Gao wrote:
>>
>>
>> On 2023/6/20 at 8:25 PM, Richard Henderson wrote:
>>> On 6/20/23 11:37, Song Gao wrote:
>>>> +static bool gvec_xxx(DisasContext *ctx, arg_xxx *a, MemOp mop,
>>>> + void (*func)(unsigned, uint32_t, uint32_t,
>>>> + uint32_t, uint32_t, uint32_t))
>>>> +{
>>>> + uint32_t xd_ofs, xj_ofs, xk_ofs;
>>>> +
>>>> + CHECK_ASXE;
>>>> +
>>>> + xd_ofs = vec_full_offset(a->xd);
>>>> + xj_ofs = vec_full_offset(a->xj);
>>>> + xk_ofs = vec_full_offset(a->xk);
>>>> +
>>>> + func(mop, xd_ofs, xj_ofs, xk_ofs, 32, ctx->vl / 8);
>>>> + return true;
>>>> +}
>>>
>>> Comparing gvec_xxx vs gvec_vvv for LSX,
>>>
>>>> func(mop, vd_ofs, vj_ofs, vk_ofs, 16, ctx->vl/8);
>>>
>>> gvec_vvv will write 16 bytes of output, followed by 16 bytes of zero
>>> to satisfy vl / 8.
>>>
>>>
>>> I presume this is the intended behaviour of mixing LSX with LASX,
>>> that the high 128-bits that are not considered by the LSX
>>> instruction are zeroed on write?
>>>
>> Yes, the LSX instructions can ignore the high 128 bits.
>
> Ignore != write zeros on output. What is the behaviour?
>
Unpredictable.
In more detail:
LSX:
When an LA64 fp instruction changes an fp register's value, bits [127:64]
of the same-numbered LSX register become unpredictable.
LASX:
When an LA64 fp instruction changes an fp register's value, bits [255:64]
of the same-numbered LASX register become unpredictable.
When an LSX instruction changes an LSX register's value, bits [255:128]
of the same-numbered LASX register become unpredictable.
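Given that, always clearing the untouched high bits (which is what the gvec expansion already does when maxsz is larger than oprsz) should be a valid implementation choice, e.g. for a byte add with vl == 256 (a sketch only):

    /* vadd.b vd, vj, vk under LSX: write 16 data bytes, zero bytes 16..31. */
    tcg_gen_gvec_add(MO_8, vec_full_offset(a->vd),
                     vec_full_offset(a->vj), vec_full_offset(a->vk),
                     16, ctx->vl / 8);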
Thanks.
Song Gao.
end of thread [~2023-06-21 9:57 UTC | newest]
Thread overview: 54+ messages
2023-06-20 9:37 [PATCH v1 00/46] Add LoongArch LASX instructions Song Gao
2023-06-20 9:37 ` [PATCH v1 01/46] target/loongarch: Add LASX data type XReg Song Gao
2023-06-20 12:09 ` Richard Henderson
2023-06-21 9:19 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 02/46] target/loongarch: meson.build support build LASX Song Gao
2023-06-20 9:37 ` [PATCH v1 03/46] target/loongarch: Add CHECK_ASXE maccro for check LASX enable Song Gao
2023-06-20 12:10 ` Richard Henderson
2023-06-20 9:37 ` [PATCH v1 04/46] target/loongarch: Implement xvadd/xvsub Song Gao
2023-06-20 12:25 ` Richard Henderson
2023-06-21 9:19 ` Song Gao
2023-06-21 9:27 ` Richard Henderson
2023-06-21 9:56 ` Song Gao
2023-06-20 9:37 ` [PATCH v1 05/46] target/loongarch: Implement xvreplgr2vr Song Gao
2023-06-20 9:37 ` [PATCH v1 06/46] target/loongarch: Implement xvaddi/xvsubi Song Gao
2023-06-20 9:37 ` [PATCH v1 07/46] target/loongarch: Implement xvneg Song Gao
2023-06-20 9:37 ` [PATCH v1 08/46] target/loongarch: Implement xvsadd/xvssub Song Gao
2023-06-20 9:37 ` [PATCH v1 09/46] target/loongarch: Implement xvhaddw/xvhsubw Song Gao
2023-06-20 9:37 ` [PATCH v1 10/46] target/loongarch: Implement xvaddw/xvsubw Song Gao
2023-06-20 9:37 ` [PATCH v1 11/46] target/loongarch: Implement xavg/xvagr Song Gao
2023-06-20 9:37 ` [PATCH v1 12/46] target/loongarch: Implement xvabsd Song Gao
2023-06-20 9:37 ` [PATCH v1 13/46] target/loongarch: Implement xvadda Song Gao
2023-06-20 9:37 ` [PATCH v1 14/46] target/loongarch: Implement xvmax/xvmin Song Gao
2023-06-20 9:37 ` [PATCH v1 15/46] target/loongarch: Implement xvmul/xvmuh/xvmulw{ev/od} Song Gao
2023-06-20 9:37 ` [PATCH v1 16/46] target/loongarch: Implement xvmadd/xvmsub/xvmaddw{ev/od} Song Gao
2023-06-20 9:37 ` [PATCH v1 17/46] target/loongarch; Implement xvdiv/xvmod Song Gao
2023-06-20 9:37 ` [PATCH v1 18/46] target/loongarch: Implement xvsat Song Gao
2023-06-20 9:37 ` [PATCH v1 19/46] target/loongarch: Implement xvexth Song Gao
2023-06-20 9:37 ` [PATCH v1 20/46] target/loongarch: Implement vext2xv Song Gao
2023-06-20 9:37 ` [PATCH v1 21/46] target/loongarch: Implement xvsigncov Song Gao
2023-06-20 9:37 ` [PATCH v1 22/46] target/loongarch: Implement xvmskltz/xvmskgez/xvmsknz Song Gao
2023-06-20 9:37 ` [PATCH v1 23/46] target/loognarch: Implement xvldi Song Gao
2023-06-20 9:37 ` [PATCH v1 24/46] target/loongarch: Implement LASX logic instructions Song Gao
2023-06-20 9:37 ` [PATCH v1 25/46] target/loongarch: Implement xvsll xvsrl xvsra xvrotr Song Gao
2023-06-20 9:37 ` [PATCH v1 26/46] target/loongarch: Implement xvsllwil xvextl Song Gao
2023-06-20 9:37 ` [PATCH v1 27/46] target/loongarch: Implement xvsrlr xvsrar Song Gao
2023-06-20 9:37 ` [PATCH v1 28/46] target/loongarch: Implement xvsrln xvsran Song Gao
2023-06-20 9:37 ` [PATCH v1 29/46] target/loongarch: Implement xvsrlrn xvsrarn Song Gao
2023-06-20 9:37 ` [PATCH v1 30/46] target/loongarch: Implement xvssrln xvssran Song Gao
2023-06-20 9:37 ` [PATCH v1 31/46] target/loongarch: Implement xvssrlrn xvssrarn Song Gao
2023-06-20 9:38 ` [PATCH v1 32/46] target/loongarch: Implement xvclo xvclz Song Gao
2023-06-20 9:38 ` [PATCH v1 33/46] target/loongarch: Implement xvpcnt Song Gao
2023-06-20 9:38 ` [PATCH v1 34/46] target/loongarch: Implement xvbitclr xvbitset xvbitrev Song Gao
2023-06-20 9:38 ` [PATCH v1 35/46] target/loongarch: Implement xvfrstp Song Gao
2023-06-20 9:38 ` [PATCH v1 36/46] target/loongarch: Implement LASX fpu arith instructions Song Gao
2023-06-20 9:38 ` [PATCH v1 37/46] target/loongarch: Implement LASX fpu fcvt instructions Song Gao
2023-06-20 9:38 ` [PATCH v1 38/46] target/loongarch: Implement xvseq xvsle xvslt Song Gao
2023-06-20 9:38 ` [PATCH v1 39/46] target/loongarch: Implement xvfcmp Song Gao
2023-06-20 9:38 ` [PATCH v1 40/46] target/loongarch: Implement xvbitsel xvset Song Gao
2023-06-20 9:38 ` [PATCH v1 41/46] target/loongarch: Implement xvinsgr2vr xvpickve2gr Song Gao
2023-06-20 9:38 ` [PATCH v1 42/46] target/loongarch: Implement xvreplve xvinsve0 xvpickve xvb{sll/srl}v Song Gao
2023-06-20 9:38 ` [PATCH v1 43/46] target/loongarch: Implement xvpack xvpick xvilv{l/h} Song Gao
2023-06-20 9:38 ` [PATCH v1 44/46] target/loongarch: Implement xvshuf xvperm{i} xvshuf4i xvextrins Song Gao
2023-06-20 9:38 ` [PATCH v1 45/46] target/loongarch: Implement xvld xvst Song Gao
2023-06-20 9:38 ` [PATCH v1 46/46] target/loongarch: CPUCFG support LASX Song Gao