* [PATCH v4 01/48] target/loongarch: Add LASX data support
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 8:48 ` [PATCH v4 02/48] target/loongarch: meson.build support for building LASX Song Gao
` (46 subsequent siblings)
47 siblings, 0 replies; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC
To: qemu-devel; +Cc: richard.henderson
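Widen the VReg union from 128 bits (LSX_LEN) to 256 bits (LASX_LEN) so
a single register file backs both LSX and LASX, move the host-endian
index macros from internals.h into a new vec.h shared by all of their
users, and migrate the new high 64 bits of each register through a
separate "cpu/lasx" subsection so existing migration streams are
unaffected.
As a minimal sketch of how helpers index elements through these macros
(the helper name below is hypothetical, not part of this patch):
    /* Byte-wise add over the low 128-bit lane.  On little-endian hosts
     * B(i) expands to B[i]; on big-endian hosts to B[(i) ^ 15], so
     * element 0 names the same byte of the lane either way. */
    static void sketch_vadd_b(VReg *Vd, const VReg *Vj, const VReg *Vk)
    {
        for (int i = 0; i < 16; i++) {
            Vd->B(i) = Vj->B(i) + Vk->B(i);
        }
    }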
Signed-off-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
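Note: the big-endian macros switch from subtraction (B[15 - (x)]) to
XOR (B[(x) ^ 15]). XOR permutes indices only within each 16-byte lane,
which keeps the macros correct now that the element arrays cover 256
bits; subtraction from 15 would index out of range in the high lane.
For example, on a big-endian host:
    B(0)  -> B[0 ^ 15]  == B[15]   /* low lane */
    B(17) -> B[17 ^ 15] == B[30]   /* high lane */
    UD(0) -> UD[0 ^ 1]  == UD[1]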
target/loongarch/cpu.h | 24 ++++++++++++----------
target/loongarch/internals.h | 22 --------------------
target/loongarch/vec.h | 33 ++++++++++++++++++++++++++++++
linux-user/loongarch64/signal.c | 1 +
target/loongarch/cpu.c | 1 +
target/loongarch/gdbstub.c | 1 +
target/loongarch/lsx_helper.c | 1 +
target/loongarch/machine.c | 36 ++++++++++++++++++++++++++++++++-
8 files changed, 85 insertions(+), 34 deletions(-)
create mode 100644 target/loongarch/vec.h
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 4d7201995a..347ad1c8a9 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -251,18 +251,20 @@ FIELD(TLB_MISC, ASID, 1, 10)
FIELD(TLB_MISC, VPPN, 13, 35)
FIELD(TLB_MISC, PS, 48, 6)
-#define LSX_LEN (128)
+#define LSX_LEN (128)
+#define LASX_LEN (256)
+
typedef union VReg {
- int8_t B[LSX_LEN / 8];
- int16_t H[LSX_LEN / 16];
- int32_t W[LSX_LEN / 32];
- int64_t D[LSX_LEN / 64];
- uint8_t UB[LSX_LEN / 8];
- uint16_t UH[LSX_LEN / 16];
- uint32_t UW[LSX_LEN / 32];
- uint64_t UD[LSX_LEN / 64];
- Int128 Q[LSX_LEN / 128];
-}VReg;
+ int8_t B[LASX_LEN / 8];
+ int16_t H[LASX_LEN / 16];
+ int32_t W[LASX_LEN / 32];
+ int64_t D[LASX_LEN / 64];
+ uint8_t UB[LASX_LEN / 8];
+ uint16_t UH[LASX_LEN / 16];
+ uint32_t UW[LASX_LEN / 32];
+ uint64_t UD[LASX_LEN / 64];
+ Int128 Q[LASX_LEN / 128];
+} VReg;
typedef union fpr_t fpr_t;
union fpr_t {
diff --git a/target/loongarch/internals.h b/target/loongarch/internals.h
index 7b0f29c942..c492863cc5 100644
--- a/target/loongarch/internals.h
+++ b/target/loongarch/internals.h
@@ -21,28 +21,6 @@
/* Global bit for huge page */
#define LOONGARCH_HGLOBAL_SHIFT 12
-#if HOST_BIG_ENDIAN
-#define B(x) B[15 - (x)]
-#define H(x) H[7 - (x)]
-#define W(x) W[3 - (x)]
-#define D(x) D[1 - (x)]
-#define UB(x) UB[15 - (x)]
-#define UH(x) UH[7 - (x)]
-#define UW(x) UW[3 - (x)]
-#define UD(x) UD[1 -(x)]
-#define Q(x) Q[x]
-#else
-#define B(x) B[x]
-#define H(x) H[x]
-#define W(x) W[x]
-#define D(x) D[x]
-#define UB(x) UB[x]
-#define UH(x) UH[x]
-#define UW(x) UW[x]
-#define UD(x) UD[x]
-#define Q(x) Q[x]
-#endif
-
void loongarch_translate_init(void);
void loongarch_cpu_dump_state(CPUState *cpu, FILE *f, int flags);
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
new file mode 100644
index 0000000000..2f23cae7d7
--- /dev/null
+++ b/target/loongarch/vec.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * QEMU LoongArch vector utilities
+ *
+ * Copyright (c) 2023 Loongson Technology Corporation Limited
+ */
+
+#ifndef LOONGARCH_VEC_H
+#define LOONGARCH_VEC_H
+
+#if HOST_BIG_ENDIAN
+#define B(x) B[(x) ^ 15]
+#define H(x) H[(x) ^ 7]
+#define W(x) W[(x) ^ 3]
+#define D(x) D[(x) ^ 1]
+#define UB(x) UB[(x) ^ 15]
+#define UH(x) UH[(x) ^ 7]
+#define UW(x) UW[(x) ^ 3]
+#define UD(x) UD[(x) ^ 1]
+#define Q(x) Q[x]
+#else
+#define B(x) B[x]
+#define H(x) H[x]
+#define W(x) W[x]
+#define D(x) D[x]
+#define UB(x) UB[x]
+#define UH(x) UH[x]
+#define UW(x) UW[x]
+#define UD(x) UD[x]
+#define Q(x) Q[x]
+#endif /* HOST_BIG_ENDIAN */
+
+#endif /* LOONGARCH_VEC_H */
diff --git a/linux-user/loongarch64/signal.c b/linux-user/loongarch64/signal.c
index bb8efb1172..39572c1190 100644
--- a/linux-user/loongarch64/signal.c
+++ b/linux-user/loongarch64/signal.c
@@ -12,6 +12,7 @@
#include "linux-user/trace.h"
#include "target/loongarch/internals.h"
+#include "target/loongarch/vec.h"
/* FP context was used */
#define SC_USED_FP (1 << 0)
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 27fc6e1f33..923e4b30cf 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -18,6 +18,7 @@
#include "cpu-csr.h"
#include "sysemu/reset.h"
#include "tcg/tcg.h"
+#include "vec.h"
const char * const regnames[32] = {
"r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
diff --git a/target/loongarch/gdbstub.c b/target/loongarch/gdbstub.c
index b09804b62f..5fc2f19e96 100644
--- a/target/loongarch/gdbstub.c
+++ b/target/loongarch/gdbstub.c
@@ -11,6 +11,7 @@
#include "internals.h"
#include "exec/gdbstub.h"
#include "gdbstub/helpers.h"
+#include "vec.h"
uint64_t read_fcc(CPULoongArchState *env)
{
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/lsx_helper.c
index 9571f0aef0..b231a2798b 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/lsx_helper.c
@@ -12,6 +12,7 @@
#include "fpu/softfloat.h"
#include "internals.h"
#include "tcg/tcg.h"
+#include "vec.h"
#define DO_ADD(a, b) (a + b)
#define DO_SUB(a, b) (a - b)
diff --git a/target/loongarch/machine.c b/target/loongarch/machine.c
index d8ac99c9a4..1c4e01d076 100644
--- a/target/loongarch/machine.c
+++ b/target/loongarch/machine.c
@@ -8,7 +8,7 @@
#include "qemu/osdep.h"
#include "cpu.h"
#include "migration/cpu.h"
-#include "internals.h"
+#include "vec.h"
static const VMStateDescription vmstate_fpu_reg = {
.name = "fpu_reg",
@@ -76,6 +76,39 @@ static const VMStateDescription vmstate_lsx = {
},
};
+static const VMStateDescription vmstate_lasxh_reg = {
+ .name = "lasxh_reg",
+ .version_id = 1,
+ .minimum_version_id = 1,
+ .fields = (VMStateField[]) {
+ VMSTATE_UINT64(UD(2), VReg),
+ VMSTATE_UINT64(UD(3), VReg),
+ VMSTATE_END_OF_LIST()
+ }
+};
+
+#define VMSTATE_LASXH_REGS(_field, _state, _start) \
+ VMSTATE_STRUCT_SUB_ARRAY(_field, _state, _start, 32, 0, \
+ vmstate_lasxh_reg, fpr_t)
+
+static bool lasx_needed(void *opaque)
+{
+ LoongArchCPU *cpu = opaque;
+
+ return FIELD_EX64(cpu->env.cpucfg[2], CPUCFG2, LASX);
+}
+
+static const VMStateDescription vmstate_lasx = {
+ .name = "cpu/lasx",
+ .version_id = 1,
+ .minimum_version_id = 1,
+ .needed = lasx_needed,
+ .fields = (VMStateField[]) {
+ VMSTATE_LASXH_REGS(env.fpr, LoongArchCPU, 0),
+ VMSTATE_END_OF_LIST()
+ },
+};
+
/* TLB state */
const VMStateDescription vmstate_tlb = {
.name = "cpu/tlb",
@@ -163,6 +196,7 @@ const VMStateDescription vmstate_loongarch_cpu = {
.subsections = (const VMStateDescription*[]) {
&vmstate_fpu,
&vmstate_lsx,
+ &vmstate_lasx,
NULL
}
};
--
2.39.1
* [PATCH v4 02/48] target/loongarch: meson.build support for building LASX
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
2023-08-30 8:48 ` [PATCH v4 01/48] target/loongarch: Add LASX data support Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 8:48 ` [PATCH v4 03/48] target/loongarch: Add CHECK_ASXE macro to check LASX enable Song Gao
` (45 subsequent siblings)
47 siblings, 0 replies; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC
To: qemu-devel; +Cc: richard.henderson
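Create an initially empty insn_trans/trans_lasx.c.inc and include it
from translate.c next to the existing LSX include, giving the LASX
translation functions added by later patches a place to live.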
Signed-off-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/loongarch/translate.c | 1 +
target/loongarch/insn_trans/trans_lasx.c.inc | 6 ++++++
2 files changed, 7 insertions(+)
create mode 100644 target/loongarch/insn_trans/trans_lasx.c.inc
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index fd393ed76d..1f91afee81 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -262,6 +262,7 @@ static uint64_t make_address_pc(DisasContext *ctx, uint64_t addr)
#include "insn_trans/trans_branch.c.inc"
#include "insn_trans/trans_privileged.c.inc"
#include "insn_trans/trans_lsx.c.inc"
+#include "insn_trans/trans_lasx.c.inc"
static void loongarch_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs)
{
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
new file mode 100644
index 0000000000..56a9839255
--- /dev/null
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -0,0 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * LASX translate functions
+ * Copyright (c) 2023 Loongson Technology Corporation Limited
+ */
+
--
2.39.1
* [PATCH v4 03/48] target/loongarch: Add CHECK_ASXE macro to check LASX enable
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
2023-08-30 8:48 ` [PATCH v4 01/48] target/loongarch: Add LASX data support Song Gao
2023-08-30 8:48 ` [PATCH v4 02/48] target/loongarch: meson.build support for building LASX Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 8:48 ` [PATCH v4 04/48] target/loongarch: Add avail_LASX to check LASX instructions Song Gao
` (44 subsequent siblings)
47 siblings, 0 replies; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC
To: qemu-devel; +Cc: richard.henderson
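Add the ASXE bit of CSR.EUEN to the TB flags and raise the new ASXD
exception when a LASX instruction is executed with the bit clear. As an
illustration only (the translate function below is hypothetical; real
users arrive in later patches), a LASX translator would use the macro
like this:
    /* Sketch: guard a 256-bit vector op on CSR.EUEN.ASXE. */
    static bool trans_xfoo(DisasContext *ctx, arg_vvv *a)
    {
        CHECK_ASXE;  /* generates EXCCODE_ASXD and returns if ASXE is clear */
        /* ... emit TCG ops for the 256-bit operation ... */
        return true;
    }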
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/cpu.h | 2 ++
target/loongarch/cpu.c | 2 ++
target/loongarch/insn_trans/trans_lasx.c.inc | 10 ++++++++++
3 files changed, 14 insertions(+)
diff --git a/target/loongarch/cpu.h b/target/loongarch/cpu.h
index 347ad1c8a9..f125a8e49b 100644
--- a/target/loongarch/cpu.h
+++ b/target/loongarch/cpu.h
@@ -462,6 +462,7 @@ static inline void set_pc(CPULoongArchState *env, uint64_t value)
#define HW_FLAGS_CRMD_PG R_CSR_CRMD_PG_MASK /* 0x10 */
#define HW_FLAGS_EUEN_FPE 0x04
#define HW_FLAGS_EUEN_SXE 0x08
+#define HW_FLAGS_EUEN_ASXE 0x10
#define HW_FLAGS_VA32 0x20
static inline void cpu_get_tb_cpu_state(CPULoongArchState *env, vaddr *pc,
@@ -472,6 +473,7 @@ static inline void cpu_get_tb_cpu_state(CPULoongArchState *env, vaddr *pc,
*flags = env->CSR_CRMD & (R_CSR_CRMD_PLV_MASK | R_CSR_CRMD_PG_MASK);
*flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, FPE) * HW_FLAGS_EUEN_FPE;
*flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, SXE) * HW_FLAGS_EUEN_SXE;
+ *flags |= FIELD_EX64(env->CSR_EUEN, CSR_EUEN, ASXE) * HW_FLAGS_EUEN_ASXE;
*flags |= is_va32(env) * HW_FLAGS_VA32;
}
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 923e4b30cf..4deae22104 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -54,6 +54,7 @@ static const char * const excp_names[] = {
[EXCCODE_DBP] = "Debug breakpoint",
[EXCCODE_BCE] = "Bound Check Exception",
[EXCCODE_SXD] = "128 bit vector instructions Disable exception",
+ [EXCCODE_ASXD] = "256 bit vector instructions Disable exception",
};
const char *loongarch_exception_name(int32_t exception)
@@ -189,6 +190,7 @@ static void loongarch_cpu_do_interrupt(CPUState *cs)
case EXCCODE_FPD:
case EXCCODE_FPE:
case EXCCODE_SXD:
+ case EXCCODE_ASXD:
env->CSR_BADV = env->pc;
QEMU_FALLTHROUGH;
case EXCCODE_BCE:
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 56a9839255..75a77f5dce 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -4,3 +4,13 @@
* Copyright (c) 2023 Loongson Technology Corporation Limited
*/
+#ifndef CONFIG_USER_ONLY
+#define CHECK_ASXE do { \
+ if ((ctx->base.tb->flags & HW_FLAGS_EUEN_ASXE) == 0) { \
+ generate_exception(ctx, EXCCODE_ASXD); \
+ return true; \
+ } \
+} while (0)
+#else
+#define CHECK_ASXE
+#endif
--
2.39.1
* [PATCH v4 04/48] target/loongarch: Add avail_LASX to check LASX instructions
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (2 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 03/48] target/loongarch: Add CHECK_ASXE macro to check LASX enable Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 14:20 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 05/48] target/loongarch: Implement xvadd/xvsub Song Gao
` (43 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC
To: qemu-devel; +Cc: richard.henderson
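Mirror the existing avail_LSX() predicate with an avail_LASX() check on
CPUCFG2.LASX so that the TRANS() macro can reject LASX encodings on
CPUs without the feature. The next patch wires it up as, for example:
    TRANS(xvadd_b, LASX, gvec_vvv, 32, MO_8, tcg_gen_gvec_add)
where the LASX argument expands to an avail_LASX(ctx) test that runs
before the translation body.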
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/translate.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/target/loongarch/translate.h b/target/loongarch/translate.h
index 89b49a859e..195f53573a 100644
--- a/target/loongarch/translate.h
+++ b/target/loongarch/translate.h
@@ -23,6 +23,7 @@
#define avail_LSPW(C) (FIELD_EX32((C)->cpucfg2, CPUCFG2, LSPW))
#define avail_LAM(C) (FIELD_EX32((C)->cpucfg2, CPUCFG2, LAM))
#define avail_LSX(C) (FIELD_EX32((C)->cpucfg2, CPUCFG2, LSX))
+#define avail_LASX(C) (FIELD_EX32((C)->cpucfg2, CPUCFG2, LASX))
#define avail_IOCSR(C) (FIELD_EX32((C)->cpucfg1, CPUCFG1, IOCSR))
/*
--
2.39.1
* [PATCH v4 05/48] target/loongarch: Implement xvadd/xvsub
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (3 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 04/48] target/loongarch: Add avail_LASX to check LASX instructions Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 15:38 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 06/48] target/loongarch: Implement xvreplgr2vr Song Gao
` (42 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVADD.{B/H/W/D/Q};
- XVSUB.{B/H/W/D/Q}.
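The fixed-width element forms reuse the existing gvec expanders, which
now take an explicit operand size (16 bytes for LSX, 32 for LASX), and
the CHECK_SXE/CHECK_ASXE macros are folded into a single CHECK_VEC that
dispatches on ctx->vl. The .Q variants operate on the two 128-bit lanes
independently; conceptually (a sketch of the semantics, not the actual
TCG expansion):
    /* Per-lane 128-bit add: lane 0 is D[0..1], lane 1 is D[2..3]. */
    for (int lane = 0; lane < 2; lane++) {
        Vd->Q(lane) = int128_add(Vj->Q(lane), Vk->Q(lane));
    }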
Signed-off-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/loongarch/vec.h | 17 +
target/loongarch/insns.decode | 14 +
target/loongarch/disas.c | 23 +
target/loongarch/translate.c | 4 +
target/loongarch/insn_trans/trans_lasx.c.inc | 56 +-
target/loongarch/insn_trans/trans_lsx.c.inc | 513 +++++++++----------
6 files changed, 355 insertions(+), 272 deletions(-)
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index 2f23cae7d7..512f2fd83f 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -8,6 +8,23 @@
#ifndef LOONGARCH_VEC_H
#define LOONGARCH_VEC_H
+#ifndef CONFIG_USER_ONLY
+ #define CHECK_VEC do { \
+ if ((ctx->vl == LSX_LEN) && \
+ (ctx->base.tb->flags & HW_FLAGS_EUEN_SXE) == 0) { \
+ generate_exception(ctx, EXCCODE_SXD); \
+ return true; \
+ } \
+ if ((ctx->vl == LASX_LEN) && \
+ (ctx->base.tb->flags & HW_FLAGS_EUEN_ASXE) == 0) { \
+ generate_exception(ctx, EXCCODE_ASXD); \
+ return true; \
+ } \
+ } while (0)
+#else
+ #define CHECK_VEC
+#endif /* !CONFIG_USER_ONLY */
+
#if HOST_BIG_ENDIAN
#define B(x) B[(x) ^ 15]
#define H(x) H[(x) ^ 7]
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index c9c3bc2c73..bcc18fb6c5 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1296,3 +1296,17 @@ vstelm_d 0011 00010001 0 . ........ ..... ..... @vr_i8i1
vstelm_w 0011 00010010 .. ........ ..... ..... @vr_i8i2
vstelm_h 0011 0001010 ... ........ ..... ..... @vr_i8i3
vstelm_b 0011 000110 .... ........ ..... ..... @vr_i8i4
+
+#
+# LoongArch LASX instructions
+#
+xvadd_b 0111 01000000 10100 ..... ..... ..... @vvv
+xvadd_h 0111 01000000 10101 ..... ..... ..... @vvv
+xvadd_w 0111 01000000 10110 ..... ..... ..... @vvv
+xvadd_d 0111 01000000 10111 ..... ..... ..... @vvv
+xvadd_q 0111 01010010 11010 ..... ..... ..... @vvv
+xvsub_b 0111 01000000 11000 ..... ..... ..... @vvv
+xvsub_h 0111 01000000 11001 ..... ..... ..... @vvv
+xvsub_w 0111 01000000 11010 ..... ..... ..... @vvv
+xvsub_d 0111 01000000 11011 ..... ..... ..... @vvv
+xvsub_q 0111 01010010 11011 ..... ..... ..... @vvv
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 5c402d944d..d8b62ba532 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1695,3 +1695,26 @@ INSN_LSX(vstelm_d, vr_ii)
INSN_LSX(vstelm_w, vr_ii)
INSN_LSX(vstelm_h, vr_ii)
INSN_LSX(vstelm_b, vr_ii)
+
+#define INSN_LASX(insn, type) \
+static bool trans_##insn(DisasContext *ctx, arg_##type * a) \
+{ \
+ output_##type ## _x(ctx, a, #insn); \
+ return true; \
+}
+
+static void output_vvv_x(DisasContext *ctx, arg_vvv * a, const char *mnemonic)
+{
+ output(ctx, mnemonic, "x%d, x%d, x%d", a->vd, a->vj, a->vk);
+}
+
+INSN_LASX(xvadd_b, vvv)
+INSN_LASX(xvadd_h, vvv)
+INSN_LASX(xvadd_w, vvv)
+INSN_LASX(xvadd_d, vvv)
+INSN_LASX(xvadd_q, vvv)
+INSN_LASX(xvsub_b, vvv)
+INSN_LASX(xvsub_h, vvv)
+INSN_LASX(xvsub_w, vvv)
+INSN_LASX(xvsub_d, vvv)
+INSN_LASX(xvsub_q, vvv)
diff --git a/target/loongarch/translate.c b/target/loongarch/translate.c
index 1f91afee81..36039dfeef 100644
--- a/target/loongarch/translate.c
+++ b/target/loongarch/translate.c
@@ -18,6 +18,7 @@
#include "fpu/softfloat.h"
#include "translate.h"
#include "internals.h"
+#include "vec.h"
/* Global register indices */
TCGv cpu_gpr[32], cpu_pc;
@@ -122,6 +123,9 @@ static void loongarch_tr_init_disas_context(DisasContextBase *dcbase,
if (FIELD_EX64(env->cpucfg[2], CPUCFG2, LSX)) {
ctx->vl = LSX_LEN;
}
+ if (FIELD_EX64(env->cpucfg[2], CPUCFG2, LASX)) {
+ ctx->vl = LASX_LEN;
+ }
ctx->la64 = is_la64(env);
ctx->va32 = (ctx->base.tb->flags & HW_FLAGS_VA32) != 0;
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 75a77f5dce..218b8dc648 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -4,13 +4,49 @@
* Copyright (c) 2023 Loongson Technology Corporation Limited
*/
-#ifndef CONFIG_USER_ONLY
-#define CHECK_ASXE do { \
- if ((ctx->base.tb->flags & HW_FLAGS_EUEN_ASXE) == 0) { \
- generate_exception(ctx, EXCCODE_ASXD); \
- return true; \
- } \
-} while (0)
-#else
-#define CHECK_ASXE
-#endif
+TRANS(xvadd_b, LASX, gvec_vvv, 32, MO_8, tcg_gen_gvec_add)
+TRANS(xvadd_h, LASX, gvec_vvv, 32, MO_16, tcg_gen_gvec_add)
+TRANS(xvadd_w, LASX, gvec_vvv, 32, MO_32, tcg_gen_gvec_add)
+TRANS(xvadd_d, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_add)
+
+#define XVADDSUB_Q(NAME) \
+static bool trans_xv## NAME ##_q(DisasContext *ctx, arg_vvv * a) \
+{ \
+ TCGv_i64 rh, rl, ah, al, bh, bl; \
+ int i; \
+ \
+ if (!avail_LASX(ctx)) { \
+ return false; \
+ } \
+ \
+ CHECK_VEC; \
+ \
+ rh = tcg_temp_new_i64(); \
+ rl = tcg_temp_new_i64(); \
+ ah = tcg_temp_new_i64(); \
+ al = tcg_temp_new_i64(); \
+ bh = tcg_temp_new_i64(); \
+ bl = tcg_temp_new_i64(); \
+ \
+ for (i = 0; i < 2; i++) { \
+ get_vreg64(ah, a->vj, 1 + i * 2); \
+ get_vreg64(al, a->vj, 0 + i * 2); \
+ get_vreg64(bh, a->vk, 1 + i * 2); \
+ get_vreg64(bl, a->vk, 0 + i * 2); \
+ \
+ tcg_gen_## NAME ##2_i64(rl, rh, al, ah, bl, bh); \
+ \
+ set_vreg64(rh, a->vd, 1 + i * 2); \
+ set_vreg64(rl, a->vd, 0 + i * 2); \
+ } \
+ \
+ return true; \
+}
+
+XVADDSUB_Q(add)
+XVADDSUB_Q(sub)
+
+TRANS(xvsub_b, LASX, gvec_vvv, 32, MO_8, tcg_gen_gvec_sub)
+TRANS(xvsub_h, LASX, gvec_vvv, 32, MO_16, tcg_gen_gvec_sub)
+TRANS(xvsub_w, LASX, gvec_vvv, 32, MO_32, tcg_gen_gvec_sub)
+TRANS(xvsub_d, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_sub)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 5fbf2718f7..0e12213e8b 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -4,17 +4,6 @@
* Copyright (c) 2022-2023 Loongson Technology Corporation Limited
*/
-#ifndef CONFIG_USER_ONLY
-#define CHECK_SXE do { \
- if ((ctx->base.tb->flags & HW_FLAGS_EUEN_SXE) == 0) { \
- generate_exception(ctx, EXCCODE_SXD); \
- return true; \
- } \
-} while (0)
-#else
-#define CHECK_SXE
-#endif
-
static bool gen_vvvv(DisasContext *ctx, arg_vvvv *a,
void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32,
TCGv_i32, TCGv_i32))
@@ -24,7 +13,7 @@ static bool gen_vvvv(DisasContext *ctx, arg_vvvv *a,
TCGv_i32 vk = tcg_constant_i32(a->vk);
TCGv_i32 va = tcg_constant_i32(a->va);
- CHECK_SXE;
+ CHECK_VEC;
func(cpu_env, vd, vj, vk, va);
return true;
}
@@ -36,7 +25,7 @@ static bool gen_vvv(DisasContext *ctx, arg_vvv *a,
TCGv_i32 vj = tcg_constant_i32(a->vj);
TCGv_i32 vk = tcg_constant_i32(a->vk);
- CHECK_SXE;
+ CHECK_VEC;
func(cpu_env, vd, vj, vk);
return true;
@@ -48,7 +37,7 @@ static bool gen_vv(DisasContext *ctx, arg_vv *a,
TCGv_i32 vd = tcg_constant_i32(a->vd);
TCGv_i32 vj = tcg_constant_i32(a->vj);
- CHECK_SXE;
+ CHECK_VEC;
func(cpu_env, vd, vj);
return true;
}
@@ -60,7 +49,7 @@ static bool gen_vv_i(DisasContext *ctx, arg_vv_i *a,
TCGv_i32 vj = tcg_constant_i32(a->vj);
TCGv_i32 imm = tcg_constant_i32(a->imm);
- CHECK_SXE;
+ CHECK_VEC;
func(cpu_env, vd, vj, imm);
return true;
}
@@ -71,24 +60,24 @@ static bool gen_cv(DisasContext *ctx, arg_cv *a,
TCGv_i32 vj = tcg_constant_i32(a->vj);
TCGv_i32 cd = tcg_constant_i32(a->cd);
- CHECK_SXE;
+ CHECK_VEC;
func(cpu_env, cd, vj);
return true;
}
-static bool gvec_vvv(DisasContext *ctx, arg_vvv *a, MemOp mop,
+static bool gvec_vvv(DisasContext *ctx, arg_vvv *a, uint32_t oprsz, MemOp mop,
void (*func)(unsigned, uint32_t, uint32_t,
uint32_t, uint32_t, uint32_t))
{
uint32_t vd_ofs, vj_ofs, vk_ofs;
- CHECK_SXE;
+ CHECK_VEC;
vd_ofs = vec_full_offset(a->vd);
vj_ofs = vec_full_offset(a->vj);
vk_ofs = vec_full_offset(a->vk);
- func(mop, vd_ofs, vj_ofs, vk_ofs, 16, ctx->vl/8);
+ func(mop, vd_ofs, vj_ofs, vk_ofs, oprsz, ctx->vl / 8);
return true;
}
@@ -98,7 +87,7 @@ static bool gvec_vv(DisasContext *ctx, arg_vv *a, MemOp mop,
{
uint32_t vd_ofs, vj_ofs;
- CHECK_SXE;
+ CHECK_VEC;
vd_ofs = vec_full_offset(a->vd);
vj_ofs = vec_full_offset(a->vj);
@@ -113,7 +102,7 @@ static bool gvec_vv_i(DisasContext *ctx, arg_vv_i *a, MemOp mop,
{
uint32_t vd_ofs, vj_ofs;
- CHECK_SXE;
+ CHECK_VEC;
vd_ofs = vec_full_offset(a->vd);
vj_ofs = vec_full_offset(a->vj);
@@ -126,7 +115,7 @@ static bool gvec_subi(DisasContext *ctx, arg_vv_i *a, MemOp mop)
{
uint32_t vd_ofs, vj_ofs;
- CHECK_SXE;
+ CHECK_VEC;
vd_ofs = vec_full_offset(a->vd);
vj_ofs = vec_full_offset(a->vj);
@@ -135,10 +124,10 @@ static bool gvec_subi(DisasContext *ctx, arg_vv_i *a, MemOp mop)
return true;
}
-TRANS(vadd_b, LSX, gvec_vvv, MO_8, tcg_gen_gvec_add)
-TRANS(vadd_h, LSX, gvec_vvv, MO_16, tcg_gen_gvec_add)
-TRANS(vadd_w, LSX, gvec_vvv, MO_32, tcg_gen_gvec_add)
-TRANS(vadd_d, LSX, gvec_vvv, MO_64, tcg_gen_gvec_add)
+TRANS(vadd_b, LSX, gvec_vvv, 16, MO_8, tcg_gen_gvec_add)
+TRANS(vadd_h, LSX, gvec_vvv, 16, MO_16, tcg_gen_gvec_add)
+TRANS(vadd_w, LSX, gvec_vvv, 16, MO_32, tcg_gen_gvec_add)
+TRANS(vadd_d, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_add)
#define VADDSUB_Q(NAME) \
static bool trans_v## NAME ##_q(DisasContext *ctx, arg_vvv *a) \
@@ -149,7 +138,7 @@ static bool trans_v## NAME ##_q(DisasContext *ctx, arg_vvv *a) \
return false; \
} \
\
- CHECK_SXE; \
+ CHECK_VEC; \
\
rh = tcg_temp_new_i64(); \
rl = tcg_temp_new_i64(); \
@@ -174,10 +163,10 @@ static bool trans_v## NAME ##_q(DisasContext *ctx, arg_vvv *a) \
VADDSUB_Q(add)
VADDSUB_Q(sub)
-TRANS(vsub_b, LSX, gvec_vvv, MO_8, tcg_gen_gvec_sub)
-TRANS(vsub_h, LSX, gvec_vvv, MO_16, tcg_gen_gvec_sub)
-TRANS(vsub_w, LSX, gvec_vvv, MO_32, tcg_gen_gvec_sub)
-TRANS(vsub_d, LSX, gvec_vvv, MO_64, tcg_gen_gvec_sub)
+TRANS(vsub_b, LSX, gvec_vvv, 16, MO_8, tcg_gen_gvec_sub)
+TRANS(vsub_h, LSX, gvec_vvv, 16, MO_16, tcg_gen_gvec_sub)
+TRANS(vsub_w, LSX, gvec_vvv, 16, MO_32, tcg_gen_gvec_sub)
+TRANS(vsub_d, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_sub)
TRANS(vaddi_bu, LSX, gvec_vv_i, MO_8, tcg_gen_gvec_addi)
TRANS(vaddi_hu, LSX, gvec_vv_i, MO_16, tcg_gen_gvec_addi)
@@ -193,22 +182,22 @@ TRANS(vneg_h, LSX, gvec_vv, MO_16, tcg_gen_gvec_neg)
TRANS(vneg_w, LSX, gvec_vv, MO_32, tcg_gen_gvec_neg)
TRANS(vneg_d, LSX, gvec_vv, MO_64, tcg_gen_gvec_neg)
-TRANS(vsadd_b, LSX, gvec_vvv, MO_8, tcg_gen_gvec_ssadd)
-TRANS(vsadd_h, LSX, gvec_vvv, MO_16, tcg_gen_gvec_ssadd)
-TRANS(vsadd_w, LSX, gvec_vvv, MO_32, tcg_gen_gvec_ssadd)
-TRANS(vsadd_d, LSX, gvec_vvv, MO_64, tcg_gen_gvec_ssadd)
-TRANS(vsadd_bu, LSX, gvec_vvv, MO_8, tcg_gen_gvec_usadd)
-TRANS(vsadd_hu, LSX, gvec_vvv, MO_16, tcg_gen_gvec_usadd)
-TRANS(vsadd_wu, LSX, gvec_vvv, MO_32, tcg_gen_gvec_usadd)
-TRANS(vsadd_du, LSX, gvec_vvv, MO_64, tcg_gen_gvec_usadd)
-TRANS(vssub_b, LSX, gvec_vvv, MO_8, tcg_gen_gvec_sssub)
-TRANS(vssub_h, LSX, gvec_vvv, MO_16, tcg_gen_gvec_sssub)
-TRANS(vssub_w, LSX, gvec_vvv, MO_32, tcg_gen_gvec_sssub)
-TRANS(vssub_d, LSX, gvec_vvv, MO_64, tcg_gen_gvec_sssub)
-TRANS(vssub_bu, LSX, gvec_vvv, MO_8, tcg_gen_gvec_ussub)
-TRANS(vssub_hu, LSX, gvec_vvv, MO_16, tcg_gen_gvec_ussub)
-TRANS(vssub_wu, LSX, gvec_vvv, MO_32, tcg_gen_gvec_ussub)
-TRANS(vssub_du, LSX, gvec_vvv, MO_64, tcg_gen_gvec_ussub)
+TRANS(vsadd_b, LSX, gvec_vvv, 16, MO_8, tcg_gen_gvec_ssadd)
+TRANS(vsadd_h, LSX, gvec_vvv, 16, MO_16, tcg_gen_gvec_ssadd)
+TRANS(vsadd_w, LSX, gvec_vvv, 16, MO_32, tcg_gen_gvec_ssadd)
+TRANS(vsadd_d, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_ssadd)
+TRANS(vsadd_bu, LSX, gvec_vvv, 16, MO_8, tcg_gen_gvec_usadd)
+TRANS(vsadd_hu, LSX, gvec_vvv, 16, MO_16, tcg_gen_gvec_usadd)
+TRANS(vsadd_wu, LSX, gvec_vvv, 16, MO_32, tcg_gen_gvec_usadd)
+TRANS(vsadd_du, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_usadd)
+TRANS(vssub_b, LSX, gvec_vvv, 16, MO_8, tcg_gen_gvec_sssub)
+TRANS(vssub_h, LSX, gvec_vvv, 16, MO_16, tcg_gen_gvec_sssub)
+TRANS(vssub_w, LSX, gvec_vvv, 16, MO_32, tcg_gen_gvec_sssub)
+TRANS(vssub_d, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_sssub)
+TRANS(vssub_bu, LSX, gvec_vvv, 16, MO_8, tcg_gen_gvec_ussub)
+TRANS(vssub_hu, LSX, gvec_vvv, 16, MO_16, tcg_gen_gvec_ussub)
+TRANS(vssub_wu, LSX, gvec_vvv, 16, MO_32, tcg_gen_gvec_ussub)
+TRANS(vssub_du, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_ussub)
TRANS(vhaddw_h_b, LSX, gen_vvv, gen_helper_vhaddw_h_b)
TRANS(vhaddw_w_h, LSX, gen_vvv, gen_helper_vhaddw_w_h)
@@ -305,10 +294,10 @@ static void do_vaddwev_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vaddwev_h_b, LSX, gvec_vvv, MO_8, do_vaddwev_s)
-TRANS(vaddwev_w_h, LSX, gvec_vvv, MO_16, do_vaddwev_s)
-TRANS(vaddwev_d_w, LSX, gvec_vvv, MO_32, do_vaddwev_s)
-TRANS(vaddwev_q_d, LSX, gvec_vvv, MO_64, do_vaddwev_s)
+TRANS(vaddwev_h_b, LSX, gvec_vvv, 16, MO_8, do_vaddwev_s)
+TRANS(vaddwev_w_h, LSX, gvec_vvv, 16, MO_16, do_vaddwev_s)
+TRANS(vaddwev_d_w, LSX, gvec_vvv, 16, MO_32, do_vaddwev_s)
+TRANS(vaddwev_q_d, LSX, gvec_vvv, 16, MO_64, do_vaddwev_s)
static void gen_vaddwod_w_h(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b)
{
@@ -384,10 +373,10 @@ static void do_vaddwod_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vaddwod_h_b, LSX, gvec_vvv, MO_8, do_vaddwod_s)
-TRANS(vaddwod_w_h, LSX, gvec_vvv, MO_16, do_vaddwod_s)
-TRANS(vaddwod_d_w, LSX, gvec_vvv, MO_32, do_vaddwod_s)
-TRANS(vaddwod_q_d, LSX, gvec_vvv, MO_64, do_vaddwod_s)
+TRANS(vaddwod_h_b, LSX, gvec_vvv, 16, MO_8, do_vaddwod_s)
+TRANS(vaddwod_w_h, LSX, gvec_vvv, 16, MO_16, do_vaddwod_s)
+TRANS(vaddwod_d_w, LSX, gvec_vvv, 16, MO_32, do_vaddwod_s)
+TRANS(vaddwod_q_d, LSX, gvec_vvv, 16, MO_64, do_vaddwod_s)
static void gen_vsubwev_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
@@ -467,10 +456,10 @@ static void do_vsubwev_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vsubwev_h_b, LSX, gvec_vvv, MO_8, do_vsubwev_s)
-TRANS(vsubwev_w_h, LSX, gvec_vvv, MO_16, do_vsubwev_s)
-TRANS(vsubwev_d_w, LSX, gvec_vvv, MO_32, do_vsubwev_s)
-TRANS(vsubwev_q_d, LSX, gvec_vvv, MO_64, do_vsubwev_s)
+TRANS(vsubwev_h_b, LSX, gvec_vvv, 16, MO_8, do_vsubwev_s)
+TRANS(vsubwev_w_h, LSX, gvec_vvv, 16, MO_16, do_vsubwev_s)
+TRANS(vsubwev_d_w, LSX, gvec_vvv, 16, MO_32, do_vsubwev_s)
+TRANS(vsubwev_q_d, LSX, gvec_vvv, 16, MO_64, do_vsubwev_s)
static void gen_vsubwod_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
@@ -546,10 +535,10 @@ static void do_vsubwod_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vsubwod_h_b, LSX, gvec_vvv, MO_8, do_vsubwod_s)
-TRANS(vsubwod_w_h, LSX, gvec_vvv, MO_16, do_vsubwod_s)
-TRANS(vsubwod_d_w, LSX, gvec_vvv, MO_32, do_vsubwod_s)
-TRANS(vsubwod_q_d, LSX, gvec_vvv, MO_64, do_vsubwod_s)
+TRANS(vsubwod_h_b, LSX, gvec_vvv, 16, MO_8, do_vsubwod_s)
+TRANS(vsubwod_w_h, LSX, gvec_vvv, 16, MO_16, do_vsubwod_s)
+TRANS(vsubwod_d_w, LSX, gvec_vvv, 16, MO_32, do_vsubwod_s)
+TRANS(vsubwod_q_d, LSX, gvec_vvv, 16, MO_64, do_vsubwod_s)
static void gen_vaddwev_u(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
@@ -621,10 +610,10 @@ static void do_vaddwev_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vaddwev_h_bu, LSX, gvec_vvv, MO_8, do_vaddwev_u)
-TRANS(vaddwev_w_hu, LSX, gvec_vvv, MO_16, do_vaddwev_u)
-TRANS(vaddwev_d_wu, LSX, gvec_vvv, MO_32, do_vaddwev_u)
-TRANS(vaddwev_q_du, LSX, gvec_vvv, MO_64, do_vaddwev_u)
+TRANS(vaddwev_h_bu, LSX, gvec_vvv, 16, MO_8, do_vaddwev_u)
+TRANS(vaddwev_w_hu, LSX, gvec_vvv, 16, MO_16, do_vaddwev_u)
+TRANS(vaddwev_d_wu, LSX, gvec_vvv, 16, MO_32, do_vaddwev_u)
+TRANS(vaddwev_q_du, LSX, gvec_vvv, 16, MO_64, do_vaddwev_u)
static void gen_vaddwod_u(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
@@ -700,10 +689,10 @@ static void do_vaddwod_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vaddwod_h_bu, LSX, gvec_vvv, MO_8, do_vaddwod_u)
-TRANS(vaddwod_w_hu, LSX, gvec_vvv, MO_16, do_vaddwod_u)
-TRANS(vaddwod_d_wu, LSX, gvec_vvv, MO_32, do_vaddwod_u)
-TRANS(vaddwod_q_du, LSX, gvec_vvv, MO_64, do_vaddwod_u)
+TRANS(vaddwod_h_bu, LSX, gvec_vvv, 16, MO_8, do_vaddwod_u)
+TRANS(vaddwod_w_hu, LSX, gvec_vvv, 16, MO_16, do_vaddwod_u)
+TRANS(vaddwod_d_wu, LSX, gvec_vvv, 16, MO_32, do_vaddwod_u)
+TRANS(vaddwod_q_du, LSX, gvec_vvv, 16, MO_64, do_vaddwod_u)
static void gen_vsubwev_u(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
@@ -775,10 +764,10 @@ static void do_vsubwev_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vsubwev_h_bu, LSX, gvec_vvv, MO_8, do_vsubwev_u)
-TRANS(vsubwev_w_hu, LSX, gvec_vvv, MO_16, do_vsubwev_u)
-TRANS(vsubwev_d_wu, LSX, gvec_vvv, MO_32, do_vsubwev_u)
-TRANS(vsubwev_q_du, LSX, gvec_vvv, MO_64, do_vsubwev_u)
+TRANS(vsubwev_h_bu, LSX, gvec_vvv, 16, MO_8, do_vsubwev_u)
+TRANS(vsubwev_w_hu, LSX, gvec_vvv, 16, MO_16, do_vsubwev_u)
+TRANS(vsubwev_d_wu, LSX, gvec_vvv, 16, MO_32, do_vsubwev_u)
+TRANS(vsubwev_q_du, LSX, gvec_vvv, 16, MO_64, do_vsubwev_u)
static void gen_vsubwod_u(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
@@ -854,10 +843,10 @@ static void do_vsubwod_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vsubwod_h_bu, LSX, gvec_vvv, MO_8, do_vsubwod_u)
-TRANS(vsubwod_w_hu, LSX, gvec_vvv, MO_16, do_vsubwod_u)
-TRANS(vsubwod_d_wu, LSX, gvec_vvv, MO_32, do_vsubwod_u)
-TRANS(vsubwod_q_du, LSX, gvec_vvv, MO_64, do_vsubwod_u)
+TRANS(vsubwod_h_bu, LSX, gvec_vvv, 16, MO_8, do_vsubwod_u)
+TRANS(vsubwod_w_hu, LSX, gvec_vvv, 16, MO_16, do_vsubwod_u)
+TRANS(vsubwod_d_wu, LSX, gvec_vvv, 16, MO_32, do_vsubwod_u)
+TRANS(vsubwod_q_du, LSX, gvec_vvv, 16, MO_64, do_vsubwod_u)
static void gen_vaddwev_u_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
@@ -937,10 +926,10 @@ static void do_vaddwev_u_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vaddwev_h_bu_b, LSX, gvec_vvv, MO_8, do_vaddwev_u_s)
-TRANS(vaddwev_w_hu_h, LSX, gvec_vvv, MO_16, do_vaddwev_u_s)
-TRANS(vaddwev_d_wu_w, LSX, gvec_vvv, MO_32, do_vaddwev_u_s)
-TRANS(vaddwev_q_du_d, LSX, gvec_vvv, MO_64, do_vaddwev_u_s)
+TRANS(vaddwev_h_bu_b, LSX, gvec_vvv, 16, MO_8, do_vaddwev_u_s)
+TRANS(vaddwev_w_hu_h, LSX, gvec_vvv, 16, MO_16, do_vaddwev_u_s)
+TRANS(vaddwev_d_wu_w, LSX, gvec_vvv, 16, MO_32, do_vaddwev_u_s)
+TRANS(vaddwev_q_du_d, LSX, gvec_vvv, 16, MO_64, do_vaddwev_u_s)
static void gen_vaddwod_u_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
@@ -1017,10 +1006,10 @@ static void do_vaddwod_u_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vaddwod_h_bu_b, LSX, gvec_vvv, MO_8, do_vaddwod_u_s)
-TRANS(vaddwod_w_hu_h, LSX, gvec_vvv, MO_16, do_vaddwod_u_s)
-TRANS(vaddwod_d_wu_w, LSX, gvec_vvv, MO_32, do_vaddwod_u_s)
-TRANS(vaddwod_q_du_d, LSX, gvec_vvv, MO_64, do_vaddwod_u_s)
+TRANS(vaddwod_h_bu_b, LSX, gvec_vvv, 16, MO_8, do_vaddwod_u_s)
+TRANS(vaddwod_w_hu_h, LSX, gvec_vvv, 16, MO_16, do_vaddwod_u_s)
+TRANS(vaddwod_d_wu_w, LSX, gvec_vvv, 16, MO_32, do_vaddwod_u_s)
+TRANS(vaddwod_q_du_d, LSX, gvec_vvv, 16, MO_64, do_vaddwod_u_s)
static void do_vavg(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b,
void (*gen_shr_vec)(unsigned, TCGv_vec,
@@ -1129,14 +1118,14 @@ static void do_vavg_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vavg_b, LSX, gvec_vvv, MO_8, do_vavg_s)
-TRANS(vavg_h, LSX, gvec_vvv, MO_16, do_vavg_s)
-TRANS(vavg_w, LSX, gvec_vvv, MO_32, do_vavg_s)
-TRANS(vavg_d, LSX, gvec_vvv, MO_64, do_vavg_s)
-TRANS(vavg_bu, LSX, gvec_vvv, MO_8, do_vavg_u)
-TRANS(vavg_hu, LSX, gvec_vvv, MO_16, do_vavg_u)
-TRANS(vavg_wu, LSX, gvec_vvv, MO_32, do_vavg_u)
-TRANS(vavg_du, LSX, gvec_vvv, MO_64, do_vavg_u)
+TRANS(vavg_b, LSX, gvec_vvv, 16, MO_8, do_vavg_s)
+TRANS(vavg_h, LSX, gvec_vvv, 16, MO_16, do_vavg_s)
+TRANS(vavg_w, LSX, gvec_vvv, 16, MO_32, do_vavg_s)
+TRANS(vavg_d, LSX, gvec_vvv, 16, MO_64, do_vavg_s)
+TRANS(vavg_bu, LSX, gvec_vvv, 16, MO_8, do_vavg_u)
+TRANS(vavg_hu, LSX, gvec_vvv, 16, MO_16, do_vavg_u)
+TRANS(vavg_wu, LSX, gvec_vvv, 16, MO_32, do_vavg_u)
+TRANS(vavg_du, LSX, gvec_vvv, 16, MO_64, do_vavg_u)
static void do_vavgr_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
@@ -1210,14 +1199,14 @@ static void do_vavgr_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vavgr_b, LSX, gvec_vvv, MO_8, do_vavgr_s)
-TRANS(vavgr_h, LSX, gvec_vvv, MO_16, do_vavgr_s)
-TRANS(vavgr_w, LSX, gvec_vvv, MO_32, do_vavgr_s)
-TRANS(vavgr_d, LSX, gvec_vvv, MO_64, do_vavgr_s)
-TRANS(vavgr_bu, LSX, gvec_vvv, MO_8, do_vavgr_u)
-TRANS(vavgr_hu, LSX, gvec_vvv, MO_16, do_vavgr_u)
-TRANS(vavgr_wu, LSX, gvec_vvv, MO_32, do_vavgr_u)
-TRANS(vavgr_du, LSX, gvec_vvv, MO_64, do_vavgr_u)
+TRANS(vavgr_b, LSX, gvec_vvv, 16, MO_8, do_vavgr_s)
+TRANS(vavgr_h, LSX, gvec_vvv, 16, MO_16, do_vavgr_s)
+TRANS(vavgr_w, LSX, gvec_vvv, 16, MO_32, do_vavgr_s)
+TRANS(vavgr_d, LSX, gvec_vvv, 16, MO_64, do_vavgr_s)
+TRANS(vavgr_bu, LSX, gvec_vvv, 16, MO_8, do_vavgr_u)
+TRANS(vavgr_hu, LSX, gvec_vvv, 16, MO_16, do_vavgr_u)
+TRANS(vavgr_wu, LSX, gvec_vvv, 16, MO_32, do_vavgr_u)
+TRANS(vavgr_du, LSX, gvec_vvv, 16, MO_64, do_vavgr_u)
static void gen_vabsd_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
@@ -1305,14 +1294,14 @@ static void do_vabsd_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vabsd_b, LSX, gvec_vvv, MO_8, do_vabsd_s)
-TRANS(vabsd_h, LSX, gvec_vvv, MO_16, do_vabsd_s)
-TRANS(vabsd_w, LSX, gvec_vvv, MO_32, do_vabsd_s)
-TRANS(vabsd_d, LSX, gvec_vvv, MO_64, do_vabsd_s)
-TRANS(vabsd_bu, LSX, gvec_vvv, MO_8, do_vabsd_u)
-TRANS(vabsd_hu, LSX, gvec_vvv, MO_16, do_vabsd_u)
-TRANS(vabsd_wu, LSX, gvec_vvv, MO_32, do_vabsd_u)
-TRANS(vabsd_du, LSX, gvec_vvv, MO_64, do_vabsd_u)
+TRANS(vabsd_b, LSX, gvec_vvv, 16, MO_8, do_vabsd_s)
+TRANS(vabsd_h, LSX, gvec_vvv, 16, MO_16, do_vabsd_s)
+TRANS(vabsd_w, LSX, gvec_vvv, 16, MO_32, do_vabsd_s)
+TRANS(vabsd_d, LSX, gvec_vvv, 16, MO_64, do_vabsd_s)
+TRANS(vabsd_bu, LSX, gvec_vvv, 16, MO_8, do_vabsd_u)
+TRANS(vabsd_hu, LSX, gvec_vvv, 16, MO_16, do_vabsd_u)
+TRANS(vabsd_wu, LSX, gvec_vvv, 16, MO_32, do_vabsd_u)
+TRANS(vabsd_du, LSX, gvec_vvv, 16, MO_64, do_vabsd_u)
static void gen_vadda(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
@@ -1362,28 +1351,28 @@ static void do_vadda(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vadda_b, LSX, gvec_vvv, MO_8, do_vadda)
-TRANS(vadda_h, LSX, gvec_vvv, MO_16, do_vadda)
-TRANS(vadda_w, LSX, gvec_vvv, MO_32, do_vadda)
-TRANS(vadda_d, LSX, gvec_vvv, MO_64, do_vadda)
-
-TRANS(vmax_b, LSX, gvec_vvv, MO_8, tcg_gen_gvec_smax)
-TRANS(vmax_h, LSX, gvec_vvv, MO_16, tcg_gen_gvec_smax)
-TRANS(vmax_w, LSX, gvec_vvv, MO_32, tcg_gen_gvec_smax)
-TRANS(vmax_d, LSX, gvec_vvv, MO_64, tcg_gen_gvec_smax)
-TRANS(vmax_bu, LSX, gvec_vvv, MO_8, tcg_gen_gvec_umax)
-TRANS(vmax_hu, LSX, gvec_vvv, MO_16, tcg_gen_gvec_umax)
-TRANS(vmax_wu, LSX, gvec_vvv, MO_32, tcg_gen_gvec_umax)
-TRANS(vmax_du, LSX, gvec_vvv, MO_64, tcg_gen_gvec_umax)
-
-TRANS(vmin_b, LSX, gvec_vvv, MO_8, tcg_gen_gvec_smin)
-TRANS(vmin_h, LSX, gvec_vvv, MO_16, tcg_gen_gvec_smin)
-TRANS(vmin_w, LSX, gvec_vvv, MO_32, tcg_gen_gvec_smin)
-TRANS(vmin_d, LSX, gvec_vvv, MO_64, tcg_gen_gvec_smin)
-TRANS(vmin_bu, LSX, gvec_vvv, MO_8, tcg_gen_gvec_umin)
-TRANS(vmin_hu, LSX, gvec_vvv, MO_16, tcg_gen_gvec_umin)
-TRANS(vmin_wu, LSX, gvec_vvv, MO_32, tcg_gen_gvec_umin)
-TRANS(vmin_du, LSX, gvec_vvv, MO_64, tcg_gen_gvec_umin)
+TRANS(vadda_b, LSX, gvec_vvv, 16, MO_8, do_vadda)
+TRANS(vadda_h, LSX, gvec_vvv, 16, MO_16, do_vadda)
+TRANS(vadda_w, LSX, gvec_vvv, 16, MO_32, do_vadda)
+TRANS(vadda_d, LSX, gvec_vvv, 16, MO_64, do_vadda)
+
+TRANS(vmax_b, LSX, gvec_vvv, 16, MO_8, tcg_gen_gvec_smax)
+TRANS(vmax_h, LSX, gvec_vvv, 16, MO_16, tcg_gen_gvec_smax)
+TRANS(vmax_w, LSX, gvec_vvv, 16, MO_32, tcg_gen_gvec_smax)
+TRANS(vmax_d, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_smax)
+TRANS(vmax_bu, LSX, gvec_vvv, 16, MO_8, tcg_gen_gvec_umax)
+TRANS(vmax_hu, LSX, gvec_vvv, 16, MO_16, tcg_gen_gvec_umax)
+TRANS(vmax_wu, LSX, gvec_vvv, 16, MO_32, tcg_gen_gvec_umax)
+TRANS(vmax_du, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_umax)
+
+TRANS(vmin_b, LSX, gvec_vvv, 16, MO_8, tcg_gen_gvec_smin)
+TRANS(vmin_h, LSX, gvec_vvv, 16, MO_16, tcg_gen_gvec_smin)
+TRANS(vmin_w, LSX, gvec_vvv, 16, MO_32, tcg_gen_gvec_smin)
+TRANS(vmin_d, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_smin)
+TRANS(vmin_bu, LSX, gvec_vvv, 16, MO_8, tcg_gen_gvec_umin)
+TRANS(vmin_hu, LSX, gvec_vvv, 16, MO_16, tcg_gen_gvec_umin)
+TRANS(vmin_wu, LSX, gvec_vvv, 16, MO_32, tcg_gen_gvec_umin)
+TRANS(vmin_du, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_umin)
static void gen_vmini_s(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
{
@@ -1567,10 +1556,10 @@ TRANS(vmaxi_hu, LSX, gvec_vv_i, MO_16, do_vmaxi_u)
TRANS(vmaxi_wu, LSX, gvec_vv_i, MO_32, do_vmaxi_u)
TRANS(vmaxi_du, LSX, gvec_vv_i, MO_64, do_vmaxi_u)
-TRANS(vmul_b, LSX, gvec_vvv, MO_8, tcg_gen_gvec_mul)
-TRANS(vmul_h, LSX, gvec_vvv, MO_16, tcg_gen_gvec_mul)
-TRANS(vmul_w, LSX, gvec_vvv, MO_32, tcg_gen_gvec_mul)
-TRANS(vmul_d, LSX, gvec_vvv, MO_64, tcg_gen_gvec_mul)
+TRANS(vmul_b, LSX, gvec_vvv, 16, MO_8, tcg_gen_gvec_mul)
+TRANS(vmul_h, LSX, gvec_vvv, 16, MO_16, tcg_gen_gvec_mul)
+TRANS(vmul_w, LSX, gvec_vvv, 16, MO_32, tcg_gen_gvec_mul)
+TRANS(vmul_d, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_mul)
static void gen_vmuh_w(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b)
{
@@ -1611,10 +1600,10 @@ static void do_vmuh_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vmuh_b, LSX, gvec_vvv, MO_8, do_vmuh_s)
-TRANS(vmuh_h, LSX, gvec_vvv, MO_16, do_vmuh_s)
-TRANS(vmuh_w, LSX, gvec_vvv, MO_32, do_vmuh_s)
-TRANS(vmuh_d, LSX, gvec_vvv, MO_64, do_vmuh_s)
+TRANS(vmuh_b, LSX, gvec_vvv, 16, MO_8, do_vmuh_s)
+TRANS(vmuh_h, LSX, gvec_vvv, 16, MO_16, do_vmuh_s)
+TRANS(vmuh_w, LSX, gvec_vvv, 16, MO_32, do_vmuh_s)
+TRANS(vmuh_d, LSX, gvec_vvv, 16, MO_64, do_vmuh_s)
static void gen_vmuh_wu(TCGv_i32 t, TCGv_i32 a, TCGv_i32 b)
{
@@ -1655,10 +1644,10 @@ static void do_vmuh_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vmuh_bu, LSX, gvec_vvv, MO_8, do_vmuh_u)
-TRANS(vmuh_hu, LSX, gvec_vvv, MO_16, do_vmuh_u)
-TRANS(vmuh_wu, LSX, gvec_vvv, MO_32, do_vmuh_u)
-TRANS(vmuh_du, LSX, gvec_vvv, MO_64, do_vmuh_u)
+TRANS(vmuh_bu, LSX, gvec_vvv, 16, MO_8, do_vmuh_u)
+TRANS(vmuh_hu, LSX, gvec_vvv, 16, MO_16, do_vmuh_u)
+TRANS(vmuh_wu, LSX, gvec_vvv, 16, MO_32, do_vmuh_u)
+TRANS(vmuh_du, LSX, gvec_vvv, 16, MO_64, do_vmuh_u)
static void gen_vmulwev_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
@@ -1728,9 +1717,9 @@ static void do_vmulwev_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vmulwev_h_b, LSX, gvec_vvv, MO_8, do_vmulwev_s)
-TRANS(vmulwev_w_h, LSX, gvec_vvv, MO_16, do_vmulwev_s)
-TRANS(vmulwev_d_w, LSX, gvec_vvv, MO_32, do_vmulwev_s)
+TRANS(vmulwev_h_b, LSX, gvec_vvv, 16, MO_8, do_vmulwev_s)
+TRANS(vmulwev_w_h, LSX, gvec_vvv, 16, MO_16, do_vmulwev_s)
+TRANS(vmulwev_d_w, LSX, gvec_vvv, 16, MO_32, do_vmulwev_s)
static void tcg_gen_mulus2_i64(TCGv_i64 rl, TCGv_i64 rh,
TCGv_i64 arg1, TCGv_i64 arg2)
@@ -1836,9 +1825,9 @@ static void do_vmulwod_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vmulwod_h_b, LSX, gvec_vvv, MO_8, do_vmulwod_s)
-TRANS(vmulwod_w_h, LSX, gvec_vvv, MO_16, do_vmulwod_s)
-TRANS(vmulwod_d_w, LSX, gvec_vvv, MO_32, do_vmulwod_s)
+TRANS(vmulwod_h_b, LSX, gvec_vvv, 16, MO_8, do_vmulwod_s)
+TRANS(vmulwod_w_h, LSX, gvec_vvv, 16, MO_16, do_vmulwod_s)
+TRANS(vmulwod_d_w, LSX, gvec_vvv, 16, MO_32, do_vmulwod_s)
static void gen_vmulwev_u(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
@@ -1906,9 +1895,9 @@ static void do_vmulwev_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vmulwev_h_bu, LSX, gvec_vvv, MO_8, do_vmulwev_u)
-TRANS(vmulwev_w_hu, LSX, gvec_vvv, MO_16, do_vmulwev_u)
-TRANS(vmulwev_d_wu, LSX, gvec_vvv, MO_32, do_vmulwev_u)
+TRANS(vmulwev_h_bu, LSX, gvec_vvv, 16, MO_8, do_vmulwev_u)
+TRANS(vmulwev_w_hu, LSX, gvec_vvv, 16, MO_16, do_vmulwev_u)
+TRANS(vmulwev_d_wu, LSX, gvec_vvv, 16, MO_32, do_vmulwev_u)
static void gen_vmulwod_u(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
@@ -1976,9 +1965,9 @@ static void do_vmulwod_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vmulwod_h_bu, LSX, gvec_vvv, MO_8, do_vmulwod_u)
-TRANS(vmulwod_w_hu, LSX, gvec_vvv, MO_16, do_vmulwod_u)
-TRANS(vmulwod_d_wu, LSX, gvec_vvv, MO_32, do_vmulwod_u)
+TRANS(vmulwod_h_bu, LSX, gvec_vvv, 16, MO_8, do_vmulwod_u)
+TRANS(vmulwod_w_hu, LSX, gvec_vvv, 16, MO_16, do_vmulwod_u)
+TRANS(vmulwod_d_wu, LSX, gvec_vvv, 16, MO_32, do_vmulwod_u)
static void gen_vmulwev_u_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
@@ -2048,9 +2037,9 @@ static void do_vmulwev_u_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vmulwev_h_bu_b, LSX, gvec_vvv, MO_8, do_vmulwev_u_s)
-TRANS(vmulwev_w_hu_h, LSX, gvec_vvv, MO_16, do_vmulwev_u_s)
-TRANS(vmulwev_d_wu_w, LSX, gvec_vvv, MO_32, do_vmulwev_u_s)
+TRANS(vmulwev_h_bu_b, LSX, gvec_vvv, 16, MO_8, do_vmulwev_u_s)
+TRANS(vmulwev_w_hu_h, LSX, gvec_vvv, 16, MO_16, do_vmulwev_u_s)
+TRANS(vmulwev_d_wu_w, LSX, gvec_vvv, 16, MO_32, do_vmulwev_u_s)
static void gen_vmulwod_u_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
@@ -2117,9 +2106,9 @@ static void do_vmulwod_u_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vmulwod_h_bu_b, LSX, gvec_vvv, MO_8, do_vmulwod_u_s)
-TRANS(vmulwod_w_hu_h, LSX, gvec_vvv, MO_16, do_vmulwod_u_s)
-TRANS(vmulwod_d_wu_w, LSX, gvec_vvv, MO_32, do_vmulwod_u_s)
+TRANS(vmulwod_h_bu_b, LSX, gvec_vvv, 16, MO_8, do_vmulwod_u_s)
+TRANS(vmulwod_w_hu_h, LSX, gvec_vvv, 16, MO_16, do_vmulwod_u_s)
+TRANS(vmulwod_d_wu_w, LSX, gvec_vvv, 16, MO_32, do_vmulwod_u_s)
static void gen_vmadd(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
@@ -2190,10 +2179,10 @@ static void do_vmadd(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vmadd_b, LSX, gvec_vvv, MO_8, do_vmadd)
-TRANS(vmadd_h, LSX, gvec_vvv, MO_16, do_vmadd)
-TRANS(vmadd_w, LSX, gvec_vvv, MO_32, do_vmadd)
-TRANS(vmadd_d, LSX, gvec_vvv, MO_64, do_vmadd)
+TRANS(vmadd_b, LSX, gvec_vvv, 16, MO_8, do_vmadd)
+TRANS(vmadd_h, LSX, gvec_vvv, 16, MO_16, do_vmadd)
+TRANS(vmadd_w, LSX, gvec_vvv, 16, MO_32, do_vmadd)
+TRANS(vmadd_d, LSX, gvec_vvv, 16, MO_64, do_vmadd)
static void gen_vmsub(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
@@ -2264,10 +2253,10 @@ static void do_vmsub(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vmsub_b, LSX, gvec_vvv, MO_8, do_vmsub)
-TRANS(vmsub_h, LSX, gvec_vvv, MO_16, do_vmsub)
-TRANS(vmsub_w, LSX, gvec_vvv, MO_32, do_vmsub)
-TRANS(vmsub_d, LSX, gvec_vvv, MO_64, do_vmsub)
+TRANS(vmsub_b, LSX, gvec_vvv, 16, MO_8, do_vmsub)
+TRANS(vmsub_h, LSX, gvec_vvv, 16, MO_16, do_vmsub)
+TRANS(vmsub_w, LSX, gvec_vvv, 16, MO_32, do_vmsub)
+TRANS(vmsub_d, LSX, gvec_vvv, 16, MO_64, do_vmsub)
static void gen_vmaddwev_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
@@ -2339,9 +2328,9 @@ static void do_vmaddwev_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vmaddwev_h_b, LSX, gvec_vvv, MO_8, do_vmaddwev_s)
-TRANS(vmaddwev_w_h, LSX, gvec_vvv, MO_16, do_vmaddwev_s)
-TRANS(vmaddwev_d_w, LSX, gvec_vvv, MO_32, do_vmaddwev_s)
+TRANS(vmaddwev_h_b, LSX, gvec_vvv, 16, MO_8, do_vmaddwev_s)
+TRANS(vmaddwev_w_h, LSX, gvec_vvv, 16, MO_16, do_vmaddwev_s)
+TRANS(vmaddwev_d_w, LSX, gvec_vvv, 16, MO_32, do_vmaddwev_s)
#define VMADD_Q(NAME, FN, idx1, idx2) \
static bool trans_## NAME (DisasContext *ctx, arg_vvv *a) \
@@ -2447,9 +2436,9 @@ static void do_vmaddwod_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vmaddwod_h_b, LSX, gvec_vvv, MO_8, do_vmaddwod_s)
-TRANS(vmaddwod_w_h, LSX, gvec_vvv, MO_16, do_vmaddwod_s)
-TRANS(vmaddwod_d_w, LSX, gvec_vvv, MO_32, do_vmaddwod_s)
+TRANS(vmaddwod_h_b, LSX, gvec_vvv, 16, MO_8, do_vmaddwod_s)
+TRANS(vmaddwod_w_h, LSX, gvec_vvv, 16, MO_16, do_vmaddwod_s)
+TRANS(vmaddwod_d_w, LSX, gvec_vvv, 16, MO_32, do_vmaddwod_s)
static void gen_vmaddwev_u(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
@@ -2517,9 +2506,9 @@ static void do_vmaddwev_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vmaddwev_h_bu, LSX, gvec_vvv, MO_8, do_vmaddwev_u)
-TRANS(vmaddwev_w_hu, LSX, gvec_vvv, MO_16, do_vmaddwev_u)
-TRANS(vmaddwev_d_wu, LSX, gvec_vvv, MO_32, do_vmaddwev_u)
+TRANS(vmaddwev_h_bu, LSX, gvec_vvv, 16, MO_8, do_vmaddwev_u)
+TRANS(vmaddwev_w_hu, LSX, gvec_vvv, 16, MO_16, do_vmaddwev_u)
+TRANS(vmaddwev_d_wu, LSX, gvec_vvv, 16, MO_32, do_vmaddwev_u)
static void gen_vmaddwod_u(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
@@ -2588,9 +2577,9 @@ static void do_vmaddwod_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vmaddwod_h_bu, LSX, gvec_vvv, MO_8, do_vmaddwod_u)
-TRANS(vmaddwod_w_hu, LSX, gvec_vvv, MO_16, do_vmaddwod_u)
-TRANS(vmaddwod_d_wu, LSX, gvec_vvv, MO_32, do_vmaddwod_u)
+TRANS(vmaddwod_h_bu, LSX, gvec_vvv, 16, MO_8, do_vmaddwod_u)
+TRANS(vmaddwod_w_hu, LSX, gvec_vvv, 16, MO_16, do_vmaddwod_u)
+TRANS(vmaddwod_d_wu, LSX, gvec_vvv, 16, MO_32, do_vmaddwod_u)
static void gen_vmaddwev_u_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
@@ -2661,9 +2650,9 @@ static void do_vmaddwev_u_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vmaddwev_h_bu_b, LSX, gvec_vvv, MO_8, do_vmaddwev_u_s)
-TRANS(vmaddwev_w_hu_h, LSX, gvec_vvv, MO_16, do_vmaddwev_u_s)
-TRANS(vmaddwev_d_wu_w, LSX, gvec_vvv, MO_32, do_vmaddwev_u_s)
+TRANS(vmaddwev_h_bu_b, LSX, gvec_vvv, 16, MO_8, do_vmaddwev_u_s)
+TRANS(vmaddwev_w_hu_h, LSX, gvec_vvv, 16, MO_16, do_vmaddwev_u_s)
+TRANS(vmaddwev_d_wu_w, LSX, gvec_vvv, 16, MO_32, do_vmaddwev_u_s)
static void gen_vmaddwod_u_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
@@ -2733,9 +2722,9 @@ static void do_vmaddwod_u_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vmaddwod_h_bu_b, LSX, gvec_vvv, MO_8, do_vmaddwod_u_s)
-TRANS(vmaddwod_w_hu_h, LSX, gvec_vvv, MO_16, do_vmaddwod_u_s)
-TRANS(vmaddwod_d_wu_w, LSX, gvec_vvv, MO_32, do_vmaddwod_u_s)
+TRANS(vmaddwod_h_bu_b, LSX, gvec_vvv, 16, MO_8, do_vmaddwod_u_s)
+TRANS(vmaddwod_w_hu_h, LSX, gvec_vvv, 16, MO_16, do_vmaddwod_u_s)
+TRANS(vmaddwod_d_wu_w, LSX, gvec_vvv, 16, MO_32, do_vmaddwod_u_s)
TRANS(vdiv_b, LSX, gen_vvv, gen_helper_vdiv_b)
TRANS(vdiv_h, LSX, gen_vvv, gen_helper_vdiv_h)
@@ -2912,10 +2901,10 @@ static void do_vsigncov(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vsigncov_b, LSX, gvec_vvv, MO_8, do_vsigncov)
-TRANS(vsigncov_h, LSX, gvec_vvv, MO_16, do_vsigncov)
-TRANS(vsigncov_w, LSX, gvec_vvv, MO_32, do_vsigncov)
-TRANS(vsigncov_d, LSX, gvec_vvv, MO_64, do_vsigncov)
+TRANS(vsigncov_b, LSX, gvec_vvv, 16, MO_8, do_vsigncov)
+TRANS(vsigncov_h, LSX, gvec_vvv, 16, MO_16, do_vsigncov)
+TRANS(vsigncov_w, LSX, gvec_vvv, 16, MO_32, do_vsigncov)
+TRANS(vsigncov_d, LSX, gvec_vvv, 16, MO_64, do_vsigncov)
TRANS(vmskltz_b, LSX, gen_vv, gen_helper_vmskltz_b)
TRANS(vmskltz_h, LSX, gen_vv, gen_helper_vmskltz_h)
@@ -3049,7 +3038,7 @@ static bool trans_vldi(DisasContext *ctx, arg_vldi *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
sel = (a->imm >> 12) & 0x1;
@@ -3066,10 +3055,10 @@ static bool trans_vldi(DisasContext *ctx, arg_vldi *a)
return true;
}
-TRANS(vand_v, LSX, gvec_vvv, MO_64, tcg_gen_gvec_and)
-TRANS(vor_v, LSX, gvec_vvv, MO_64, tcg_gen_gvec_or)
-TRANS(vxor_v, LSX, gvec_vvv, MO_64, tcg_gen_gvec_xor)
-TRANS(vnor_v, LSX, gvec_vvv, MO_64, tcg_gen_gvec_nor)
+TRANS(vand_v, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_and)
+TRANS(vor_v, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_or)
+TRANS(vxor_v, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_xor)
+TRANS(vnor_v, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_nor)
static bool trans_vandn_v(DisasContext *ctx, arg_vvv *a)
{
@@ -3079,7 +3068,7 @@ static bool trans_vandn_v(DisasContext *ctx, arg_vvv *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
vd_ofs = vec_full_offset(a->vd);
vj_ofs = vec_full_offset(a->vj);
@@ -3088,7 +3077,7 @@ static bool trans_vandn_v(DisasContext *ctx, arg_vvv *a)
tcg_gen_gvec_andc(MO_64, vd_ofs, vk_ofs, vj_ofs, 16, ctx->vl/8);
return true;
}
-TRANS(vorn_v, LSX, gvec_vvv, MO_64, tcg_gen_gvec_orc)
+TRANS(vorn_v, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_orc)
TRANS(vandi_b, LSX, gvec_vv_i, MO_8, tcg_gen_gvec_andi)
TRANS(vori_b, LSX, gvec_vv_i, MO_8, tcg_gen_gvec_ori)
TRANS(vxori_b, LSX, gvec_vv_i, MO_8, tcg_gen_gvec_xori)
@@ -3126,37 +3115,37 @@ static void do_vnori_b(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
TRANS(vnori_b, LSX, gvec_vv_i, MO_8, do_vnori_b)
-TRANS(vsll_b, LSX, gvec_vvv, MO_8, tcg_gen_gvec_shlv)
-TRANS(vsll_h, LSX, gvec_vvv, MO_16, tcg_gen_gvec_shlv)
-TRANS(vsll_w, LSX, gvec_vvv, MO_32, tcg_gen_gvec_shlv)
-TRANS(vsll_d, LSX, gvec_vvv, MO_64, tcg_gen_gvec_shlv)
+TRANS(vsll_b, LSX, gvec_vvv, 16, MO_8, tcg_gen_gvec_shlv)
+TRANS(vsll_h, LSX, gvec_vvv, 16, MO_16, tcg_gen_gvec_shlv)
+TRANS(vsll_w, LSX, gvec_vvv, 16, MO_32, tcg_gen_gvec_shlv)
+TRANS(vsll_d, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_shlv)
TRANS(vslli_b, LSX, gvec_vv_i, MO_8, tcg_gen_gvec_shli)
TRANS(vslli_h, LSX, gvec_vv_i, MO_16, tcg_gen_gvec_shli)
TRANS(vslli_w, LSX, gvec_vv_i, MO_32, tcg_gen_gvec_shli)
TRANS(vslli_d, LSX, gvec_vv_i, MO_64, tcg_gen_gvec_shli)
-TRANS(vsrl_b, LSX, gvec_vvv, MO_8, tcg_gen_gvec_shrv)
-TRANS(vsrl_h, LSX, gvec_vvv, MO_16, tcg_gen_gvec_shrv)
-TRANS(vsrl_w, LSX, gvec_vvv, MO_32, tcg_gen_gvec_shrv)
-TRANS(vsrl_d, LSX, gvec_vvv, MO_64, tcg_gen_gvec_shrv)
+TRANS(vsrl_b, LSX, gvec_vvv, 16, MO_8, tcg_gen_gvec_shrv)
+TRANS(vsrl_h, LSX, gvec_vvv, 16, MO_16, tcg_gen_gvec_shrv)
+TRANS(vsrl_w, LSX, gvec_vvv, 16, MO_32, tcg_gen_gvec_shrv)
+TRANS(vsrl_d, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_shrv)
TRANS(vsrli_b, LSX, gvec_vv_i, MO_8, tcg_gen_gvec_shri)
TRANS(vsrli_h, LSX, gvec_vv_i, MO_16, tcg_gen_gvec_shri)
TRANS(vsrli_w, LSX, gvec_vv_i, MO_32, tcg_gen_gvec_shri)
TRANS(vsrli_d, LSX, gvec_vv_i, MO_64, tcg_gen_gvec_shri)
-TRANS(vsra_b, LSX, gvec_vvv, MO_8, tcg_gen_gvec_sarv)
-TRANS(vsra_h, LSX, gvec_vvv, MO_16, tcg_gen_gvec_sarv)
-TRANS(vsra_w, LSX, gvec_vvv, MO_32, tcg_gen_gvec_sarv)
-TRANS(vsra_d, LSX, gvec_vvv, MO_64, tcg_gen_gvec_sarv)
+TRANS(vsra_b, LSX, gvec_vvv, 16, MO_8, tcg_gen_gvec_sarv)
+TRANS(vsra_h, LSX, gvec_vvv, 16, MO_16, tcg_gen_gvec_sarv)
+TRANS(vsra_w, LSX, gvec_vvv, 16, MO_32, tcg_gen_gvec_sarv)
+TRANS(vsra_d, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_sarv)
TRANS(vsrai_b, LSX, gvec_vv_i, MO_8, tcg_gen_gvec_sari)
TRANS(vsrai_h, LSX, gvec_vv_i, MO_16, tcg_gen_gvec_sari)
TRANS(vsrai_w, LSX, gvec_vv_i, MO_32, tcg_gen_gvec_sari)
TRANS(vsrai_d, LSX, gvec_vv_i, MO_64, tcg_gen_gvec_sari)
-TRANS(vrotr_b, LSX, gvec_vvv, MO_8, tcg_gen_gvec_rotrv)
-TRANS(vrotr_h, LSX, gvec_vvv, MO_16, tcg_gen_gvec_rotrv)
-TRANS(vrotr_w, LSX, gvec_vvv, MO_32, tcg_gen_gvec_rotrv)
-TRANS(vrotr_d, LSX, gvec_vvv, MO_64, tcg_gen_gvec_rotrv)
+TRANS(vrotr_b, LSX, gvec_vvv, 16, MO_8, tcg_gen_gvec_rotrv)
+TRANS(vrotr_h, LSX, gvec_vvv, 16, MO_16, tcg_gen_gvec_rotrv)
+TRANS(vrotr_w, LSX, gvec_vvv, 16, MO_32, tcg_gen_gvec_rotrv)
+TRANS(vrotr_d, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_rotrv)
TRANS(vrotri_b, LSX, gvec_vv_i, MO_8, tcg_gen_gvec_rotri)
TRANS(vrotri_h, LSX, gvec_vv_i, MO_16, tcg_gen_gvec_rotri)
TRANS(vrotri_w, LSX, gvec_vv_i, MO_32, tcg_gen_gvec_rotri)
@@ -3361,10 +3350,10 @@ static void do_vbitclr(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vbitclr_b, LSX, gvec_vvv, MO_8, do_vbitclr)
-TRANS(vbitclr_h, LSX, gvec_vvv, MO_16, do_vbitclr)
-TRANS(vbitclr_w, LSX, gvec_vvv, MO_32, do_vbitclr)
-TRANS(vbitclr_d, LSX, gvec_vvv, MO_64, do_vbitclr)
+TRANS(vbitclr_b, LSX, gvec_vvv, 16, MO_8, do_vbitclr)
+TRANS(vbitclr_h, LSX, gvec_vvv, 16, MO_16, do_vbitclr)
+TRANS(vbitclr_w, LSX, gvec_vvv, 16, MO_32, do_vbitclr)
+TRANS(vbitclr_d, LSX, gvec_vvv, 16, MO_64, do_vbitclr)
static void do_vbiti(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm,
void (*func)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec))
@@ -3472,10 +3461,10 @@ static void do_vbitset(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vbitset_b, LSX, gvec_vvv, MO_8, do_vbitset)
-TRANS(vbitset_h, LSX, gvec_vvv, MO_16, do_vbitset)
-TRANS(vbitset_w, LSX, gvec_vvv, MO_32, do_vbitset)
-TRANS(vbitset_d, LSX, gvec_vvv, MO_64, do_vbitset)
+TRANS(vbitset_b, LSX, gvec_vvv, 16, MO_8, do_vbitset)
+TRANS(vbitset_h, LSX, gvec_vvv, 16, MO_16, do_vbitset)
+TRANS(vbitset_w, LSX, gvec_vvv, 16, MO_32, do_vbitset)
+TRANS(vbitset_d, LSX, gvec_vvv, 16, MO_64, do_vbitset)
static void do_vbitseti(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
int64_t imm, uint32_t oprsz, uint32_t maxsz)
@@ -3554,10 +3543,10 @@ static void do_vbitrev(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_3(vd_ofs, vj_ofs, vk_ofs, oprsz, maxsz, &op[vece]);
}
-TRANS(vbitrev_b, LSX, gvec_vvv, MO_8, do_vbitrev)
-TRANS(vbitrev_h, LSX, gvec_vvv, MO_16, do_vbitrev)
-TRANS(vbitrev_w, LSX, gvec_vvv, MO_32, do_vbitrev)
-TRANS(vbitrev_d, LSX, gvec_vvv, MO_64, do_vbitrev)
+TRANS(vbitrev_b, LSX, gvec_vvv, 16, MO_8, do_vbitrev)
+TRANS(vbitrev_h, LSX, gvec_vvv, 16, MO_16, do_vbitrev)
+TRANS(vbitrev_w, LSX, gvec_vvv, 16, MO_32, do_vbitrev)
+TRANS(vbitrev_d, LSX, gvec_vvv, 16, MO_64, do_vbitrev)
static void do_vbitrevi(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
int64_t imm, uint32_t oprsz, uint32_t maxsz)
@@ -3706,7 +3695,7 @@ static bool do_cmp(DisasContext *ctx, arg_vvv *a, MemOp mop, TCGCond cond)
{
uint32_t vd_ofs, vj_ofs, vk_ofs;
- CHECK_SXE;
+ CHECK_VEC;
vd_ofs = vec_full_offset(a->vd);
vj_ofs = vec_full_offset(a->vj);
@@ -3752,7 +3741,7 @@ static bool do_## NAME ##_s(DisasContext *ctx, arg_vv_i *a, MemOp mop) \
{ \
uint32_t vd_ofs, vj_ofs; \
\
- CHECK_SXE; \
+ CHECK_VEC; \
\
static const TCGOpcode vecop_list[] = { \
INDEX_op_cmp_vec, 0 \
@@ -3801,7 +3790,7 @@ static bool do_## NAME ##_u(DisasContext *ctx, arg_vv_i *a, MemOp mop) \
{ \
uint32_t vd_ofs, vj_ofs; \
\
- CHECK_SXE; \
+ CHECK_VEC; \
\
static const TCGOpcode vecop_list[] = { \
INDEX_op_cmp_vec, 0 \
@@ -3899,7 +3888,7 @@ static bool trans_vfcmp_cond_s(DisasContext *ctx, arg_vvv_fcond *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
fn = (a->fcond & 1 ? gen_helper_vfcmp_s_s : gen_helper_vfcmp_c_s);
flags = get_fcmp_flags(a->fcond >> 1);
@@ -3920,7 +3909,7 @@ static bool trans_vfcmp_cond_d(DisasContext *ctx, arg_vvv_fcond *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
fn = (a->fcond & 1 ? gen_helper_vfcmp_s_d : gen_helper_vfcmp_c_d);
flags = get_fcmp_flags(a->fcond >> 1);
@@ -3935,7 +3924,7 @@ static bool trans_vbitsel_v(DisasContext *ctx, arg_vvvv *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
tcg_gen_gvec_bitsel(MO_64, vec_full_offset(a->vd), vec_full_offset(a->va),
vec_full_offset(a->vk), vec_full_offset(a->vj),
@@ -3961,7 +3950,7 @@ static bool trans_vbitseli_b(DisasContext *ctx, arg_vv_i *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
tcg_gen_gvec_2i(vec_full_offset(a->vd), vec_full_offset(a->vj),
16, ctx->vl/8, a->imm, &op);
@@ -3984,7 +3973,7 @@ static bool trans_## NAME (DisasContext *ctx, arg_cv *a) \
return false; \
} \
\
- CHECK_SXE; \
+ CHECK_VEC; \
tcg_gen_or_i64(t1, al, ah); \
tcg_gen_setcondi_i64(COND, t1, t1, 0); \
tcg_gen_st8_tl(t1, cpu_env, offsetof(CPULoongArchState, cf[a->cd & 0x7])); \
@@ -4012,7 +4001,7 @@ static bool trans_vinsgr2vr_b(DisasContext *ctx, arg_vr_i *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
tcg_gen_st8_i64(src, cpu_env,
offsetof(CPULoongArchState, fpr[a->vd].vreg.B(a->imm)));
return true;
@@ -4026,7 +4015,7 @@ static bool trans_vinsgr2vr_h(DisasContext *ctx, arg_vr_i *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
tcg_gen_st16_i64(src, cpu_env,
offsetof(CPULoongArchState, fpr[a->vd].vreg.H(a->imm)));
return true;
@@ -4040,7 +4029,7 @@ static bool trans_vinsgr2vr_w(DisasContext *ctx, arg_vr_i *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
tcg_gen_st32_i64(src, cpu_env,
offsetof(CPULoongArchState, fpr[a->vd].vreg.W(a->imm)));
return true;
@@ -4054,7 +4043,7 @@ static bool trans_vinsgr2vr_d(DisasContext *ctx, arg_vr_i *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
tcg_gen_st_i64(src, cpu_env,
offsetof(CPULoongArchState, fpr[a->vd].vreg.D(a->imm)));
return true;
@@ -4068,7 +4057,7 @@ static bool trans_vpickve2gr_b(DisasContext *ctx, arg_rv_i *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
tcg_gen_ld8s_i64(dst, cpu_env,
offsetof(CPULoongArchState, fpr[a->vj].vreg.B(a->imm)));
return true;
@@ -4082,7 +4071,7 @@ static bool trans_vpickve2gr_h(DisasContext *ctx, arg_rv_i *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
tcg_gen_ld16s_i64(dst, cpu_env,
offsetof(CPULoongArchState, fpr[a->vj].vreg.H(a->imm)));
return true;
@@ -4096,7 +4085,7 @@ static bool trans_vpickve2gr_w(DisasContext *ctx, arg_rv_i *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
tcg_gen_ld32s_i64(dst, cpu_env,
offsetof(CPULoongArchState, fpr[a->vj].vreg.W(a->imm)));
return true;
@@ -4110,7 +4099,7 @@ static bool trans_vpickve2gr_d(DisasContext *ctx, arg_rv_i *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
tcg_gen_ld_i64(dst, cpu_env,
offsetof(CPULoongArchState, fpr[a->vj].vreg.D(a->imm)));
return true;
@@ -4124,7 +4113,7 @@ static bool trans_vpickve2gr_bu(DisasContext *ctx, arg_rv_i *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
tcg_gen_ld8u_i64(dst, cpu_env,
offsetof(CPULoongArchState, fpr[a->vj].vreg.B(a->imm)));
return true;
@@ -4138,7 +4127,7 @@ static bool trans_vpickve2gr_hu(DisasContext *ctx, arg_rv_i *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
tcg_gen_ld16u_i64(dst, cpu_env,
offsetof(CPULoongArchState, fpr[a->vj].vreg.H(a->imm)));
return true;
@@ -4152,7 +4141,7 @@ static bool trans_vpickve2gr_wu(DisasContext *ctx, arg_rv_i *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
tcg_gen_ld32u_i64(dst, cpu_env,
offsetof(CPULoongArchState, fpr[a->vj].vreg.W(a->imm)));
return true;
@@ -4166,7 +4155,7 @@ static bool trans_vpickve2gr_du(DisasContext *ctx, arg_rv_i *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
tcg_gen_ld_i64(dst, cpu_env,
offsetof(CPULoongArchState, fpr[a->vj].vreg.D(a->imm)));
return true;
@@ -4180,7 +4169,7 @@ static bool gvec_dup(DisasContext *ctx, arg_vr *a, MemOp mop)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
tcg_gen_gvec_dup_i64(mop, vec_full_offset(a->vd),
16, ctx->vl/8, src);
@@ -4198,7 +4187,7 @@ static bool trans_vreplvei_b(DisasContext *ctx, arg_vv_i *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
tcg_gen_gvec_dup_mem(MO_8,vec_full_offset(a->vd),
offsetof(CPULoongArchState,
fpr[a->vj].vreg.B((a->imm))),
@@ -4212,7 +4201,7 @@ static bool trans_vreplvei_h(DisasContext *ctx, arg_vv_i *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
tcg_gen_gvec_dup_mem(MO_16, vec_full_offset(a->vd),
offsetof(CPULoongArchState,
fpr[a->vj].vreg.H((a->imm))),
@@ -4225,7 +4214,7 @@ static bool trans_vreplvei_w(DisasContext *ctx, arg_vv_i *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
tcg_gen_gvec_dup_mem(MO_32, vec_full_offset(a->vd),
offsetof(CPULoongArchState,
fpr[a->vj].vreg.W((a->imm))),
@@ -4238,7 +4227,7 @@ static bool trans_vreplvei_d(DisasContext *ctx, arg_vv_i *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
tcg_gen_gvec_dup_mem(MO_64, vec_full_offset(a->vd),
offsetof(CPULoongArchState,
fpr[a->vj].vreg.D((a->imm))),
@@ -4257,7 +4246,7 @@ static bool gen_vreplve(DisasContext *ctx, arg_vvr *a, int vece, int bit,
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
tcg_gen_andi_i64(t0, gpr_src(ctx, a->rk, EXT_NONE), (LSX_LEN/bit) -1);
tcg_gen_shli_i64(t0, t0, vece);
@@ -4287,7 +4276,7 @@ static bool trans_vbsll_v(DisasContext *ctx, arg_vv_i *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
desthigh = tcg_temp_new_i64();
destlow = tcg_temp_new_i64();
@@ -4321,7 +4310,7 @@ static bool trans_vbsrl_v(DisasContext *ctx, arg_vv_i *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
desthigh = tcg_temp_new_i64();
destlow = tcg_temp_new_i64();
@@ -4399,7 +4388,7 @@ static bool trans_vld(DisasContext *ctx, arg_vr_i *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
addr = gpr_src(ctx, a->rj, EXT_NONE);
val = tcg_temp_new_i128();
@@ -4426,7 +4415,7 @@ static bool trans_vst(DisasContext *ctx, arg_vr_i *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
addr = gpr_src(ctx, a->rj, EXT_NONE);
val = tcg_temp_new_i128();
@@ -4453,7 +4442,7 @@ static bool trans_vldx(DisasContext *ctx, arg_vrr *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
src1 = gpr_src(ctx, a->rj, EXT_NONE);
src2 = gpr_src(ctx, a->rk, EXT_NONE);
@@ -4480,7 +4469,7 @@ static bool trans_vstx(DisasContext *ctx, arg_vrr *a)
return false;
}
- CHECK_SXE;
+ CHECK_VEC;
src1 = gpr_src(ctx, a->rj, EXT_NONE);
src2 = gpr_src(ctx, a->rk, EXT_NONE);
@@ -4507,7 +4496,7 @@ static bool trans_## NAME (DisasContext *ctx, arg_vr_i *a) \
return false; \
} \
\
- CHECK_SXE; \
+ CHECK_VEC; \
\
addr = gpr_src(ctx, a->rj, EXT_NONE); \
val = tcg_temp_new_i64(); \
@@ -4535,7 +4524,7 @@ static bool trans_## NAME (DisasContext *ctx, arg_vr_ii *a) \
return false; \
} \
\
- CHECK_SXE; \
+ CHECK_VEC; \
\
addr = gpr_src(ctx, a->rj, EXT_NONE); \
val = tcg_temp_new_i64(); \
--
2.39.1
* Re: [PATCH v4 05/48] target/loongarch: Implement xvadd/xvsub
2023-08-30 8:48 ` [PATCH v4 05/48] target/loongarch: Implement xvadd/xvsub Song Gao
@ 2023-08-30 15:38 ` Richard Henderson
0 siblings, 0 replies; 86+ messages in thread
From: Richard Henderson @ 2023-08-30 15:38 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 8/30/23 01:48, Song Gao wrote:
> +#ifndef CONFIG_USER_ONLY
> + #define CHECK_VEC do { \
> + if ((ctx->vl == LSX_LEN) && \
> + (ctx->base.tb->flags & HW_FLAGS_EUEN_SXE) == 0) { \
> + generate_exception(ctx, EXCCODE_SXD); \
> + return true; \
> + } \
> + if ((ctx->vl == LASX_LEN) && \
> + (ctx->base.tb->flags & HW_FLAGS_EUEN_ASXE) == 0) { \
> + generate_exception(ctx, EXCCODE_ASXD); \
> + return true; \
> + } \
> + } while (0)
> +#else
> + #define CHECK_VEC
> +#endif /*!CONFIG_USER_ONLY */
I think this is wrong. The check would seem to be determined by the instruction (oprsz)
rather than a fixed configuration of the cpu (vl).
You're also replacing
> -#ifndef CONFIG_USER_ONLY
> -#define CHECK_ASXE do { \
> - if ((ctx->base.tb->flags & HW_FLAGS_EUEN_ASXE) == 0) { \
> - generate_exception(ctx, EXCCODE_ASXD); \
> - return true; \
> - } \
> -} while (0)
> -#else
> -#define CHECK_ASXE
> -#endif
this, the correct test, which you just added in patch 3.
> +TRANS(xvadd_b, LASX, gvec_vvv, 32, MO_8, tcg_gen_gvec_add)
> +TRANS(xvadd_h, LASX, gvec_vvv, 32, MO_16, tcg_gen_gvec_add)
> +TRANS(xvadd_w, LASX, gvec_vvv, 32, MO_32, tcg_gen_gvec_add)
> +TRANS(xvadd_d, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_add)
The size of the changes required to add oprsz to gen_vvv would seem to make this a poor choice.
If you do go that way, all of the LSX changes would need to be a separate patch.
Perhaps better as
static bool gvec_vvv_vl(DisasContext *ctx, arg_vvv *a, uint32_t oprsz, MemOp mop,
void (*func)(unsigned, uint32_t, uint32_t,
uint32_t, uint32_t, uint32_t))
{
uint32_t vd_ofs = vec_full_offset(a->vd);
uint32_t vj_ofs = vec_full_offset(a->vj);
uint32_t vk_ofs = vec_full_offset(a->vk);
func(mop, vd_ofs, vj_ofs, vk_ofs, oprsz, ctx->vl / 8);
return true;
}
static bool gvec_vvv(DisasContext *ctx, arg_vvv *a, MemOp mop,
void (*func)(unsigned, uint32_t, uint32_t,
uint32_t, uint32_t, uint32_t))
{
CHECK_SXE;
return gvec_vvv_vl(ctx, a, 16, mop, func);
}
static bool gvec_xxx(DisasContext *ctx, arg_vvv *a, MemOp mop,
void (*func)(unsigned, uint32_t, uint32_t,
uint32_t, uint32_t, uint32_t))
{
CHECK_ASXE;
return gvec_vvv_vl(ctx, a, 32, mop, func);
}
so that you don't have to replicate "16" or "32" across each instruction.
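With those helpers the TRANS lines keep their original shape; as a sketch (using the
same opcode/expander pairs quoted above):

TRANS(vadd_b, LSX, gvec_vvv, MO_8, tcg_gen_gvec_add)
TRANS(xvadd_b, LASX, gvec_xxx, MO_8, tcg_gen_gvec_add)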
> +#define XVADDSUB_Q(NAME) \
> +static bool trans_xv## NAME ##_q(DisasContext *ctx, arg_vvv * a) \
> +{ \
> + TCGv_i64 rh, rl, ah, al, bh, bl; \
> + int i; \
> + \
> + if (!avail_LASX(ctx)) { \
> + return false; \
> + } \
> + \
> + CHECK_VEC; \
> + \
> + rh = tcg_temp_new_i64(); \
> + rl = tcg_temp_new_i64(); \
> + ah = tcg_temp_new_i64(); \
> + al = tcg_temp_new_i64(); \
> + bh = tcg_temp_new_i64(); \
> + bl = tcg_temp_new_i64(); \
> + \
> + for (i = 0; i < 2; i++) { \
> + get_vreg64(ah, a->vj, 1 + i * 2); \
> + get_vreg64(al, a->vj, 0 + i * 2); \
> + get_vreg64(bh, a->vk, 1 + i * 2); \
> + get_vreg64(bl, a->vk, 0 + i * 2); \
> + \
> + tcg_gen_## NAME ##2_i64(rl, rh, al, ah, bl, bh); \
> + \
> + set_vreg64(rh, a->vd, 1 + i * 2); \
> + set_vreg64(rl, a->vd, 0 + i * 2); \
> + } \
> + \
> + return true; \
> +}
This should be a function, not a macro, passing in tcg_gen_{add,sub}2_i64.
> +
> +XVADDSUB_Q(add)
> +XVADDSUB_Q(sub)
Which lets these be normal TRANS expansions.
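A sketch of that shape (the gen_xvaddsub_q name and TRANS wiring are illustrative;
get_vreg64/set_vreg64 are the helpers this series already uses, and the enable-bit
check is elided here):

static bool gen_xvaddsub_q(DisasContext *ctx, arg_vvv *a,
                           void (*func)(TCGv_i64, TCGv_i64, TCGv_i64,
                                        TCGv_i64, TCGv_i64, TCGv_i64))
{
    TCGv_i64 rh = tcg_temp_new_i64();
    TCGv_i64 rl = tcg_temp_new_i64();
    TCGv_i64 ah = tcg_temp_new_i64();
    TCGv_i64 al = tcg_temp_new_i64();
    TCGv_i64 bh = tcg_temp_new_i64();
    TCGv_i64 bl = tcg_temp_new_i64();

    for (int i = 0; i < 2; i++) {
        /* 128-bit lane i: vd.Q(i) = vj.Q(i) op vk.Q(i), as D-pair halves */
        get_vreg64(ah, a->vj, 1 + i * 2);
        get_vreg64(al, a->vj, 0 + i * 2);
        get_vreg64(bh, a->vk, 1 + i * 2);
        get_vreg64(bl, a->vk, 0 + i * 2);

        func(rl, rh, al, ah, bl, bh);

        set_vreg64(rh, a->vd, 1 + i * 2);
        set_vreg64(rl, a->vd, 0 + i * 2);
    }
    return true;
}

TRANS(xvadd_q, LASX, gen_xvaddsub_q, tcg_gen_add2_i64)
TRANS(xvsub_q, LASX, gen_xvaddsub_q, tcg_gen_sub2_i64)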
r~
* [PATCH v4 06/48] target/loongarch: Implement xvreplgr2vr
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (4 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 05/48] target/loongarch: Implement xvadd/xvsub Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 16:09 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 07/48] target/loongarch: Implement xvaddi/xvsubi Song Gao
` (41 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVREPLGR2VR.{B/H/W/D}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/loongarch/insns.decode | 5 +++++
target/loongarch/disas.c | 10 ++++++++++
target/loongarch/insn_trans/trans_lasx.c.inc | 5 +++++
target/loongarch/insn_trans/trans_lsx.c.inc | 12 ++++++------
4 files changed, 26 insertions(+), 6 deletions(-)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index bcc18fb6c5..04bd238995 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1310,3 +1310,8 @@ xvsub_h 0111 01000000 11001 ..... ..... ..... @vvv
xvsub_w 0111 01000000 11010 ..... ..... ..... @vvv
xvsub_d 0111 01000000 11011 ..... ..... ..... @vvv
xvsub_q 0111 01010010 11011 ..... ..... ..... @vvv
+
+xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
+xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
+xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
+xvreplgr2vr_d 0111 01101001 11110 00011 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index d8b62ba532..c47f455ed0 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1708,6 +1708,11 @@ static void output_vvv_x(DisasContext *ctx, arg_vvv * a, const char *mnemonic)
output(ctx, mnemonic, "x%d, x%d, x%d", a->vd, a->vj, a->vk);
}
+static void output_vr_x(DisasContext *ctx, arg_vr *a, const char *mnemonic)
+{
+ output(ctx, mnemonic, "x%d, r%d", a->vd, a->rj);
+}
+
INSN_LASX(xvadd_b, vvv)
INSN_LASX(xvadd_h, vvv)
INSN_LASX(xvadd_w, vvv)
@@ -1718,3 +1723,8 @@ INSN_LASX(xvsub_h, vvv)
INSN_LASX(xvsub_w, vvv)
INSN_LASX(xvsub_d, vvv)
INSN_LASX(xvsub_q, vvv)
+
+INSN_LASX(xvreplgr2vr_b, vr)
+INSN_LASX(xvreplgr2vr_h, vr)
+INSN_LASX(xvreplgr2vr_w, vr)
+INSN_LASX(xvreplgr2vr_d, vr)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 218b8dc648..66b5abc790 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -50,3 +50,8 @@ TRANS(xvsub_b, LASX, gvec_vvv, 32, MO_8, tcg_gen_gvec_sub)
TRANS(xvsub_h, LASX, gvec_vvv, 32, MO_16, tcg_gen_gvec_sub)
TRANS(xvsub_w, LASX, gvec_vvv, 32, MO_32, tcg_gen_gvec_sub)
TRANS(xvsub_d, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_sub)
+
+TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
+TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
+TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
+TRANS(xvreplgr2vr_d, LASX, gvec_dup, 32, MO_64)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 0e12213e8b..c0e7a9a372 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -4161,7 +4161,7 @@ static bool trans_vpickve2gr_du(DisasContext *ctx, arg_rv_i *a)
return true;
}
-static bool gvec_dup(DisasContext *ctx, arg_vr *a, MemOp mop)
+static bool gvec_dup(DisasContext *ctx, arg_vr *a, uint32_t oprsz, MemOp mop)
{
TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
@@ -4172,14 +4172,14 @@ static bool gvec_dup(DisasContext *ctx, arg_vr *a, MemOp mop)
CHECK_VEC;
tcg_gen_gvec_dup_i64(mop, vec_full_offset(a->vd),
- 16, ctx->vl/8, src);
+ oprsz, ctx->vl / 8, src);
return true;
}
-TRANS(vreplgr2vr_b, LSX, gvec_dup, MO_8)
-TRANS(vreplgr2vr_h, LSX, gvec_dup, MO_16)
-TRANS(vreplgr2vr_w, LSX, gvec_dup, MO_32)
-TRANS(vreplgr2vr_d, LSX, gvec_dup, MO_64)
+TRANS(vreplgr2vr_b, LSX, gvec_dup, 16, MO_8)
+TRANS(vreplgr2vr_h, LSX, gvec_dup, 16, MO_16)
+TRANS(vreplgr2vr_w, LSX, gvec_dup, 16, MO_32)
+TRANS(vreplgr2vr_d, LSX, gvec_dup, 16, MO_64)
static bool trans_vreplvei_b(DisasContext *ctx, arg_vv_i *a)
{
--
2.39.1
* Re: [PATCH v4 06/48] target/loongarch: Implement xvreplgr2vr
2023-08-30 8:48 ` [PATCH v4 06/48] target/loongarch: Implement xvreplgr2vr Song Gao
@ 2023-08-30 16:09 ` Richard Henderson
2023-08-31 7:17 ` gaosong
0 siblings, 1 reply; 86+ messages in thread
From: Richard Henderson @ 2023-08-30 16:09 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 8/30/23 01:48, Song Gao wrote:
> This patch includes:
> - XVREPLGR2VR.{B/H/W/D}.
>
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/loongarch/insns.decode | 5 +++++
> target/loongarch/disas.c | 10 ++++++++++
> target/loongarch/insn_trans/trans_lasx.c.inc | 5 +++++
> target/loongarch/insn_trans/trans_lsx.c.inc | 12 ++++++------
> 4 files changed, 26 insertions(+), 6 deletions(-)
>
> diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
> index bcc18fb6c5..04bd238995 100644
> --- a/target/loongarch/insns.decode
> +++ b/target/loongarch/insns.decode
> @@ -1310,3 +1310,8 @@ xvsub_h 0111 01000000 11001 ..... ..... ..... @vvv
> xvsub_w 0111 01000000 11010 ..... ..... ..... @vvv
> xvsub_d 0111 01000000 11011 ..... ..... ..... @vvv
> xvsub_q 0111 01010010 11011 ..... ..... ..... @vvv
> +
> +xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
> +xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
> +xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
> +xvreplgr2vr_d 0111 01101001 11110 00011 ..... ..... @vr
> diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
> index d8b62ba532..c47f455ed0 100644
> --- a/target/loongarch/disas.c
> +++ b/target/loongarch/disas.c
> @@ -1708,6 +1708,11 @@ static void output_vvv_x(DisasContext *ctx, arg_vvv * a, const char *mnemonic)
> output(ctx, mnemonic, "x%d, x%d, x%d", a->vd, a->vj, a->vk);
> }
>
> +static void output_vr_x(DisasContext *ctx, arg_vr *a, const char *mnemonic)
> +{
> + output(ctx, mnemonic, "x%d, r%d", a->vd, a->rj);
> +}
> +
> INSN_LASX(xvadd_b, vvv)
> INSN_LASX(xvadd_h, vvv)
> INSN_LASX(xvadd_w, vvv)
> @@ -1718,3 +1723,8 @@ INSN_LASX(xvsub_h, vvv)
> INSN_LASX(xvsub_w, vvv)
> INSN_LASX(xvsub_d, vvv)
> INSN_LASX(xvsub_q, vvv)
> +
> +INSN_LASX(xvreplgr2vr_b, vr)
> +INSN_LASX(xvreplgr2vr_h, vr)
> +INSN_LASX(xvreplgr2vr_w, vr)
> +INSN_LASX(xvreplgr2vr_d, vr)
> diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
> index 218b8dc648..66b5abc790 100644
> --- a/target/loongarch/insn_trans/trans_lasx.c.inc
> +++ b/target/loongarch/insn_trans/trans_lasx.c.inc
> @@ -50,3 +50,8 @@ TRANS(xvsub_b, LASX, gvec_vvv, 32, MO_8, tcg_gen_gvec_sub)
> TRANS(xvsub_h, LASX, gvec_vvv, 32, MO_16, tcg_gen_gvec_sub)
> TRANS(xvsub_w, LASX, gvec_vvv, 32, MO_32, tcg_gen_gvec_sub)
> TRANS(xvsub_d, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_sub)
> +
> +TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
> +TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
> +TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
> +TRANS(xvreplgr2vr_d, LASX, gvec_dup, 32, MO_64)
> diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
> index 0e12213e8b..c0e7a9a372 100644
> --- a/target/loongarch/insn_trans/trans_lsx.c.inc
> +++ b/target/loongarch/insn_trans/trans_lsx.c.inc
> @@ -4161,7 +4161,7 @@ static bool trans_vpickve2gr_du(DisasContext *ctx, arg_rv_i *a)
> return true;
> }
>
> -static bool gvec_dup(DisasContext *ctx, arg_vr *a, MemOp mop)
> +static bool gvec_dup(DisasContext *ctx, arg_vr *a, uint32_t oprsz, MemOp mop)
> {
> TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
>
> @@ -4172,14 +4172,14 @@ static bool gvec_dup(DisasContext *ctx, arg_vr *a, MemOp mop)
> CHECK_VEC;
>
> tcg_gen_gvec_dup_i64(mop, vec_full_offset(a->vd),
> - 16, ctx->vl/8, src);
> + oprsz, ctx->vl / 8, src);
> return true;
> }
>
> -TRANS(vreplgr2vr_b, LSX, gvec_dup, MO_8)
> -TRANS(vreplgr2vr_h, LSX, gvec_dup, MO_16)
> -TRANS(vreplgr2vr_w, LSX, gvec_dup, MO_32)
> -TRANS(vreplgr2vr_d, LSX, gvec_dup, MO_64)
> +TRANS(vreplgr2vr_b, LSX, gvec_dup, 16, MO_8)
> +TRANS(vreplgr2vr_h, LSX, gvec_dup, 16, MO_16)
> +TRANS(vreplgr2vr_w, LSX, gvec_dup, 16, MO_32)
> +TRANS(vreplgr2vr_d, LSX, gvec_dup, 16, MO_64)
Hmm.
Ok, so revising the advice I gave versus the previous patch, I can see how having a common
CHECK_VEC is helpful. But it still needs to use oprsz not vl for the size check.
It would be better to replace with a function, like
if (!check_vec(ctx, oprsz)) {
return true;
}
rather than a macro with a hidden return. The replacement should be done in a patch by
itself, probably using check_vec(ctx, 16) for all of the existing LSX code until, step by
step, oprsz is plumbed into all of the places required.
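For concreteness, a minimal sketch of such a function (assuming the HW_FLAGS_EUEN_*
bits, EXCCODE_* values and generate_exception already used by this series; the
user-mode variant just returns true):

#ifndef CONFIG_USER_ONLY
static bool check_vec(DisasContext *ctx, uint32_t oprsz)
{
    if ((oprsz == 16) && ((ctx->base.tb->flags & HW_FLAGS_EUEN_SXE) == 0)) {
        generate_exception(ctx, EXCCODE_SXD);
        return false;
    }
    if ((oprsz == 32) && ((ctx->base.tb->flags & HW_FLAGS_EUEN_ASXE) == 0)) {
        generate_exception(ctx, EXCCODE_ASXD);
        return false;
    }
    return true;
}
#else
static bool check_vec(DisasContext *ctx, uint32_t oprsz)
{
    return true;
}
#endif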
I still think having separate minimal gen_vvv and gen_xxx helpers will help reduce the
possibility of typos, when there are a lot of instructions within an instruction format.
But when there are just 8, like here, just adding oprsz certainly looks simpler.
I wonder if it is really clearer having the LASX instructions in a separate file? Perhaps
it would be better to keep all of the similar patterns together, e.g.
static bool gvec_dup(...)
{
...
}
TRANS(vreplgr2vr_b, LSX, gvec_dup, 16, MO_8)
TRANS(vreplgr2vr_h, LSX, gvec_dup, 16, MO_16)
TRANS(vreplgr2vr_w, LSX, gvec_dup, 16, MO_32)
TRANS(vreplgr2vr_d, LSX, gvec_dup, 16, MO_64)
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
TRANS(xvreplgr2vr_d, LASX, gvec_dup, 32, MO_64)
r~
* Re: [PATCH v4 06/48] target/loongarch: Implement xvreplgr2vr
2023-08-30 16:09 ` Richard Henderson
@ 2023-08-31 7:17 ` gaosong
0 siblings, 0 replies; 86+ messages in thread
From: gaosong @ 2023-08-31 7:17 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
On 2023/8/31 at 12:09 AM, Richard Henderson wrote:
> On 8/30/23 01:48, Song Gao wrote:
>> This patch includes:
>> - XVREPLGR2VR.{B/H/W/D}.
>>
>> Signed-off-by: Song Gao <gaosong@loongson.cn>
>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>> target/loongarch/insns.decode | 5 +++++
>> target/loongarch/disas.c | 10 ++++++++++
>> target/loongarch/insn_trans/trans_lasx.c.inc | 5 +++++
>> target/loongarch/insn_trans/trans_lsx.c.inc | 12 ++++++------
>> 4 files changed, 26 insertions(+), 6 deletions(-)
>>
>> diff --git a/target/loongarch/insns.decode
>> b/target/loongarch/insns.decode
>> index bcc18fb6c5..04bd238995 100644
>> --- a/target/loongarch/insns.decode
>> +++ b/target/loongarch/insns.decode
>> @@ -1310,3 +1310,8 @@ xvsub_h 0111 01000000 11001 ..... .....
>> ..... @vvv
>> xvsub_w 0111 01000000 11010 ..... ..... ..... @vvv
>> xvsub_d 0111 01000000 11011 ..... ..... ..... @vvv
>> xvsub_q 0111 01010010 11011 ..... ..... ..... @vvv
>> +
>> +xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
>> +xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
>> +xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
>> +xvreplgr2vr_d 0111 01101001 11110 00011 ..... ..... @vr
>> diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
>> index d8b62ba532..c47f455ed0 100644
>> --- a/target/loongarch/disas.c
>> +++ b/target/loongarch/disas.c
>> @@ -1708,6 +1708,11 @@ static void output_vvv_x(DisasContext *ctx,
>> arg_vvv * a, const char *mnemonic)
>> output(ctx, mnemonic, "x%d, x%d, x%d", a->vd, a->vj, a->vk);
>> }
>> +static void output_vr_x(DisasContext *ctx, arg_vr *a, const char
>> *mnemonic)
>> +{
>> + output(ctx, mnemonic, "x%d, r%d", a->vd, a->rj);
>> +}
>> +
>> INSN_LASX(xvadd_b, vvv)
>> INSN_LASX(xvadd_h, vvv)
>> INSN_LASX(xvadd_w, vvv)
>> @@ -1718,3 +1723,8 @@ INSN_LASX(xvsub_h, vvv)
>> INSN_LASX(xvsub_w, vvv)
>> INSN_LASX(xvsub_d, vvv)
>> INSN_LASX(xvsub_q, vvv)
>> +
>> +INSN_LASX(xvreplgr2vr_b, vr)
>> +INSN_LASX(xvreplgr2vr_h, vr)
>> +INSN_LASX(xvreplgr2vr_w, vr)
>> +INSN_LASX(xvreplgr2vr_d, vr)
>> diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc
>> b/target/loongarch/insn_trans/trans_lasx.c.inc
>> index 218b8dc648..66b5abc790 100644
>> --- a/target/loongarch/insn_trans/trans_lasx.c.inc
>> +++ b/target/loongarch/insn_trans/trans_lasx.c.inc
>> @@ -50,3 +50,8 @@ TRANS(xvsub_b, LASX, gvec_vvv, 32, MO_8,
>> tcg_gen_gvec_sub)
>> TRANS(xvsub_h, LASX, gvec_vvv, 32, MO_16, tcg_gen_gvec_sub)
>> TRANS(xvsub_w, LASX, gvec_vvv, 32, MO_32, tcg_gen_gvec_sub)
>> TRANS(xvsub_d, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_sub)
>> +
>> +TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
>> +TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
>> +TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
>> +TRANS(xvreplgr2vr_d, LASX, gvec_dup, 32, MO_64)
>> diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc
>> b/target/loongarch/insn_trans/trans_lsx.c.inc
>> index 0e12213e8b..c0e7a9a372 100644
>> --- a/target/loongarch/insn_trans/trans_lsx.c.inc
>> +++ b/target/loongarch/insn_trans/trans_lsx.c.inc
>> @@ -4161,7 +4161,7 @@ static bool trans_vpickve2gr_du(DisasContext
>> *ctx, arg_rv_i *a)
>> return true;
>> }
>> -static bool gvec_dup(DisasContext *ctx, arg_vr *a, MemOp mop)
>> +static bool gvec_dup(DisasContext *ctx, arg_vr *a, uint32_t oprsz,
>> MemOp mop)
>> {
>> TCGv src = gpr_src(ctx, a->rj, EXT_NONE);
>> @@ -4172,14 +4172,14 @@ static bool gvec_dup(DisasContext *ctx, arg_vr
>> *a, MemOp mop)
>> CHECK_VEC;
>> tcg_gen_gvec_dup_i64(mop, vec_full_offset(a->vd),
>> - 16, ctx->vl/8, src);
>> + oprsz, ctx->vl / 8, src);
>> return true;
>> }
>> -TRANS(vreplgr2vr_b, LSX, gvec_dup, MO_8)
>> -TRANS(vreplgr2vr_h, LSX, gvec_dup, MO_16)
>> -TRANS(vreplgr2vr_w, LSX, gvec_dup, MO_32)
>> -TRANS(vreplgr2vr_d, LSX, gvec_dup, MO_64)
>> +TRANS(vreplgr2vr_b, LSX, gvec_dup, 16, MO_8)
>> +TRANS(vreplgr2vr_h, LSX, gvec_dup, 16, MO_16)
>> +TRANS(vreplgr2vr_w, LSX, gvec_dup, 16, MO_32)
>> +TRANS(vreplgr2vr_d, LSX, gvec_dup, 16, MO_64)
>
> Hmm.
>
> Ok, so revising the advice I gave versus the previous patch, I can see
> how having a common CHECK_VEC is helpful. But it still needs to use
> oprsz not vl for the size check.
>
> It would be better to replace with a function, like
>
> if (!check_vec(ctx, oprsz)) {
> return true;
> }
>
> rather than a macro with a hidden return. The replacement should be
> done in a patch by itself, probably using check_vec(ctx, 16) for all of
> the existing LSX code until, step by step, oprsz is plumbed into all of
> the places required.
>
> I still think having separate minimal gen_vvv and gen_xxx helpers will
> help reduce the possibility of typos, when there are a lot of
> instructions within an instruction format. But when there are just 8,
> like here, just adding oprsz certainly looks simpler.
>
Thanks for your suggestions. I will correct them in v5.
> I wonder if it is really clearer having the LASX instructions in a
> separate file? Perhaps it would be better to keep all of the similar patterns
> together, e.g.
>
OK.
I can merge LSX and LASX into a new file (trans_vec.c.inc or ...).
It seems this will need more time, so I will send v5 a few days later.
Thanks.
Song Gao
> static bool gvec_dup(...)
> {
> ...
> }
>
> TRANS(vreplgr2vr_b, LSX, gvec_dup, 16, MO_8)
> TRANS(vreplgr2vr_h, LSX, gvec_dup, 16, MO_16)
> TRANS(vreplgr2vr_w, LSX, gvec_dup, 16, MO_32)
> TRANS(vreplgr2vr_d, LSX, gvec_dup, 16, MO_64)
>
> TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
> TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
> TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
> TRANS(xvreplgr2vr_d, LASX, gvec_dup, 32, MO_64)
>
>
> r~
* [PATCH v4 07/48] target/loongarch: Implement xvaddi/xvsubi
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (5 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 06/48] target/loongarch: Implement xvreplgr2vr Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 8:48 ` [PATCH v4 08/48] target/loongarch: Implement xvneg Song Gao
` (40 subsequent siblings)
47 siblings, 0 replies; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVADDI.{B/H/W/D}U;
- XVSUBI.{B/H/W/D}U.
Signed-off-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/loongarch/insns.decode | 9 ++
target/loongarch/disas.c | 14 ++
target/loongarch/insn_trans/trans_lasx.c.inc | 9 ++
target/loongarch/insn_trans/trans_lsx.c.inc | 136 +++++++++----------
4 files changed, 100 insertions(+), 68 deletions(-)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 04bd238995..c48dca70b8 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1311,6 +1311,15 @@ xvsub_w 0111 01000000 11010 ..... ..... ..... @vvv
xvsub_d 0111 01000000 11011 ..... ..... ..... @vvv
xvsub_q 0111 01010010 11011 ..... ..... ..... @vvv
+xvaddi_bu 0111 01101000 10100 ..... ..... ..... @vv_ui5
+xvaddi_hu 0111 01101000 10101 ..... ..... ..... @vv_ui5
+xvaddi_wu 0111 01101000 10110 ..... ..... ..... @vv_ui5
+xvaddi_du 0111 01101000 10111 ..... ..... ..... @vv_ui5
+xvsubi_bu 0111 01101000 11000 ..... ..... ..... @vv_ui5
+xvsubi_hu 0111 01101000 11001 ..... ..... ..... @vv_ui5
+xvsubi_wu 0111 01101000 11010 ..... ..... ..... @vv_ui5
+xvsubi_du 0111 01101000 11011 ..... ..... ..... @vv_ui5
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index c47f455ed0..f59e3cebf0 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1708,6 +1708,11 @@ static void output_vvv_x(DisasContext *ctx, arg_vvv * a, const char *mnemonic)
output(ctx, mnemonic, "x%d, x%d, x%d", a->vd, a->vj, a->vk);
}
+static void output_vv_i_x(DisasContext *ctx, arg_vv_i *a, const char *mnemonic)
+{
+ output(ctx, mnemonic, "x%d, x%d, 0x%x", a->vd, a->vj, a->imm);
+}
+
static void output_vr_x(DisasContext *ctx, arg_vr *a, const char *mnemonic)
{
output(ctx, mnemonic, "x%d, r%d", a->vd, a->rj);
@@ -1724,6 +1729,15 @@ INSN_LASX(xvsub_w, vvv)
INSN_LASX(xvsub_d, vvv)
INSN_LASX(xvsub_q, vvv)
+INSN_LASX(xvaddi_bu, vv_i)
+INSN_LASX(xvaddi_hu, vv_i)
+INSN_LASX(xvaddi_wu, vv_i)
+INSN_LASX(xvaddi_du, vv_i)
+INSN_LASX(xvsubi_bu, vv_i)
+INSN_LASX(xvsubi_hu, vv_i)
+INSN_LASX(xvsubi_wu, vv_i)
+INSN_LASX(xvsubi_du, vv_i)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 66b5abc790..0e8a711fde 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -51,6 +51,15 @@ TRANS(xvsub_h, LASX, gvec_vvv, 32, MO_16, tcg_gen_gvec_sub)
TRANS(xvsub_w, LASX, gvec_vvv, 32, MO_32, tcg_gen_gvec_sub)
TRANS(xvsub_d, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_sub)
+TRANS(xvaddi_bu, LASX, gvec_vv_i, 32, MO_8, tcg_gen_gvec_addi)
+TRANS(xvaddi_hu, LASX, gvec_vv_i, 32, MO_16, tcg_gen_gvec_addi)
+TRANS(xvaddi_wu, LASX, gvec_vv_i, 32, MO_32, tcg_gen_gvec_addi)
+TRANS(xvaddi_du, LASX, gvec_vv_i, 32, MO_64, tcg_gen_gvec_addi)
+TRANS(xvsubi_bu, LASX, gvec_subi, 32, MO_8)
+TRANS(xvsubi_hu, LASX, gvec_subi, 32, MO_16)
+TRANS(xvsubi_wu, LASX, gvec_subi, 32, MO_32)
+TRANS(xvsubi_du, LASX, gvec_subi, 32, MO_64)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index c0e7a9a372..00f134a0b1 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -96,7 +96,7 @@ static bool gvec_vv(DisasContext *ctx, arg_vv *a, MemOp mop,
return true;
}
-static bool gvec_vv_i(DisasContext *ctx, arg_vv_i *a, MemOp mop,
+static bool gvec_vv_i(DisasContext *ctx, arg_vv_i *a, uint32_t oprsz, MemOp mop,
void (*func)(unsigned, uint32_t, uint32_t,
int64_t, uint32_t, uint32_t))
{
@@ -107,11 +107,11 @@ static bool gvec_vv_i(DisasContext *ctx, arg_vv_i *a, MemOp mop,
vd_ofs = vec_full_offset(a->vd);
vj_ofs = vec_full_offset(a->vj);
- func(mop, vd_ofs, vj_ofs, a->imm , 16, ctx->vl/8);
+ func(mop, vd_ofs, vj_ofs, a->imm, oprsz, ctx->vl / 8);
return true;
}
-static bool gvec_subi(DisasContext *ctx, arg_vv_i *a, MemOp mop)
+static bool gvec_subi(DisasContext *ctx, arg_vv_i *a, uint32_t oprsz, MemOp mop)
{
uint32_t vd_ofs, vj_ofs;
@@ -120,7 +120,7 @@ static bool gvec_subi(DisasContext *ctx, arg_vv_i *a, MemOp mop)
vd_ofs = vec_full_offset(a->vd);
vj_ofs = vec_full_offset(a->vj);
- tcg_gen_gvec_addi(mop, vd_ofs, vj_ofs, -a->imm, 16, ctx->vl/8);
+ tcg_gen_gvec_addi(mop, vd_ofs, vj_ofs, -a->imm, oprsz, ctx->vl / 8);
return true;
}
@@ -168,14 +168,14 @@ TRANS(vsub_h, LSX, gvec_vvv, 16, MO_16, tcg_gen_gvec_sub)
TRANS(vsub_w, LSX, gvec_vvv, 16, MO_32, tcg_gen_gvec_sub)
TRANS(vsub_d, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_sub)
-TRANS(vaddi_bu, LSX, gvec_vv_i, MO_8, tcg_gen_gvec_addi)
-TRANS(vaddi_hu, LSX, gvec_vv_i, MO_16, tcg_gen_gvec_addi)
-TRANS(vaddi_wu, LSX, gvec_vv_i, MO_32, tcg_gen_gvec_addi)
-TRANS(vaddi_du, LSX, gvec_vv_i, MO_64, tcg_gen_gvec_addi)
-TRANS(vsubi_bu, LSX, gvec_subi, MO_8)
-TRANS(vsubi_hu, LSX, gvec_subi, MO_16)
-TRANS(vsubi_wu, LSX, gvec_subi, MO_32)
-TRANS(vsubi_du, LSX, gvec_subi, MO_64)
+TRANS(vaddi_bu, LSX, gvec_vv_i, 16, MO_8, tcg_gen_gvec_addi)
+TRANS(vaddi_hu, LSX, gvec_vv_i, 16, MO_16, tcg_gen_gvec_addi)
+TRANS(vaddi_wu, LSX, gvec_vv_i, 16, MO_32, tcg_gen_gvec_addi)
+TRANS(vaddi_du, LSX, gvec_vv_i, 16, MO_64, tcg_gen_gvec_addi)
+TRANS(vsubi_bu, LSX, gvec_subi, 16, MO_8)
+TRANS(vsubi_hu, LSX, gvec_subi, 16, MO_16)
+TRANS(vsubi_wu, LSX, gvec_subi, 16, MO_32)
+TRANS(vsubi_du, LSX, gvec_subi, 16, MO_64)
TRANS(vneg_b, LSX, gvec_vv, MO_8, tcg_gen_gvec_neg)
TRANS(vneg_h, LSX, gvec_vv, MO_16, tcg_gen_gvec_neg)
@@ -1466,14 +1466,14 @@ static void do_vmini_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_2i(vd_ofs, vj_ofs, oprsz, maxsz, imm, &op[vece]);
}
-TRANS(vmini_b, LSX, gvec_vv_i, MO_8, do_vmini_s)
-TRANS(vmini_h, LSX, gvec_vv_i, MO_16, do_vmini_s)
-TRANS(vmini_w, LSX, gvec_vv_i, MO_32, do_vmini_s)
-TRANS(vmini_d, LSX, gvec_vv_i, MO_64, do_vmini_s)
-TRANS(vmini_bu, LSX, gvec_vv_i, MO_8, do_vmini_u)
-TRANS(vmini_hu, LSX, gvec_vv_i, MO_16, do_vmini_u)
-TRANS(vmini_wu, LSX, gvec_vv_i, MO_32, do_vmini_u)
-TRANS(vmini_du, LSX, gvec_vv_i, MO_64, do_vmini_u)
+TRANS(vmini_b, LSX, gvec_vv_i, 16, MO_8, do_vmini_s)
+TRANS(vmini_h, LSX, gvec_vv_i, 16, MO_16, do_vmini_s)
+TRANS(vmini_w, LSX, gvec_vv_i, 16, MO_32, do_vmini_s)
+TRANS(vmini_d, LSX, gvec_vv_i, 16, MO_64, do_vmini_s)
+TRANS(vmini_bu, LSX, gvec_vv_i, 16, MO_8, do_vmini_u)
+TRANS(vmini_hu, LSX, gvec_vv_i, 16, MO_16, do_vmini_u)
+TRANS(vmini_wu, LSX, gvec_vv_i, 16, MO_32, do_vmini_u)
+TRANS(vmini_du, LSX, gvec_vv_i, 16, MO_64, do_vmini_u)
static void do_vmaxi_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
int64_t imm, uint32_t oprsz, uint32_t maxsz)
@@ -1547,14 +1547,14 @@ static void do_vmaxi_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_2i(vd_ofs, vj_ofs, oprsz, maxsz, imm, &op[vece]);
}
-TRANS(vmaxi_b, LSX, gvec_vv_i, MO_8, do_vmaxi_s)
-TRANS(vmaxi_h, LSX, gvec_vv_i, MO_16, do_vmaxi_s)
-TRANS(vmaxi_w, LSX, gvec_vv_i, MO_32, do_vmaxi_s)
-TRANS(vmaxi_d, LSX, gvec_vv_i, MO_64, do_vmaxi_s)
-TRANS(vmaxi_bu, LSX, gvec_vv_i, MO_8, do_vmaxi_u)
-TRANS(vmaxi_hu, LSX, gvec_vv_i, MO_16, do_vmaxi_u)
-TRANS(vmaxi_wu, LSX, gvec_vv_i, MO_32, do_vmaxi_u)
-TRANS(vmaxi_du, LSX, gvec_vv_i, MO_64, do_vmaxi_u)
+TRANS(vmaxi_b, LSX, gvec_vv_i, 16, MO_8, do_vmaxi_s)
+TRANS(vmaxi_h, LSX, gvec_vv_i, 16, MO_16, do_vmaxi_s)
+TRANS(vmaxi_w, LSX, gvec_vv_i, 16, MO_32, do_vmaxi_s)
+TRANS(vmaxi_d, LSX, gvec_vv_i, 16, MO_64, do_vmaxi_s)
+TRANS(vmaxi_bu, LSX, gvec_vv_i, 16, MO_8, do_vmaxi_u)
+TRANS(vmaxi_hu, LSX, gvec_vv_i, 16, MO_16, do_vmaxi_u)
+TRANS(vmaxi_wu, LSX, gvec_vv_i, 16, MO_32, do_vmaxi_u)
+TRANS(vmaxi_du, LSX, gvec_vv_i, 16, MO_64, do_vmaxi_u)
TRANS(vmul_b, LSX, gvec_vvv, 16, MO_8, tcg_gen_gvec_mul)
TRANS(vmul_h, LSX, gvec_vvv, 16, MO_16, tcg_gen_gvec_mul)
@@ -2790,10 +2790,10 @@ static void do_vsat_s(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_constant_i64((1ll<< imm) -1), &op[vece]);
}
-TRANS(vsat_b, LSX, gvec_vv_i, MO_8, do_vsat_s)
-TRANS(vsat_h, LSX, gvec_vv_i, MO_16, do_vsat_s)
-TRANS(vsat_w, LSX, gvec_vv_i, MO_32, do_vsat_s)
-TRANS(vsat_d, LSX, gvec_vv_i, MO_64, do_vsat_s)
+TRANS(vsat_b, LSX, gvec_vv_i, 16, MO_8, do_vsat_s)
+TRANS(vsat_h, LSX, gvec_vv_i, 16, MO_16, do_vsat_s)
+TRANS(vsat_w, LSX, gvec_vv_i, 16, MO_32, do_vsat_s)
+TRANS(vsat_d, LSX, gvec_vv_i, 16, MO_64, do_vsat_s)
static void gen_vsat_u(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec max)
{
@@ -2839,10 +2839,10 @@ static void do_vsat_u(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_constant_i64(max), &op[vece]);
}
-TRANS(vsat_bu, LSX, gvec_vv_i, MO_8, do_vsat_u)
-TRANS(vsat_hu, LSX, gvec_vv_i, MO_16, do_vsat_u)
-TRANS(vsat_wu, LSX, gvec_vv_i, MO_32, do_vsat_u)
-TRANS(vsat_du, LSX, gvec_vv_i, MO_64, do_vsat_u)
+TRANS(vsat_bu, LSX, gvec_vv_i, 16, MO_8, do_vsat_u)
+TRANS(vsat_hu, LSX, gvec_vv_i, 16, MO_16, do_vsat_u)
+TRANS(vsat_wu, LSX, gvec_vv_i, 16, MO_32, do_vsat_u)
+TRANS(vsat_du, LSX, gvec_vv_i, 16, MO_64, do_vsat_u)
TRANS(vexth_h_b, LSX, gen_vv, gen_helper_vexth_h_b)
TRANS(vexth_w_h, LSX, gen_vv, gen_helper_vexth_w_h)
@@ -3078,9 +3078,9 @@ static bool trans_vandn_v(DisasContext *ctx, arg_vvv *a)
return true;
}
TRANS(vorn_v, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_orc)
-TRANS(vandi_b, LSX, gvec_vv_i, MO_8, tcg_gen_gvec_andi)
-TRANS(vori_b, LSX, gvec_vv_i, MO_8, tcg_gen_gvec_ori)
-TRANS(vxori_b, LSX, gvec_vv_i, MO_8, tcg_gen_gvec_xori)
+TRANS(vandi_b, LSX, gvec_vv_i, 16, MO_8, tcg_gen_gvec_andi)
+TRANS(vori_b, LSX, gvec_vv_i, 16, MO_8, tcg_gen_gvec_ori)
+TRANS(vxori_b, LSX, gvec_vv_i, 16, MO_8, tcg_gen_gvec_xori)
static void gen_vnori(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
{
@@ -3113,43 +3113,43 @@ static void do_vnori_b(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_2i(vd_ofs, vj_ofs, oprsz, maxsz, imm, &op);
}
-TRANS(vnori_b, LSX, gvec_vv_i, MO_8, do_vnori_b)
+TRANS(vnori_b, LSX, gvec_vv_i, 16, MO_8, do_vnori_b)
TRANS(vsll_b, LSX, gvec_vvv, 16, MO_8, tcg_gen_gvec_shlv)
TRANS(vsll_h, LSX, gvec_vvv, 16, MO_16, tcg_gen_gvec_shlv)
TRANS(vsll_w, LSX, gvec_vvv, 16, MO_32, tcg_gen_gvec_shlv)
TRANS(vsll_d, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_shlv)
-TRANS(vslli_b, LSX, gvec_vv_i, MO_8, tcg_gen_gvec_shli)
-TRANS(vslli_h, LSX, gvec_vv_i, MO_16, tcg_gen_gvec_shli)
-TRANS(vslli_w, LSX, gvec_vv_i, MO_32, tcg_gen_gvec_shli)
-TRANS(vslli_d, LSX, gvec_vv_i, MO_64, tcg_gen_gvec_shli)
+TRANS(vslli_b, LSX, gvec_vv_i, 16, MO_8, tcg_gen_gvec_shli)
+TRANS(vslli_h, LSX, gvec_vv_i, 16, MO_16, tcg_gen_gvec_shli)
+TRANS(vslli_w, LSX, gvec_vv_i, 16, MO_32, tcg_gen_gvec_shli)
+TRANS(vslli_d, LSX, gvec_vv_i, 16, MO_64, tcg_gen_gvec_shli)
TRANS(vsrl_b, LSX, gvec_vvv, 16, MO_8, tcg_gen_gvec_shrv)
TRANS(vsrl_h, LSX, gvec_vvv, 16, MO_16, tcg_gen_gvec_shrv)
TRANS(vsrl_w, LSX, gvec_vvv, 16, MO_32, tcg_gen_gvec_shrv)
TRANS(vsrl_d, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_shrv)
-TRANS(vsrli_b, LSX, gvec_vv_i, MO_8, tcg_gen_gvec_shri)
-TRANS(vsrli_h, LSX, gvec_vv_i, MO_16, tcg_gen_gvec_shri)
-TRANS(vsrli_w, LSX, gvec_vv_i, MO_32, tcg_gen_gvec_shri)
-TRANS(vsrli_d, LSX, gvec_vv_i, MO_64, tcg_gen_gvec_shri)
+TRANS(vsrli_b, LSX, gvec_vv_i, 16, MO_8, tcg_gen_gvec_shri)
+TRANS(vsrli_h, LSX, gvec_vv_i, 16, MO_16, tcg_gen_gvec_shri)
+TRANS(vsrli_w, LSX, gvec_vv_i, 16, MO_32, tcg_gen_gvec_shri)
+TRANS(vsrli_d, LSX, gvec_vv_i, 16, MO_64, tcg_gen_gvec_shri)
TRANS(vsra_b, LSX, gvec_vvv, 16, MO_8, tcg_gen_gvec_sarv)
TRANS(vsra_h, LSX, gvec_vvv, 16, MO_16, tcg_gen_gvec_sarv)
TRANS(vsra_w, LSX, gvec_vvv, 16, MO_32, tcg_gen_gvec_sarv)
TRANS(vsra_d, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_sarv)
-TRANS(vsrai_b, LSX, gvec_vv_i, MO_8, tcg_gen_gvec_sari)
-TRANS(vsrai_h, LSX, gvec_vv_i, MO_16, tcg_gen_gvec_sari)
-TRANS(vsrai_w, LSX, gvec_vv_i, MO_32, tcg_gen_gvec_sari)
-TRANS(vsrai_d, LSX, gvec_vv_i, MO_64, tcg_gen_gvec_sari)
+TRANS(vsrai_b, LSX, gvec_vv_i, 16, MO_8, tcg_gen_gvec_sari)
+TRANS(vsrai_h, LSX, gvec_vv_i, 16, MO_16, tcg_gen_gvec_sari)
+TRANS(vsrai_w, LSX, gvec_vv_i, 16, MO_32, tcg_gen_gvec_sari)
+TRANS(vsrai_d, LSX, gvec_vv_i, 16, MO_64, tcg_gen_gvec_sari)
TRANS(vrotr_b, LSX, gvec_vvv, 16, MO_8, tcg_gen_gvec_rotrv)
TRANS(vrotr_h, LSX, gvec_vvv, 16, MO_16, tcg_gen_gvec_rotrv)
TRANS(vrotr_w, LSX, gvec_vvv, 16, MO_32, tcg_gen_gvec_rotrv)
TRANS(vrotr_d, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_rotrv)
-TRANS(vrotri_b, LSX, gvec_vv_i, MO_8, tcg_gen_gvec_rotri)
-TRANS(vrotri_h, LSX, gvec_vv_i, MO_16, tcg_gen_gvec_rotri)
-TRANS(vrotri_w, LSX, gvec_vv_i, MO_32, tcg_gen_gvec_rotri)
-TRANS(vrotri_d, LSX, gvec_vv_i, MO_64, tcg_gen_gvec_rotri)
+TRANS(vrotri_b, LSX, gvec_vv_i, 16, MO_8, tcg_gen_gvec_rotri)
+TRANS(vrotri_h, LSX, gvec_vv_i, 16, MO_16, tcg_gen_gvec_rotri)
+TRANS(vrotri_w, LSX, gvec_vv_i, 16, MO_32, tcg_gen_gvec_rotri)
+TRANS(vrotri_d, LSX, gvec_vv_i, 16, MO_64, tcg_gen_gvec_rotri)
TRANS(vsllwil_h_b, LSX, gen_vv_i, gen_helper_vsllwil_h_b)
TRANS(vsllwil_w_h, LSX, gen_vv_i, gen_helper_vsllwil_w_h)
@@ -3420,10 +3420,10 @@ static void do_vbitclri(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_2i(vd_ofs, vj_ofs, oprsz, maxsz, imm, &op[vece]);
}
-TRANS(vbitclri_b, LSX, gvec_vv_i, MO_8, do_vbitclri)
-TRANS(vbitclri_h, LSX, gvec_vv_i, MO_16, do_vbitclri)
-TRANS(vbitclri_w, LSX, gvec_vv_i, MO_32, do_vbitclri)
-TRANS(vbitclri_d, LSX, gvec_vv_i, MO_64, do_vbitclri)
+TRANS(vbitclri_b, LSX, gvec_vv_i, 16, MO_8, do_vbitclri)
+TRANS(vbitclri_h, LSX, gvec_vv_i, 16, MO_16, do_vbitclri)
+TRANS(vbitclri_w, LSX, gvec_vv_i, 16, MO_32, do_vbitclri)
+TRANS(vbitclri_d, LSX, gvec_vv_i, 16, MO_64, do_vbitclri)
static void do_vbitset(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
@@ -3502,10 +3502,10 @@ static void do_vbitseti(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_2i(vd_ofs, vj_ofs, oprsz, maxsz, imm, &op[vece]);
}
-TRANS(vbitseti_b, LSX, gvec_vv_i, MO_8, do_vbitseti)
-TRANS(vbitseti_h, LSX, gvec_vv_i, MO_16, do_vbitseti)
-TRANS(vbitseti_w, LSX, gvec_vv_i, MO_32, do_vbitseti)
-TRANS(vbitseti_d, LSX, gvec_vv_i, MO_64, do_vbitseti)
+TRANS(vbitseti_b, LSX, gvec_vv_i, 16, MO_8, do_vbitseti)
+TRANS(vbitseti_h, LSX, gvec_vv_i, 16, MO_16, do_vbitseti)
+TRANS(vbitseti_w, LSX, gvec_vv_i, 16, MO_32, do_vbitseti)
+TRANS(vbitseti_d, LSX, gvec_vv_i, 16, MO_64, do_vbitseti)
static void do_vbitrev(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
uint32_t vk_ofs, uint32_t oprsz, uint32_t maxsz)
@@ -3584,10 +3584,10 @@ static void do_vbitrevi(unsigned vece, uint32_t vd_ofs, uint32_t vj_ofs,
tcg_gen_gvec_2i(vd_ofs, vj_ofs, oprsz, maxsz, imm, &op[vece]);
}
-TRANS(vbitrevi_b, LSX, gvec_vv_i, MO_8, do_vbitrevi)
-TRANS(vbitrevi_h, LSX, gvec_vv_i, MO_16, do_vbitrevi)
-TRANS(vbitrevi_w, LSX, gvec_vv_i, MO_32, do_vbitrevi)
-TRANS(vbitrevi_d, LSX, gvec_vv_i, MO_64, do_vbitrevi)
+TRANS(vbitrevi_b, LSX, gvec_vv_i, 16, MO_8, do_vbitrevi)
+TRANS(vbitrevi_h, LSX, gvec_vv_i, 16, MO_16, do_vbitrevi)
+TRANS(vbitrevi_w, LSX, gvec_vv_i, 16, MO_32, do_vbitrevi)
+TRANS(vbitrevi_d, LSX, gvec_vv_i, 16, MO_64, do_vbitrevi)
TRANS(vfrstp_b, LSX, gen_vvv, gen_helper_vfrstp_b)
TRANS(vfrstp_h, LSX, gen_vvv, gen_helper_vfrstp_h)
--
2.39.1
* [PATCH v4 08/48] target/loongarch: Implement xvneg
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (6 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 07/48] target/loongarch: Implement xvaddi/xvsubi Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 8:48 ` [PATCH v4 09/48] target/loongarch: Implement xvsadd/xvssub Song Gao
` (39 subsequent siblings)
47 siblings, 0 replies; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVNEG.{B/H/W/D}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/loongarch/insns.decode | 5 +++++
target/loongarch/disas.c | 10 ++++++++++
target/loongarch/insn_trans/trans_lasx.c.inc | 5 +++++
target/loongarch/insn_trans/trans_lsx.c.inc | 12 ++++++------
4 files changed, 26 insertions(+), 6 deletions(-)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index c48dca70b8..759172628f 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1320,6 +1320,11 @@ xvsubi_hu 0111 01101000 11001 ..... ..... ..... @vv_ui5
xvsubi_wu 0111 01101000 11010 ..... ..... ..... @vv_ui5
xvsubi_du 0111 01101000 11011 ..... ..... ..... @vv_ui5
+xvneg_b 0111 01101001 11000 01100 ..... ..... @vv
+xvneg_h 0111 01101001 11000 01101 ..... ..... @vv
+xvneg_w 0111 01101001 11000 01110 ..... ..... @vv
+xvneg_d 0111 01101001 11000 01111 ..... ..... @vv
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index f59e3cebf0..4e26d49acc 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1713,6 +1713,11 @@ static void output_vv_i_x(DisasContext *ctx, arg_vv_i *a, const char *mnemonic)
output(ctx, mnemonic, "x%d, x%d, 0x%x", a->vd, a->vj, a->imm);
}
+static void output_vv_x(DisasContext *ctx, arg_vv *a, const char *mnemonic)
+{
+ output(ctx, mnemonic, "x%d, x%d", a->vd, a->vj);
+}
+
static void output_vr_x(DisasContext *ctx, arg_vr *a, const char *mnemonic)
{
output(ctx, mnemonic, "x%d, r%d", a->vd, a->rj);
@@ -1738,6 +1743,11 @@ INSN_LASX(xvsubi_hu, vv_i)
INSN_LASX(xvsubi_wu, vv_i)
INSN_LASX(xvsubi_du, vv_i)
+INSN_LASX(xvneg_b, vv)
+INSN_LASX(xvneg_h, vv)
+INSN_LASX(xvneg_w, vv)
+INSN_LASX(xvneg_d, vv)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 0e8a711fde..29eefe6934 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -60,6 +60,11 @@ TRANS(xvsubi_hu, LASX, gvec_subi, 32, MO_16)
TRANS(xvsubi_wu, LASX, gvec_subi, 32, MO_32)
TRANS(xvsubi_du, LASX, gvec_subi, 32, MO_64)
+TRANS(xvneg_b, LASX, gvec_vv, 32, MO_8, tcg_gen_gvec_neg)
+TRANS(xvneg_h, LASX, gvec_vv, 32, MO_16, tcg_gen_gvec_neg)
+TRANS(xvneg_w, LASX, gvec_vv, 32, MO_32, tcg_gen_gvec_neg)
+TRANS(xvneg_d, LASX, gvec_vv, 32, MO_64, tcg_gen_gvec_neg)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 00f134a0b1..86a0d4d6b9 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -81,7 +81,7 @@ static bool gvec_vvv(DisasContext *ctx, arg_vvv *a, uint32_t oprsz, MemOp mop,
return true;
}
-static bool gvec_vv(DisasContext *ctx, arg_vv *a, MemOp mop,
+static bool gvec_vv(DisasContext *ctx, arg_vv *a, uint32_t oprsz, MemOp mop,
void (*func)(unsigned, uint32_t, uint32_t,
uint32_t, uint32_t))
{
@@ -92,7 +92,7 @@ static bool gvec_vv(DisasContext *ctx, arg_vv *a, MemOp mop,
vd_ofs = vec_full_offset(a->vd);
vj_ofs = vec_full_offset(a->vj);
- func(mop, vd_ofs, vj_ofs, 16, ctx->vl/8);
+ func(mop, vd_ofs, vj_ofs, oprsz, ctx->vl / 8);
return true;
}
@@ -177,10 +177,10 @@ TRANS(vsubi_hu, LSX, gvec_subi, 16, MO_16)
TRANS(vsubi_wu, LSX, gvec_subi, 16, MO_32)
TRANS(vsubi_du, LSX, gvec_subi, 16, MO_64)
-TRANS(vneg_b, LSX, gvec_vv, MO_8, tcg_gen_gvec_neg)
-TRANS(vneg_h, LSX, gvec_vv, MO_16, tcg_gen_gvec_neg)
-TRANS(vneg_w, LSX, gvec_vv, MO_32, tcg_gen_gvec_neg)
-TRANS(vneg_d, LSX, gvec_vv, MO_64, tcg_gen_gvec_neg)
+TRANS(vneg_b, LSX, gvec_vv, 16, MO_8, tcg_gen_gvec_neg)
+TRANS(vneg_h, LSX, gvec_vv, 16, MO_16, tcg_gen_gvec_neg)
+TRANS(vneg_w, LSX, gvec_vv, 16, MO_32, tcg_gen_gvec_neg)
+TRANS(vneg_d, LSX, gvec_vv, 16, MO_64, tcg_gen_gvec_neg)
TRANS(vsadd_b, LSX, gvec_vvv, 16, MO_8, tcg_gen_gvec_ssadd)
TRANS(vsadd_h, LSX, gvec_vvv, 16, MO_16, tcg_gen_gvec_ssadd)
--
2.39.1
* [PATCH v4 09/48] target/loongarch: Implement xvsadd/xvssub
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (7 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 08/48] target/loongarch: Implement xvneg Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 8:48 ` [PATCH v4 10/48] target/loongarch: rename lsx_helper.c to vec_helper.c Song Gao
` (38 subsequent siblings)
47 siblings, 0 replies; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVSADD.{B/H/W/D}[U];
- XVSSUB.{B/H/W/D}[U].
Signed-off-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/loongarch/insns.decode | 18 ++++++++++++++++++
target/loongarch/disas.c | 17 +++++++++++++++++
target/loongarch/insn_trans/trans_lasx.c.inc | 17 +++++++++++++++++
3 files changed, 52 insertions(+)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 759172628f..32f857ff7c 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1325,6 +1325,24 @@ xvneg_h 0111 01101001 11000 01101 ..... ..... @vv
xvneg_w 0111 01101001 11000 01110 ..... ..... @vv
xvneg_d 0111 01101001 11000 01111 ..... ..... @vv
+xvsadd_b 0111 01000100 01100 ..... ..... ..... @vvv
+xvsadd_h 0111 01000100 01101 ..... ..... ..... @vvv
+xvsadd_w 0111 01000100 01110 ..... ..... ..... @vvv
+xvsadd_d 0111 01000100 01111 ..... ..... ..... @vvv
+xvsadd_bu 0111 01000100 10100 ..... ..... ..... @vvv
+xvsadd_hu 0111 01000100 10101 ..... ..... ..... @vvv
+xvsadd_wu 0111 01000100 10110 ..... ..... ..... @vvv
+xvsadd_du 0111 01000100 10111 ..... ..... ..... @vvv
+
+xvssub_b 0111 01000100 10000 ..... ..... ..... @vvv
+xvssub_h 0111 01000100 10001 ..... ..... ..... @vvv
+xvssub_w 0111 01000100 10010 ..... ..... ..... @vvv
+xvssub_d 0111 01000100 10011 ..... ..... ..... @vvv
+xvssub_bu 0111 01000100 11000 ..... ..... ..... @vvv
+xvssub_hu 0111 01000100 11001 ..... ..... ..... @vvv
+xvssub_wu 0111 01000100 11010 ..... ..... ..... @vvv
+xvssub_du 0111 01000100 11011 ..... ..... ..... @vvv
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 4e26d49acc..0fd88a56c1 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1748,6 +1748,23 @@ INSN_LASX(xvneg_h, vv)
INSN_LASX(xvneg_w, vv)
INSN_LASX(xvneg_d, vv)
+INSN_LASX(xvsadd_b, vvv)
+INSN_LASX(xvsadd_h, vvv)
+INSN_LASX(xvsadd_w, vvv)
+INSN_LASX(xvsadd_d, vvv)
+INSN_LASX(xvsadd_bu, vvv)
+INSN_LASX(xvsadd_hu, vvv)
+INSN_LASX(xvsadd_wu, vvv)
+INSN_LASX(xvsadd_du, vvv)
+INSN_LASX(xvssub_b, vvv)
+INSN_LASX(xvssub_h, vvv)
+INSN_LASX(xvssub_w, vvv)
+INSN_LASX(xvssub_d, vvv)
+INSN_LASX(xvssub_bu, vvv)
+INSN_LASX(xvssub_hu, vvv)
+INSN_LASX(xvssub_wu, vvv)
+INSN_LASX(xvssub_du, vvv)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 29eefe6934..c818a09312 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -65,6 +65,23 @@ TRANS(xvneg_h, LASX, gvec_vv, 32, MO_16, tcg_gen_gvec_neg)
TRANS(xvneg_w, LASX, gvec_vv, 32, MO_32, tcg_gen_gvec_neg)
TRANS(xvneg_d, LASX, gvec_vv, 32, MO_64, tcg_gen_gvec_neg)
+TRANS(xvsadd_b, LASX, gvec_vvv, 32, MO_8, tcg_gen_gvec_ssadd)
+TRANS(xvsadd_h, LASX, gvec_vvv, 32, MO_16, tcg_gen_gvec_ssadd)
+TRANS(xvsadd_w, LASX, gvec_vvv, 32, MO_32, tcg_gen_gvec_ssadd)
+TRANS(xvsadd_d, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_ssadd)
+TRANS(xvsadd_bu, LASX, gvec_vvv, 32, MO_8, tcg_gen_gvec_usadd)
+TRANS(xvsadd_hu, LASX, gvec_vvv, 32, MO_16, tcg_gen_gvec_usadd)
+TRANS(xvsadd_wu, LASX, gvec_vvv, 32, MO_32, tcg_gen_gvec_usadd)
+TRANS(xvsadd_du, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_usadd)
+TRANS(xvssub_b, LASX, gvec_vvv, 32, MO_8, tcg_gen_gvec_sssub)
+TRANS(xvssub_h, LASX, gvec_vvv, 32, MO_16, tcg_gen_gvec_sssub)
+TRANS(xvssub_w, LASX, gvec_vvv, 32, MO_32, tcg_gen_gvec_sssub)
+TRANS(xvssub_d, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_sssub)
+TRANS(xvssub_bu, LASX, gvec_vvv, 32, MO_8, tcg_gen_gvec_ussub)
+TRANS(xvssub_hu, LASX, gvec_vvv, 32, MO_16, tcg_gen_gvec_ussub)
+TRANS(xvssub_wu, LASX, gvec_vvv, 32, MO_32, tcg_gen_gvec_ussub)
+TRANS(xvssub_du, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_ussub)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
* [PATCH v4 10/48] target/loongarch: rename lsx_helper.c to vec_helper.c
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (8 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 09/48] target/loongarch: Implement xvsadd/xvssub Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 18:06 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 11/48] target/loongarch: Implement xvhaddw/xvhsubw Song Gao
` (37 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
Use the gen_helper_gvec_* series of functions,
and rename lsx_helper.c to vec_helper.c.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
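For reference, a sketch of the out-of-line gvec dispatch this conversion enables
(tcg_gen_gvec_3_ool and gen_helper_gvec_3 are existing TCG gvec interfaces; the
gen_vvv shape here is illustrative, not part of the patch):

static bool gen_vvv(DisasContext *ctx, arg_vvv *a, gen_helper_gvec_3 *fn)
{
    /* Pass vector-register offsets and sizes; no env, no register indices. */
    tcg_gen_gvec_3_ool(vec_full_offset(a->vd), vec_full_offset(a->vj),
                       vec_full_offset(a->vk), 16, ctx->vl / 8, 0, fn);
    return true;
}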
target/loongarch/helper.h | 642 ++++----
.../loongarch/{lsx_helper.c => vec_helper.c} | 1297 ++++++++---------
target/loongarch/insn_trans/trans_lsx.c.inc | 731 +++++-----
target/loongarch/meson.build | 2 +-
4 files changed, 1329 insertions(+), 1343 deletions(-)
rename target/loongarch/{lsx_helper.c => vec_helper.c} (71%)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index ffb1e0b0bf..1abd9e1410 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -133,22 +133,22 @@ DEF_HELPER_1(idle, void, env)
#endif
/* LoongArch LSX */
-DEF_HELPER_4(vhaddw_h_b, void, env, i32, i32, i32)
-DEF_HELPER_4(vhaddw_w_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vhaddw_d_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vhaddw_q_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vhaddw_hu_bu, void, env, i32, i32, i32)
-DEF_HELPER_4(vhaddw_wu_hu, void, env, i32, i32, i32)
-DEF_HELPER_4(vhaddw_du_wu, void, env, i32, i32, i32)
-DEF_HELPER_4(vhaddw_qu_du, void, env, i32, i32, i32)
-DEF_HELPER_4(vhsubw_h_b, void, env, i32, i32, i32)
-DEF_HELPER_4(vhsubw_w_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vhsubw_d_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vhsubw_q_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vhsubw_hu_bu, void, env, i32, i32, i32)
-DEF_HELPER_4(vhsubw_wu_hu, void, env, i32, i32, i32)
-DEF_HELPER_4(vhsubw_du_wu, void, env, i32, i32, i32)
-DEF_HELPER_4(vhsubw_qu_du, void, env, i32, i32, i32)
+DEF_HELPER_FLAGS_4(vhaddw_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhaddw_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhaddw_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhaddw_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhaddw_hu_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhaddw_wu_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhaddw_du_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhaddw_qu_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhsubw_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhsubw_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhsubw_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhsubw_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhsubw_hu_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhsubw_wu_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhsubw_du_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vhsubw_qu_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(vaddwev_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(vaddwev_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
@@ -305,22 +305,22 @@ DEF_HELPER_FLAGS_4(vmaddwod_h_bu_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(vmaddwod_w_hu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(vmaddwod_d_wu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-DEF_HELPER_4(vdiv_b, void, env, i32, i32, i32)
-DEF_HELPER_4(vdiv_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vdiv_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vdiv_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vdiv_bu, void, env, i32, i32, i32)
-DEF_HELPER_4(vdiv_hu, void, env, i32, i32, i32)
-DEF_HELPER_4(vdiv_wu, void, env, i32, i32, i32)
-DEF_HELPER_4(vdiv_du, void, env, i32, i32, i32)
-DEF_HELPER_4(vmod_b, void, env, i32, i32, i32)
-DEF_HELPER_4(vmod_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vmod_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vmod_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vmod_bu, void, env, i32, i32, i32)
-DEF_HELPER_4(vmod_hu, void, env, i32, i32, i32)
-DEF_HELPER_4(vmod_wu, void, env, i32, i32, i32)
-DEF_HELPER_4(vmod_du, void, env, i32, i32, i32)
+DEF_HELPER_FLAGS_4(vdiv_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vdiv_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vdiv_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vdiv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vdiv_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vdiv_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vdiv_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vdiv_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmod_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmod_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmod_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmod_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmod_bu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmod_hu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmod_wu, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vmod_du, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(vsat_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_4(vsat_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
@@ -331,161 +331,161 @@ DEF_HELPER_FLAGS_4(vsat_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_4(vsat_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_4(vsat_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
-DEF_HELPER_3(vexth_h_b, void, env, i32, i32)
-DEF_HELPER_3(vexth_w_h, void, env, i32, i32)
-DEF_HELPER_3(vexth_d_w, void, env, i32, i32)
-DEF_HELPER_3(vexth_q_d, void, env, i32, i32)
-DEF_HELPER_3(vexth_hu_bu, void, env, i32, i32)
-DEF_HELPER_3(vexth_wu_hu, void, env, i32, i32)
-DEF_HELPER_3(vexth_du_wu, void, env, i32, i32)
-DEF_HELPER_3(vexth_qu_du, void, env, i32, i32)
+DEF_HELPER_FLAGS_3(vexth_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vexth_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vexth_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vexth_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vexth_hu_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vexth_wu_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vexth_du_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vexth_qu_du, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(vsigncov_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(vsigncov_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(vsigncov_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(vsigncov_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
-DEF_HELPER_3(vmskltz_b, void, env, i32, i32)
-DEF_HELPER_3(vmskltz_h, void, env, i32, i32)
-DEF_HELPER_3(vmskltz_w, void, env, i32, i32)
-DEF_HELPER_3(vmskltz_d, void, env, i32, i32)
-DEF_HELPER_3(vmskgez_b, void, env, i32, i32)
-DEF_HELPER_3(vmsknz_b, void, env, i32,i32)
+DEF_HELPER_FLAGS_3(vmskltz_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vmskltz_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vmskltz_w, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vmskltz_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vmskgez_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vmsknz_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(vnori_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
-DEF_HELPER_4(vsllwil_h_b, void, env, i32, i32, i32)
-DEF_HELPER_4(vsllwil_w_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vsllwil_d_w, void, env, i32, i32, i32)
-DEF_HELPER_3(vextl_q_d, void, env, i32, i32)
-DEF_HELPER_4(vsllwil_hu_bu, void, env, i32, i32, i32)
-DEF_HELPER_4(vsllwil_wu_hu, void, env, i32, i32, i32)
-DEF_HELPER_4(vsllwil_du_wu, void, env, i32, i32, i32)
-DEF_HELPER_3(vextl_qu_du, void, env, i32, i32)
-
-DEF_HELPER_4(vsrlr_b, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrlr_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrlr_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrlr_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrlri_b, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrlri_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrlri_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrlri_d, void, env, i32, i32, i32)
-
-DEF_HELPER_4(vsrar_b, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrar_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrar_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrar_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrari_b, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrari_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrari_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrari_d, void, env, i32, i32, i32)
-
-DEF_HELPER_4(vsrln_b_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrln_h_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrln_w_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vsran_b_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vsran_h_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vsran_w_d, void, env, i32, i32, i32)
-
-DEF_HELPER_4(vsrlni_b_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrlni_h_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrlni_w_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrlni_d_q, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrani_b_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrani_h_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrani_w_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrani_d_q, void, env, i32, i32, i32)
-
-DEF_HELPER_4(vsrlrn_b_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrlrn_h_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrlrn_w_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrarn_b_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrarn_h_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrarn_w_d, void, env, i32, i32, i32)
-
-DEF_HELPER_4(vsrlrni_b_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrlrni_h_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrlrni_w_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrlrni_d_q, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrarni_b_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrarni_h_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrarni_w_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vsrarni_d_q, void, env, i32, i32, i32)
-
-DEF_HELPER_4(vssrln_b_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrln_h_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrln_w_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vssran_b_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vssran_h_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vssran_w_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrln_bu_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrln_hu_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrln_wu_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vssran_bu_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vssran_hu_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vssran_wu_d, void, env, i32, i32, i32)
-
-DEF_HELPER_4(vssrlni_b_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrlni_h_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrlni_w_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrlni_d_q, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrani_b_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrani_h_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrani_w_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrani_d_q, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrlni_bu_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrlni_hu_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrlni_wu_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrlni_du_q, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrani_bu_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrani_hu_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrani_wu_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrani_du_q, void, env, i32, i32, i32)
-
-DEF_HELPER_4(vssrlrn_b_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrlrn_h_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrlrn_w_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrarn_b_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrarn_h_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrarn_w_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrlrn_bu_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrlrn_hu_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrlrn_wu_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrarn_bu_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrarn_hu_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrarn_wu_d, void, env, i32, i32, i32)
-
-DEF_HELPER_4(vssrlrni_b_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrlrni_h_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrlrni_w_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrlrni_d_q, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrarni_b_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrarni_h_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrarni_w_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrarni_d_q, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrlrni_bu_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrlrni_hu_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrlrni_wu_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrlrni_du_q, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrarni_bu_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrarni_hu_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrarni_wu_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vssrarni_du_q, void, env, i32, i32, i32)
-
-DEF_HELPER_3(vclo_b, void, env, i32, i32)
-DEF_HELPER_3(vclo_h, void, env, i32, i32)
-DEF_HELPER_3(vclo_w, void, env, i32, i32)
-DEF_HELPER_3(vclo_d, void, env, i32, i32)
-DEF_HELPER_3(vclz_b, void, env, i32, i32)
-DEF_HELPER_3(vclz_h, void, env, i32, i32)
-DEF_HELPER_3(vclz_w, void, env, i32, i32)
-DEF_HELPER_3(vclz_d, void, env, i32, i32)
-
-DEF_HELPER_3(vpcnt_b, void, env, i32, i32)
-DEF_HELPER_3(vpcnt_h, void, env, i32, i32)
-DEF_HELPER_3(vpcnt_w, void, env, i32, i32)
-DEF_HELPER_3(vpcnt_d, void, env, i32, i32)
+DEF_HELPER_FLAGS_4(vsllwil_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsllwil_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsllwil_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_3(vextl_q_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsllwil_hu_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsllwil_wu_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsllwil_du_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_3(vextl_qu_du, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vsrlr_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsrlr_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsrlr_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsrlr_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsrlri_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsrlri_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsrlri_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsrlri_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(vsrar_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsrar_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsrar_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsrar_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsrari_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsrari_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsrari_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsrari_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(vsrln_b_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsrln_h_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsrln_w_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsran_b_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsran_h_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsran_w_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vsrlni_b_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsrlni_h_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsrlni_w_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsrlni_d_q, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsrani_b_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsrani_h_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsrani_w_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsrani_d_q, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(vsrlrn_b_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsrlrn_h_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsrlrn_w_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsrarn_b_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsrarn_h_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vsrarn_w_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vsrlrni_b_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsrlrni_h_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsrlrni_w_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsrlrni_d_q, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsrarni_b_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsrarni_h_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsrarni_w_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vsrarni_d_q, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(vssrln_b_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vssrln_h_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vssrln_w_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vssran_b_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vssran_h_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vssran_w_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vssrln_bu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vssrln_hu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vssrln_wu_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vssran_bu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vssran_hu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vssran_wu_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vssrlni_b_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrlni_h_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrlni_w_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrlni_d_q, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrani_b_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrani_h_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrani_w_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrani_d_q, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrlni_bu_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrlni_hu_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrlni_wu_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrlni_du_q, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrani_bu_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrani_hu_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrani_wu_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrani_du_q, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(vssrlrn_b_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vssrlrn_h_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vssrlrn_w_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vssrarn_b_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vssrarn_h_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vssrarn_w_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vssrlrn_bu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vssrlrn_hu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vssrlrn_wu_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vssrarn_bu_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vssrarn_hu_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vssrarn_wu_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vssrlrni_b_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrlrni_h_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrlrni_w_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrlrni_d_q, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrarni_b_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrarni_h_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrarni_w_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrarni_d_q, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrlrni_bu_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrlrni_hu_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrlrni_wu_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrlrni_du_q, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrarni_bu_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrarni_hu_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrarni_wu_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vssrarni_du_q, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_3(vclo_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vclo_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vclo_w, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vclo_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vclz_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vclz_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vclz_w, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vclz_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_3(vpcnt_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vpcnt_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vpcnt_w, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vpcnt_d, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(vbitclr_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(vbitclr_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
@@ -514,107 +514,107 @@ DEF_HELPER_FLAGS_4(vbitrevi_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_4(vbitrevi_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_4(vbitrevi_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
-DEF_HELPER_4(vfrstp_b, void, env, i32, i32, i32)
-DEF_HELPER_4(vfrstp_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vfrstpi_b, void, env, i32, i32, i32)
-DEF_HELPER_4(vfrstpi_h, void, env, i32, i32, i32)
-
-DEF_HELPER_4(vfadd_s, void, env, i32, i32, i32)
-DEF_HELPER_4(vfadd_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vfsub_s, void, env, i32, i32, i32)
-DEF_HELPER_4(vfsub_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vfmul_s, void, env, i32, i32, i32)
-DEF_HELPER_4(vfmul_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vfdiv_s, void, env, i32, i32, i32)
-DEF_HELPER_4(vfdiv_d, void, env, i32, i32, i32)
-
-DEF_HELPER_5(vfmadd_s, void, env, i32, i32, i32, i32)
-DEF_HELPER_5(vfmadd_d, void, env, i32, i32, i32, i32)
-DEF_HELPER_5(vfmsub_s, void, env, i32, i32, i32, i32)
-DEF_HELPER_5(vfmsub_d, void, env, i32, i32, i32, i32)
-DEF_HELPER_5(vfnmadd_s, void, env, i32, i32, i32, i32)
-DEF_HELPER_5(vfnmadd_d, void, env, i32, i32, i32, i32)
-DEF_HELPER_5(vfnmsub_s, void, env, i32, i32, i32, i32)
-DEF_HELPER_5(vfnmsub_d, void, env, i32, i32, i32, i32)
-
-DEF_HELPER_4(vfmax_s, void, env, i32, i32, i32)
-DEF_HELPER_4(vfmax_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vfmin_s, void, env, i32, i32, i32)
-DEF_HELPER_4(vfmin_d, void, env, i32, i32, i32)
-
-DEF_HELPER_4(vfmaxa_s, void, env, i32, i32, i32)
-DEF_HELPER_4(vfmaxa_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vfmina_s, void, env, i32, i32, i32)
-DEF_HELPER_4(vfmina_d, void, env, i32, i32, i32)
-
-DEF_HELPER_3(vflogb_s, void, env, i32, i32)
-DEF_HELPER_3(vflogb_d, void, env, i32, i32)
-
-DEF_HELPER_3(vfclass_s, void, env, i32, i32)
-DEF_HELPER_3(vfclass_d, void, env, i32, i32)
-
-DEF_HELPER_3(vfsqrt_s, void, env, i32, i32)
-DEF_HELPER_3(vfsqrt_d, void, env, i32, i32)
-DEF_HELPER_3(vfrecip_s, void, env, i32, i32)
-DEF_HELPER_3(vfrecip_d, void, env, i32, i32)
-DEF_HELPER_3(vfrsqrt_s, void, env, i32, i32)
-DEF_HELPER_3(vfrsqrt_d, void, env, i32, i32)
-
-DEF_HELPER_3(vfcvtl_s_h, void, env, i32, i32)
-DEF_HELPER_3(vfcvth_s_h, void, env, i32, i32)
-DEF_HELPER_3(vfcvtl_d_s, void, env, i32, i32)
-DEF_HELPER_3(vfcvth_d_s, void, env, i32, i32)
-DEF_HELPER_4(vfcvt_h_s, void, env, i32, i32, i32)
-DEF_HELPER_4(vfcvt_s_d, void, env, i32, i32, i32)
-
-DEF_HELPER_3(vfrintrne_s, void, env, i32, i32)
-DEF_HELPER_3(vfrintrne_d, void, env, i32, i32)
-DEF_HELPER_3(vfrintrz_s, void, env, i32, i32)
-DEF_HELPER_3(vfrintrz_d, void, env, i32, i32)
-DEF_HELPER_3(vfrintrp_s, void, env, i32, i32)
-DEF_HELPER_3(vfrintrp_d, void, env, i32, i32)
-DEF_HELPER_3(vfrintrm_s, void, env, i32, i32)
-DEF_HELPER_3(vfrintrm_d, void, env, i32, i32)
-DEF_HELPER_3(vfrint_s, void, env, i32, i32)
-DEF_HELPER_3(vfrint_d, void, env, i32, i32)
-
-DEF_HELPER_3(vftintrne_w_s, void, env, i32, i32)
-DEF_HELPER_3(vftintrne_l_d, void, env, i32, i32)
-DEF_HELPER_3(vftintrz_w_s, void, env, i32, i32)
-DEF_HELPER_3(vftintrz_l_d, void, env, i32, i32)
-DEF_HELPER_3(vftintrp_w_s, void, env, i32, i32)
-DEF_HELPER_3(vftintrp_l_d, void, env, i32, i32)
-DEF_HELPER_3(vftintrm_w_s, void, env, i32, i32)
-DEF_HELPER_3(vftintrm_l_d, void, env, i32, i32)
-DEF_HELPER_3(vftint_w_s, void, env, i32, i32)
-DEF_HELPER_3(vftint_l_d, void, env, i32, i32)
-DEF_HELPER_3(vftintrz_wu_s, void, env, i32, i32)
-DEF_HELPER_3(vftintrz_lu_d, void, env, i32, i32)
-DEF_HELPER_3(vftint_wu_s, void, env, i32, i32)
-DEF_HELPER_3(vftint_lu_d, void, env, i32, i32)
-DEF_HELPER_4(vftintrne_w_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vftintrz_w_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vftintrp_w_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vftintrm_w_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vftint_w_d, void, env, i32, i32, i32)
-DEF_HELPER_3(vftintrnel_l_s, void, env, i32, i32)
-DEF_HELPER_3(vftintrneh_l_s, void, env, i32, i32)
-DEF_HELPER_3(vftintrzl_l_s, void, env, i32, i32)
-DEF_HELPER_3(vftintrzh_l_s, void, env, i32, i32)
-DEF_HELPER_3(vftintrpl_l_s, void, env, i32, i32)
-DEF_HELPER_3(vftintrph_l_s, void, env, i32, i32)
-DEF_HELPER_3(vftintrml_l_s, void, env, i32, i32)
-DEF_HELPER_3(vftintrmh_l_s, void, env, i32, i32)
-DEF_HELPER_3(vftintl_l_s, void, env, i32, i32)
-DEF_HELPER_3(vftinth_l_s, void, env, i32, i32)
-
-DEF_HELPER_3(vffint_s_w, void, env, i32, i32)
-DEF_HELPER_3(vffint_d_l, void, env, i32, i32)
-DEF_HELPER_3(vffint_s_wu, void, env, i32, i32)
-DEF_HELPER_3(vffint_d_lu, void, env, i32, i32)
-DEF_HELPER_3(vffintl_d_w, void, env, i32, i32)
-DEF_HELPER_3(vffinth_d_w, void, env, i32, i32)
-DEF_HELPER_4(vffint_s_l, void, env, i32, i32, i32)
+DEF_HELPER_FLAGS_4(vfrstp_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vfrstp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vfrstpi_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vfrstpi_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_5(vfadd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_5(vfadd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_5(vfsub_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_5(vfsub_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_5(vfmul_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_5(vfmul_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_5(vfdiv_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_5(vfdiv_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_FLAGS_6(vfmadd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_6(vfmadd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_6(vfmsub_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_6(vfmsub_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_6(vfnmadd_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_6(vfnmadd_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_6(vfnmsub_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_6(vfnmsub_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_FLAGS_5(vfmax_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_5(vfmax_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_5(vfmin_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_5(vfmin_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_FLAGS_5(vfmaxa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_5(vfmaxa_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_5(vfmina_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_5(vfmina_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_FLAGS_4(vflogb_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vflogb_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+
+DEF_HELPER_FLAGS_4(vfclass_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfclass_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+
+DEF_HELPER_FLAGS_4(vfsqrt_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfsqrt_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrecip_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrecip_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrsqrt_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrsqrt_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+
+DEF_HELPER_FLAGS_4(vfcvtl_s_h, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfcvth_s_h, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfcvtl_d_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfcvth_d_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_5(vfcvt_h_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_5(vfcvt_s_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
+
+DEF_HELPER_FLAGS_4(vfrintrne_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrintrne_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrintrz_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrintrz_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrintrp_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrintrp_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrintrm_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrintrm_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrint_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vfrint_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+
+DEF_HELPER_FLAGS_4(vftintrne_w_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftintrne_l_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftintrz_w_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftintrz_l_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftintrp_w_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftintrp_l_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftintrm_w_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftintrm_l_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftint_w_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftint_l_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftintrz_wu_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftintrz_lu_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftint_wu_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftint_lu_d, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_5(vftintrne_w_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_5(vftintrz_w_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_5(vftintrp_w_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_5(vftintrm_w_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_5(vftint_w_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftintrnel_l_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftintrneh_l_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftintrzl_l_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftintrzh_l_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftintrpl_l_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftintrph_l_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftintrml_l_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftintrmh_l_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftintl_l_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vftinth_l_s, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+
+DEF_HELPER_FLAGS_4(vffint_s_w, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vffint_d_l, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vffint_s_wu, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vffint_d_lu, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vffintl_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_4(vffinth_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, env, i32)
+DEF_HELPER_FLAGS_5(vffint_s_l, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, env, i32)
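+
Note that the FP helpers keep env after the vector pointers: softfloat needs
env->fp_status for the rounding mode and accumulated exception flags.
TCG_CALL_NO_RWG remains correct because it only promises no access to TCG
globals, which fp_status is not. A minimal sketch of the new shape (not the
in-tree body; exception-flag handling omitted):

    /* Sketch: ptr/ptr/ptr/env/i32 float helper; env carries fp_status. */
    void helper_vfadd_s_sketch(void *vd, void *vj, void *vk,
                               CPULoongArchState *env, uint32_t desc)
    {
        VReg *Vd = vd, *Vj = vj, *Vk = vk;

        for (int i = 0; i < LSX_LEN / 32; i++) {
            Vd->UW(i) = float32_add(Vj->UW(i), Vk->UW(i), &env->fp_status);
        }
    }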
DEF_HELPER_FLAGS_4(vseqi_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_4(vseqi_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
@@ -655,45 +655,45 @@ DEF_HELPER_3(vsetallnez_h, void, env, i32, i32)
DEF_HELPER_3(vsetallnez_w, void, env, i32, i32)
DEF_HELPER_3(vsetallnez_d, void, env, i32, i32)
-DEF_HELPER_4(vpackev_b, void, env, i32, i32, i32)
-DEF_HELPER_4(vpackev_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vpackev_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vpackev_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vpackod_b, void, env, i32, i32, i32)
-DEF_HELPER_4(vpackod_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vpackod_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vpackod_d, void, env, i32, i32, i32)
-
-DEF_HELPER_4(vpickev_b, void, env, i32, i32, i32)
-DEF_HELPER_4(vpickev_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vpickev_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vpickev_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vpickod_b, void, env, i32, i32, i32)
-DEF_HELPER_4(vpickod_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vpickod_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vpickod_d, void, env, i32, i32, i32)
-
-DEF_HELPER_4(vilvl_b, void, env, i32, i32, i32)
-DEF_HELPER_4(vilvl_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vilvl_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vilvl_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vilvh_b, void, env, i32, i32, i32)
-DEF_HELPER_4(vilvh_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vilvh_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vilvh_d, void, env, i32, i32, i32)
-
-DEF_HELPER_5(vshuf_b, void, env, i32, i32, i32, i32)
-DEF_HELPER_4(vshuf_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vshuf_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vshuf_d, void, env, i32, i32, i32)
-DEF_HELPER_4(vshuf4i_b, void, env, i32, i32, i32)
-DEF_HELPER_4(vshuf4i_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vshuf4i_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vshuf4i_d, void, env, i32, i32, i32)
-
-DEF_HELPER_4(vpermi_w, void, env, i32, i32, i32)
-
-DEF_HELPER_4(vextrins_b, void, env, i32, i32, i32)
-DEF_HELPER_4(vextrins_h, void, env, i32, i32, i32)
-DEF_HELPER_4(vextrins_w, void, env, i32, i32, i32)
-DEF_HELPER_4(vextrins_d, void, env, i32, i32, i32)
+DEF_HELPER_FLAGS_4(vpackev_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vpackev_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vpackev_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vpackev_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vpackod_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vpackod_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vpackod_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vpackod_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vpickev_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vpickev_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vpickev_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vpickev_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vpickod_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vpickod_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vpickod_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vpickod_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(vilvl_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vilvl_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vilvl_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vilvl_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vilvh_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vilvh_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vilvh_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vilvh_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_5(vshuf_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vshuf_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vshuf_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vshuf_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(vshuf4i_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vshuf4i_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vshuf4i_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vshuf4i_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(vpermi_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
+DEF_HELPER_FLAGS_4(vextrins_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vextrins_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vextrins_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vextrins_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
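
With the ptr signatures, the translator no longer passes register numbers
through env; it passes the host addresses of the vector registers plus a
gvec descriptor. A sketch of the resulting call site, assuming a
vec_full_offset() that maps a vector register number to its offset inside
CPULoongArchState (the names other than tcg_gen_gvec_3_ool are assumptions,
not taken from this hunk):

    /* Sketch: dispatching a ptr/ptr/ptr/i32 helper through the generic
     * gvec out-of-line mechanism; 16 is the LSX operation size in bytes. */
    static void gen_vvv_sketch(arg_vvv *a, gen_helper_gvec_3 *fn)
    {
        tcg_gen_gvec_3_ool(vec_full_offset(a->vd),   /* dofs */
                           vec_full_offset(a->vj),   /* aofs */
                           vec_full_offset(a->vk),   /* bofs */
                           16, 16,                   /* oprsz, maxsz */
                           0, fn);                   /* data -> desc */
    }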
diff --git a/target/loongarch/lsx_helper.c b/target/loongarch/vec_helper.c
similarity index 71%
rename from target/loongarch/lsx_helper.c
rename to target/loongarch/vec_helper.c
index b231a2798b..d01903018a 100644
--- a/target/loongarch/lsx_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -18,13 +18,12 @@
#define DO_SUB(a, b) (a - b)
#define DO_ODD_EVEN(NAME, BIT, E1, E2, DO_OP) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
typedef __typeof(Vd->E1(0)) TD; \
\
for (i = 0; i < LSX_LEN/BIT; i++) { \
@@ -36,12 +35,11 @@ DO_ODD_EVEN(vhaddw_h_b, 16, H, B, DO_ADD)
DO_ODD_EVEN(vhaddw_w_h, 32, W, H, DO_ADD)
DO_ODD_EVEN(vhaddw_d_w, 64, D, W, DO_ADD)
-void HELPER(vhaddw_q_d)(CPULoongArchState *env,
- uint32_t vd, uint32_t vj, uint32_t vk)
+void HELPER(vhaddw_q_d)(void *vd, void *vj, void *vk, uint32_t desc)
{
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
- VReg *Vk = &(env->fpr[vk].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
+ VReg *Vk = (VReg *)vk;
Vd->Q(0) = int128_add(int128_makes64(Vj->D(1)), int128_makes64(Vk->D(0)));
}
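
Note the asymmetry between the signed and unsigned 128-bit variants:
int128_makes64() sign-extends, while int128_make64() zero-extends and is
therefore given the lane cast to uint64_t first. A two-line illustration
using QEMU's Int128 API:

    int64_t d = -1;
    Int128 s = int128_makes64(d);           /* all ones: (Int128)-1 */
    Int128 u = int128_make64((uint64_t)d);  /* high 64 bits zero    */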
@@ -50,12 +48,11 @@ DO_ODD_EVEN(vhsubw_h_b, 16, H, B, DO_SUB)
DO_ODD_EVEN(vhsubw_w_h, 32, W, H, DO_SUB)
DO_ODD_EVEN(vhsubw_d_w, 64, D, W, DO_SUB)
-void HELPER(vhsubw_q_d)(CPULoongArchState *env,
- uint32_t vd, uint32_t vj, uint32_t vk)
+void HELPER(vhsubw_q_d)(void *vd, void *vj, void *vk, uint32_t desc)
{
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
- VReg *Vk = &(env->fpr[vk].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
+ VReg *Vk = (VReg *)vk;
Vd->Q(0) = int128_sub(int128_makes64(Vj->D(1)), int128_makes64(Vk->D(0)));
}
@@ -64,12 +61,11 @@ DO_ODD_EVEN(vhaddw_hu_bu, 16, UH, UB, DO_ADD)
DO_ODD_EVEN(vhaddw_wu_hu, 32, UW, UH, DO_ADD)
DO_ODD_EVEN(vhaddw_du_wu, 64, UD, UW, DO_ADD)
-void HELPER(vhaddw_qu_du)(CPULoongArchState *env,
- uint32_t vd, uint32_t vj, uint32_t vk)
+void HELPER(vhaddw_qu_du)(void *vd, void *vj, void *vk, uint32_t desc)
{
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
- VReg *Vk = &(env->fpr[vk].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
+ VReg *Vk = (VReg *)vk;
Vd->Q(0) = int128_add(int128_make64((uint64_t)Vj->D(1)),
int128_make64((uint64_t)Vk->D(0)));
@@ -79,12 +75,11 @@ DO_ODD_EVEN(vhsubw_hu_bu, 16, UH, UB, DO_SUB)
DO_ODD_EVEN(vhsubw_wu_hu, 32, UW, UH, DO_SUB)
DO_ODD_EVEN(vhsubw_du_wu, 64, UD, UW, DO_SUB)
-void HELPER(vhsubw_qu_du)(CPULoongArchState *env,
- uint32_t vd, uint32_t vj, uint32_t vk)
+void HELPER(vhsubw_qu_du)(void *vd, void *vj, void *vk, uint32_t desc)
{
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
- VReg *Vk = &(env->fpr[vk].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
+ VReg *Vk = (VReg *)vk;
Vd->Q(0) = int128_sub(int128_make64((uint64_t)Vj->D(1)),
int128_make64((uint64_t)Vk->D(0)));
@@ -539,7 +534,7 @@ VMADDWEV_U_S(vmaddwev_w_hu_h, 32, W, UW, H, UH, DO_MUL)
VMADDWEV_U_S(vmaddwev_d_wu_w, 64, D, UD, W, UW, DO_MUL)
#define VMADDWOD_U_S(NAME, BIT, ES1, EU1, ES2, EU2, DO_OP) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
int i; \
VReg *Vd = (VReg *)vd; \
@@ -549,8 +544,8 @@ void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
typedef __typeof(Vd->EU1(0)) TU1; \
\
for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->ES1(i) += DO_OP((TU1)Vj->EU2(2 * i + 1), \
- (TS1)Vk->ES2(2 * i + 1)); \
+ Vd->ES1(i) += DO_OP((TU1)Vj->EU2(2 * i + 1), \
+ (TS1)Vk->ES2(2 * i + 1)); \
} \
}
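
The trailing argument is now consistently named desc: it is the gvec
descriptor built by simd_desc() at translate time. These helpers still
iterate a fixed LSX_LEN/BIT lanes, but a descriptor-driven loop is what the
256-bit LASX versions later in the series need. A sketch of that style,
using the standard accessor from tcg/tcg-gvec-desc.h:

    /* Sketch: sizing the loop from desc instead of hard-coding LSX_LEN. */
    void helper_vvv_sketch(void *vd, void *vj, void *vk, uint32_t desc)
    {
        int oprsz = simd_oprsz(desc);   /* 16 bytes for LSX, 32 for LASX */

        for (int i = 0; i < oprsz / 8; i++) {
            /* ... operate on 64-bit lane i of Vd/Vj/Vk ... */
        }
    }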
@@ -565,17 +560,17 @@ VMADDWOD_U_S(vmaddwod_d_wu_w, 64, D, UD, W, UW, DO_MUL)
#define DO_REM(N, M) (unlikely(M == 0) ? 0 :\
unlikely((N == -N) && (M == (__typeof(N))(-1))) ? 0 : N % M)
-#define VDIV(NAME, BIT, E, DO_OP) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
-{ \
- int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E(i) = DO_OP(Vj->E(i), Vk->E(i)); \
- } \
+#define VDIV(NAME, BIT, E, DO_OP) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ \
+ for (i = 0; i < LSX_LEN/BIT; i++) { \
+ Vd->E(i) = DO_OP(Vj->E(i), Vk->E(i)); \
+ } \
}
VDIV(vdiv_b, 8, B, DO_DIV)
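
DO_REM, defined just above, sidesteps both cases where C's % is undefined:
division by zero, and INT_MIN % -1. The check N == -N is true only for zero
(where returning 0 is harmless anyway) and for the type's minimum value;
QEMU builds with -fwrapv, so evaluating -N there wraps rather than
overflowing. A small worked case:

    /* Sketch: the overflow case the second guard catches. */
    int64_t n = INT64_MIN, m = -1;
    /* n % m would imply n / m == 2^63, unrepresentable in int64_t;  */
    /* DO_REM(n, m) therefore yields 0 instead of trapping.          */
    int64_t r = (m == 0) ? 0 : (n == -n && m == -1) ? 0 : n % m;  /* 0 */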
@@ -632,30 +627,30 @@ VSAT_U(vsat_hu, 16, UH)
VSAT_U(vsat_wu, 32, UW)
VSAT_U(vsat_du, 64, UD)
-#define VEXTH(NAME, BIT, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, uint32_t vd, uint32_t vj) \
-{ \
- int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E1(i) = Vj->E2(i + LSX_LEN/BIT); \
- } \
+#define VEXTH(NAME, BIT, E1, E2) \
+void HELPER(NAME)(void *vd, void *vj, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ \
+ for (i = 0; i < LSX_LEN/BIT; i++) { \
+ Vd->E1(i) = Vj->E2(i + LSX_LEN/BIT); \
+ } \
}
-void HELPER(vexth_q_d)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+void HELPER(vexth_q_d)(void *vd, void *vj, uint32_t desc)
{
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
Vd->Q(0) = int128_makes64(Vj->D(1));
}
-void HELPER(vexth_qu_du)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+void HELPER(vexth_qu_du)(void *vd, void *vj, uint32_t desc)
{
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
Vd->Q(0) = int128_make64((uint64_t)Vj->D(1));
}
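
VEXTH reads the high half of the source and widens each element; the
q_d/qu_du bodies above are the 64-to-128-bit cases of the same pattern.
Written out for vexth_h_b (B()/H() are the endian-aware accessors from
vec.h):

    /* Sketch: widen the high 8 bytes of the 128-bit source into
     * 8 sign-extended halfwords of the destination. */
    for (int i = 0; i < 8; i++) {
        Vd->H(i) = Vj->B(i + 8);    /* implicit sign extension */
    }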
@@ -684,11 +679,11 @@ static uint64_t do_vmskltz_b(int64_t val)
return c >> 56;
}
-void HELPER(vmskltz_b)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+void HELPER(vmskltz_b)(void *vd, void *vj, uint32_t desc)
{
uint16_t temp = 0;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
temp = do_vmskltz_b(Vj->D(0));
temp |= (do_vmskltz_b(Vj->D(1)) << 8);
@@ -705,11 +700,11 @@ static uint64_t do_vmskltz_h(int64_t val)
return c >> 60;
}
-void HELPER(vmskltz_h)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+void HELPER(vmskltz_h)(void *vd, void *vj, uint32_t desc)
{
uint16_t temp = 0;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
temp = do_vmskltz_h(Vj->D(0));
temp |= (do_vmskltz_h(Vj->D(1)) << 4);
@@ -725,11 +720,11 @@ static uint64_t do_vmskltz_w(int64_t val)
return c >> 62;
}
-void HELPER(vmskltz_w)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+void HELPER(vmskltz_w)(void *vd, void *vj, uint32_t desc)
{
uint16_t temp = 0;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
temp = do_vmskltz_w(Vj->D(0));
temp |= (do_vmskltz_w(Vj->D(1)) << 2);
@@ -741,11 +736,11 @@ static uint64_t do_vmskltz_d(int64_t val)
{
return (uint64_t)val >> 63;
}
-void HELPER(vmskltz_d)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+void HELPER(vmskltz_d)(void *vd, void *vj, uint32_t desc)
{
uint16_t temp = 0;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
temp = do_vmskltz_d(Vj->D(0));
temp |= (do_vmskltz_d(Vj->D(1)) << 1);
@@ -753,11 +748,11 @@ void HELPER(vmskltz_d)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
Vd->D(1) = 0;
}
-void HELPER(vmskgez_b)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+void HELPER(vmskgez_b)(void *vd, void *vj, uint32_t desc)
{
uint16_t temp = 0;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
temp = do_vmskltz_b(Vj->D(0));
temp |= (do_vmskltz_b(Vj->D(1)) << 8);
@@ -775,11 +770,11 @@ static uint64_t do_vmskez_b(uint64_t a)
return c >> 56;
}
-void HELPER(vmsknz_b)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+void HELPER(vmsknz_b)(void *vd, void *vj, uint32_t desc)
{
uint16_t temp = 0;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
temp = do_vmskez_b(Vj->D(0));
temp |= (do_vmskez_b(Vj->D(1)) << 8);
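
do_vmskez_b and the do_vmskltz_* functions above gather one bit per element
into a small mask with a branch-free shift/accumulate chain. A
straightforward (slower) equivalent for the per-byte sign-bit case, as a
reference for what the chain computes:

    /* Sketch: collect the sign bit of each of 8 bytes into bits 0..7,
     * matching the effect of do_vmskltz_b. */
    static uint8_t sign_mask_bytes(uint64_t v)
    {
        uint8_t m = 0;

        for (int i = 0; i < 8; i++) {
            m |= ((v >> (8 * i + 7)) & 1) << i;
        }
        return m;
    }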
@@ -798,36 +793,35 @@ void HELPER(vnori_b)(void *vd, void *vj, uint64_t imm, uint32_t v)
}
}
-#define VSLLWIL(NAME, BIT, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t imm) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- typedef __typeof(temp.E1(0)) TD; \
- \
- temp.D(0) = 0; \
- temp.D(1) = 0; \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- temp.E1(i) = (TD)Vj->E2(i) << (imm % BIT); \
- } \
- *Vd = temp; \
+#define VSLLWIL(NAME, BIT, E1, E2) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i; \
+ VReg temp; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ typedef __typeof(temp.E1(0)) TD; \
+ \
+ temp.D(0) = 0; \
+ temp.D(1) = 0; \
+ for (i = 0; i < LSX_LEN/BIT; i++) { \
+ temp.E1(i) = (TD)Vj->E2(i) << (imm % BIT); \
+ } \
+ *Vd = temp; \
}
-void HELPER(vextl_q_d)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+void HELPER(vextl_q_d)(void *vd, void *vj, uint32_t desc)
{
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
Vd->Q(0) = int128_makes64(Vj->D(0));
}
-void HELPER(vextl_qu_du)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+void HELPER(vextl_qu_du)(void *vd, void *vj, uint32_t desc)
{
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
Vd->Q(0) = int128_make64(Vj->D(0));
}
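
VSLLWIL widens the low-half elements and then shifts by imm taken modulo the
destination element width, so the immediate can never shift a lane out
entirely; vextl_q_d/vextl_qu_du are the matching 64-to-128-bit widen-only
cases. The per-lane operation of vsllwil_h_b, in isolation:

    /* Sketch: one lane of vsllwil_h_b -- sign-extend a byte to 16 bits,
     * then shift left by the immediate modulo the 16-bit result width. */
    static int16_t sllwil_h_b(int8_t b, uint64_t imm)
    {
        return (int16_t)((int16_t)b << (imm % 16));
    }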
@@ -855,13 +849,12 @@ do_vsrlr(W, uint32_t)
do_vsrlr(D, uint64_t)
#define VSRLR(NAME, BIT, T, E) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
\
for (i = 0; i < LSX_LEN/BIT; i++) { \
Vd->E(i) = do_vsrlr_ ## E(Vj->E(i), ((T)Vk->E(i))%BIT); \
@@ -873,17 +866,16 @@ VSRLR(vsrlr_h, 16, uint16_t, H)
VSRLR(vsrlr_w, 32, uint32_t, W)
VSRLR(vsrlr_d, 64, uint64_t, D)
-#define VSRLRI(NAME, BIT, E) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t imm) \
-{ \
- int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E(i) = do_vsrlr_ ## E(Vj->E(i), imm); \
- } \
+#define VSRLRI(NAME, BIT, E) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ \
+ for (i = 0; i < LSX_LEN/BIT; i++) { \
+ Vd->E(i) = do_vsrlr_ ## E(Vj->E(i), imm); \
+ } \
}
VSRLRI(vsrlri_b, 8, B)
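
do_vsrlr_* (and do_vsrar_* in the next hunk) implement rounding shifts: the
last bit shifted out is added back, and a shift count of zero returns the
input unchanged. Reconstructed for the 64-bit logical case, following the
do_vsrlr() template above:

    /* Sketch: rounding logical right shift, 64-bit lane. */
    static uint64_t vsrlr_d_sketch(uint64_t x, unsigned sh)
    {
        if (sh == 0) {
            return x;
        }
        return (x >> sh) + ((x >> (sh - 1)) & 1);
    }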
@@ -907,13 +899,12 @@ do_vsrar(W, int32_t)
do_vsrar(D, int64_t)
#define VSRAR(NAME, BIT, T, E) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
\
for (i = 0; i < LSX_LEN/BIT; i++) { \
Vd->E(i) = do_vsrar_ ## E(Vj->E(i), ((T)Vk->E(i))%BIT); \
@@ -925,17 +916,16 @@ VSRAR(vsrar_h, 16, uint16_t, H)
VSRAR(vsrar_w, 32, uint32_t, W)
VSRAR(vsrar_d, 64, uint64_t, D)
-#define VSRARI(NAME, BIT, E) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t imm) \
-{ \
- int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E(i) = do_vsrar_ ## E(Vj->E(i), imm); \
- } \
+#define VSRARI(NAME, BIT, E) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ \
+ for (i = 0; i < LSX_LEN/BIT; i++) { \
+ Vd->E(i) = do_vsrar_ ## E(Vj->E(i), imm); \
+ } \
}
VSRARI(vsrari_b, 8, B)
@@ -946,13 +936,12 @@ VSRARI(vsrari_d, 64, D)
#define R_SHIFT(a, b) (a >> b)
#define VSRLN(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
\
for (i = 0; i < LSX_LEN/BIT; i++) { \
Vd->E1(i) = R_SHIFT((T)Vj->E2(i),((T)Vk->E2(i)) % BIT); \
@@ -964,50 +953,47 @@ VSRLN(vsrln_b_h, 16, uint16_t, B, H)
VSRLN(vsrln_h_w, 32, uint32_t, H, W)
VSRLN(vsrln_w_d, 64, uint64_t, W, D)
-#define VSRAN(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
-{ \
- int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E1(i) = R_SHIFT(Vj->E2(i), ((T)Vk->E2(i)) % BIT); \
- } \
- Vd->D(1) = 0; \
+#define VSRAN(NAME, BIT, T, E1, E2) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ \
+ for (i = 0; i < LSX_LEN/BIT; i++) { \
+ Vd->E1(i) = R_SHIFT(Vj->E2(i), ((T)Vk->E2(i)) % BIT); \
+ } \
+ Vd->D(1) = 0; \
}
VSRAN(vsran_b_h, 16, uint16_t, B, H)
VSRAN(vsran_h_w, 32, uint32_t, H, W)
VSRAN(vsran_w_d, 64, uint64_t, W, D)
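
The narrowing shifts write one half-width result per source lane into the
low half of the destination and clear D(1); the shift count comes from the
corresponding lane of vk, reduced modulo the source width. One lane of
vsran_b_h, in isolation:

    /* Sketch: one lane of vsran_b_h -- arithmetic shift of a 16-bit
     * lane, truncated to 8 bits (the high 64 bits of the destination
     * are zeroed separately). */
    static int8_t sran_b_h(int16_t j, uint16_t k)
    {
        return (int8_t)(j >> (k % 16));
    }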
-#define VSRLNI(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t imm) \
-{ \
- int i, max; \
- VReg temp; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- \
- temp.D(0) = 0; \
- temp.D(1) = 0; \
- max = LSX_LEN/BIT; \
- for (i = 0; i < max; i++) { \
- temp.E1(i) = R_SHIFT((T)Vj->E2(i), imm); \
- temp.E1(i + max) = R_SHIFT((T)Vd->E2(i), imm); \
- } \
- *Vd = temp; \
-}
-
-void HELPER(vsrlni_d_q)(CPULoongArchState *env,
- uint32_t vd, uint32_t vj, uint32_t imm)
+#define VSRLNI(NAME, BIT, T, E1, E2) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i, max; \
+ VReg temp; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ \
+ temp.D(0) = 0; \
+ temp.D(1) = 0; \
+ max = LSX_LEN/BIT; \
+ for (i = 0; i < max; i++) { \
+ temp.E1(i) = R_SHIFT((T)Vj->E2(i), imm); \
+ temp.E1(i + max) = R_SHIFT((T)Vd->E2(i), imm); \
+ } \
+ *Vd = temp; \
+}
+
+void HELPER(vsrlni_d_q)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
VReg temp;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
temp.D(0) = 0;
temp.D(1) = 0;
@@ -1020,31 +1006,29 @@ VSRLNI(vsrlni_b_h, 16, uint16_t, B, H)
VSRLNI(vsrlni_h_w, 32, uint32_t, H, W)
VSRLNI(vsrlni_w_d, 64, uint64_t, W, D)
-#define VSRANI(NAME, BIT, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t imm) \
-{ \
- int i, max; \
- VReg temp; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- \
- temp.D(0) = 0; \
- temp.D(1) = 0; \
- max = LSX_LEN/BIT; \
- for (i = 0; i < max; i++) { \
- temp.E1(i) = R_SHIFT(Vj->E2(i), imm); \
- temp.E1(i + max) = R_SHIFT(Vd->E2(i), imm); \
- } \
- *Vd = temp; \
-}
-
-void HELPER(vsrani_d_q)(CPULoongArchState *env,
- uint32_t vd, uint32_t vj, uint32_t imm)
+#define VSRANI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i, max; \
+ VReg temp; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ \
+ temp.D(0) = 0; \
+ temp.D(1) = 0; \
+ max = LSX_LEN/BIT; \
+ for (i = 0; i < max; i++) { \
+ temp.E1(i) = R_SHIFT(Vj->E2(i), imm); \
+ temp.E1(i + max) = R_SHIFT(Vd->E2(i), imm); \
+ } \
+ *Vd = temp; \
+}
+
+void HELPER(vsrani_d_q)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
VReg temp;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
temp.D(0) = 0;
temp.D(1) = 0;
@@ -1058,13 +1042,12 @@ VSRANI(vsrani_h_w, 32, H, W)
VSRANI(vsrani_w_d, 64, W, D)
#define VSRLRN(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
\
for (i = 0; i < LSX_LEN/BIT; i++) { \
Vd->E1(i) = do_vsrlr_ ## E2(Vj->E2(i), ((T)Vk->E2(i))%BIT); \
@@ -1077,13 +1060,12 @@ VSRLRN(vsrlrn_h_w, 32, uint32_t, H, W)
VSRLRN(vsrlrn_w_d, 64, uint64_t, W, D)
#define VSRARN(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
\
for (i = 0; i < LSX_LEN/BIT; i++) { \
Vd->E1(i) = do_vsrar_ ## E2(Vj->E2(i), ((T)Vk->E2(i))%BIT); \
@@ -1095,31 +1077,29 @@ VSRARN(vsrarn_b_h, 16, uint8_t, B, H)
VSRARN(vsrarn_h_w, 32, uint16_t, H, W)
VSRARN(vsrarn_w_d, 64, uint32_t, W, D)
-#define VSRLRNI(NAME, BIT, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t imm) \
-{ \
- int i, max; \
- VReg temp; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- \
- temp.D(0) = 0; \
- temp.D(1) = 0; \
- max = LSX_LEN/BIT; \
- for (i = 0; i < max; i++) { \
- temp.E1(i) = do_vsrlr_ ## E2(Vj->E2(i), imm); \
- temp.E1(i + max) = do_vsrlr_ ## E2(Vd->E2(i), imm); \
- } \
- *Vd = temp; \
-}
-
-void HELPER(vsrlrni_d_q)(CPULoongArchState *env,
- uint32_t vd, uint32_t vj, uint32_t imm)
+#define VSRLRNI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i, max; \
+ VReg temp; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ \
+ temp.D(0) = 0; \
+ temp.D(1) = 0; \
+ max = LSX_LEN/BIT; \
+ for (i = 0; i < max; i++) { \
+ temp.E1(i) = do_vsrlr_ ## E2(Vj->E2(i), imm); \
+ temp.E1(i + max) = do_vsrlr_ ## E2(Vd->E2(i), imm); \
+ } \
+ *Vd = temp; \
+}
+
+void HELPER(vsrlrni_d_q)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
VReg temp;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
Int128 r1, r2;
if (imm == 0) {
@@ -1139,31 +1119,29 @@ VSRLRNI(vsrlrni_b_h, 16, B, H)
VSRLRNI(vsrlrni_h_w, 32, H, W)
VSRLRNI(vsrlrni_w_d, 64, W, D)
-#define VSRARNI(NAME, BIT, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t imm) \
-{ \
- int i, max; \
- VReg temp; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- \
- temp.D(0) = 0; \
- temp.D(1) = 0; \
- max = LSX_LEN/BIT; \
- for (i = 0; i < max; i++) { \
- temp.E1(i) = do_vsrar_ ## E2(Vj->E2(i), imm); \
- temp.E1(i + max) = do_vsrar_ ## E2(Vd->E2(i), imm); \
- } \
- *Vd = temp; \
-}
-
-void HELPER(vsrarni_d_q)(CPULoongArchState *env,
- uint32_t vd, uint32_t vj, uint32_t imm)
+#define VSRARNI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i, max; \
+ VReg temp; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ \
+ temp.D(0) = 0; \
+ temp.D(1) = 0; \
+ max = LSX_LEN/BIT; \
+ for (i = 0; i < max; i++) { \
+ temp.E1(i) = do_vsrar_ ## E2(Vj->E2(i), imm); \
+ temp.E1(i + max) = do_vsrar_ ## E2(Vd->E2(i), imm); \
+ } \
+ *Vd = temp; \
+}
+
+void HELPER(vsrarni_d_q)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
VReg temp;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
Int128 r1, r2;
if (imm == 0) {
@@ -1206,13 +1184,12 @@ SSRLNS(H, uint32_t, int32_t, uint16_t)
SSRLNS(W, uint64_t, int64_t, uint32_t)
#define VSSRLN(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
\
for (i = 0; i < LSX_LEN/BIT; i++) { \
Vd->E1(i) = do_ssrlns_ ## E1(Vj->E2(i), (T)Vk->E2(i)% BIT, BIT/2 -1); \
@@ -1249,13 +1226,12 @@ SSRANS(H, int32_t, int16_t)
SSRANS(W, int64_t, int32_t)
#define VSSRAN(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
\
for (i = 0; i < LSX_LEN/BIT; i++) { \
Vd->E1(i) = do_ssrans_ ## E1(Vj->E2(i), (T)Vk->E2(i)%BIT, BIT/2 -1); \
@@ -1290,13 +1266,12 @@ SSRLNU(H, uint32_t, uint16_t, int32_t)
SSRLNU(W, uint64_t, uint32_t, int64_t)
#define VSSRLNU(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
\
for (i = 0; i < LSX_LEN/BIT; i++) { \
Vd->E1(i) = do_ssrlnu_ ## E1(Vj->E2(i), (T)Vk->E2(i)%BIT, BIT/2); \
@@ -1334,13 +1309,12 @@ SSRANU(H, uint32_t, uint16_t, int32_t)
SSRANU(W, uint64_t, uint32_t, int64_t)
#define VSSRANU(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
\
for (i = 0; i < LSX_LEN/BIT; i++) { \
Vd->E1(i) = do_ssranu_ ## E1(Vj->E2(i), (T)Vk->E2(i)%BIT, BIT/2); \
@@ -1353,13 +1327,12 @@ VSSRANU(vssran_hu_w, 32, uint32_t, H, W)
VSSRANU(vssran_wu_d, 64, uint64_t, W, D)
#define VSSRLNI(NAME, BIT, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t imm) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
{ \
int i; \
VReg temp; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
\
for (i = 0; i < LSX_LEN/BIT; i++) { \
temp.E1(i) = do_ssrlns_ ## E1(Vj->E2(i), imm, BIT/2 -1); \
@@ -1368,12 +1341,11 @@ void HELPER(NAME)(CPULoongArchState *env, \
*Vd = temp; \
}
-void HELPER(vssrlni_d_q)(CPULoongArchState *env,
- uint32_t vd, uint32_t vj, uint32_t imm)
+void HELPER(vssrlni_d_q)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
Int128 shft_res1, shft_res2, mask;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
if (imm == 0) {
shft_res1 = Vj->Q(0);
@@ -1402,13 +1374,12 @@ VSSRLNI(vssrlni_h_w, 32, H, W)
VSSRLNI(vssrlni_w_d, 64, W, D)
#define VSSRANI(NAME, BIT, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t imm) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
{ \
int i; \
VReg temp; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
\
for (i = 0; i < LSX_LEN/BIT; i++) { \
temp.E1(i) = do_ssrans_ ## E1(Vj->E2(i), imm, BIT/2 -1); \
@@ -1417,12 +1388,11 @@ void HELPER(NAME)(CPULoongArchState *env, \
*Vd = temp; \
}
-void HELPER(vssrani_d_q)(CPULoongArchState *env,
- uint32_t vd, uint32_t vj, uint32_t imm)
+void HELPER(vssrani_d_q)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
Int128 shft_res1, shft_res2, mask, min;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
if (imm == 0) {
shft_res1 = Vj->Q(0);
@@ -1456,13 +1426,12 @@ VSSRANI(vssrani_h_w, 32, H, W)
VSSRANI(vssrani_w_d, 64, W, D)
#define VSSRLNUI(NAME, BIT, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t imm) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
{ \
int i; \
VReg temp; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
\
for (i = 0; i < LSX_LEN/BIT; i++) { \
temp.E1(i) = do_ssrlnu_ ## E1(Vj->E2(i), imm, BIT/2); \
@@ -1471,12 +1440,11 @@ void HELPER(NAME)(CPULoongArchState *env, \
*Vd = temp; \
}
-void HELPER(vssrlni_du_q)(CPULoongArchState *env,
- uint32_t vd, uint32_t vj, uint32_t imm)
+void HELPER(vssrlni_du_q)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
Int128 shft_res1, shft_res2, mask;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
if (imm == 0) {
shft_res1 = Vj->Q(0);
@@ -1505,13 +1473,12 @@ VSSRLNUI(vssrlni_hu_w, 32, H, W)
VSSRLNUI(vssrlni_wu_d, 64, W, D)
#define VSSRANUI(NAME, BIT, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t imm) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
{ \
int i; \
VReg temp; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
\
for (i = 0; i < LSX_LEN/BIT; i++) { \
temp.E1(i) = do_ssranu_ ## E1(Vj->E2(i), imm, BIT/2); \
@@ -1520,12 +1487,11 @@ void HELPER(NAME)(CPULoongArchState *env, \
*Vd = temp; \
}
-void HELPER(vssrani_du_q)(CPULoongArchState *env,
- uint32_t vd, uint32_t vj, uint32_t imm)
+void HELPER(vssrani_du_q)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
Int128 shft_res1, shft_res2, mask;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
if (imm == 0) {
shft_res1 = Vj->Q(0);
@@ -1582,13 +1548,12 @@ SSRLRNS(H, W, uint32_t, int32_t, uint16_t)
SSRLRNS(W, D, uint64_t, int64_t, uint32_t)
#define VSSRLRN(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
\
for (i = 0; i < LSX_LEN/BIT; i++) { \
Vd->E1(i) = do_ssrlrns_ ## E1(Vj->E2(i), (T)Vk->E2(i)%BIT, BIT/2 -1); \
@@ -1622,13 +1587,12 @@ SSRARNS(H, W, int32_t, int16_t)
SSRARNS(W, D, int64_t, int32_t)
#define VSSRARN(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
\
for (i = 0; i < LSX_LEN/BIT; i++) { \
Vd->E1(i) = do_ssrarns_ ## E1(Vj->E2(i), (T)Vk->E2(i)%BIT, BIT/2 -1); \
@@ -1661,13 +1625,12 @@ SSRLRNU(H, W, uint32_t, uint16_t, int32_t)
SSRLRNU(W, D, uint64_t, uint32_t, int64_t)
#define VSSRLRNU(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
\
for (i = 0; i < LSX_LEN/BIT; i++) { \
Vd->E1(i) = do_ssrlrnu_ ## E1(Vj->E2(i), (T)Vk->E2(i)%BIT, BIT/2); \
@@ -1703,13 +1666,12 @@ SSRARNU(H, W, uint32_t, uint16_t, int32_t)
SSRARNU(W, D, uint64_t, uint32_t, int64_t)
#define VSSRARNU(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
\
for (i = 0; i < LSX_LEN/BIT; i++) { \
Vd->E1(i) = do_ssrarnu_ ## E1(Vj->E2(i), (T)Vk->E2(i)%BIT, BIT/2); \
@@ -1722,13 +1684,12 @@ VSSRARNU(vssrarn_hu_w, 32, uint32_t, H, W)
VSSRARNU(vssrarn_wu_d, 64, uint64_t, W, D)
#define VSSRLRNI(NAME, BIT, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t imm) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
{ \
int i; \
VReg temp; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
\
for (i = 0; i < LSX_LEN/BIT; i++) { \
temp.E1(i) = do_ssrlrns_ ## E1(Vj->E2(i), imm, BIT/2 -1); \
@@ -1738,12 +1699,11 @@ void HELPER(NAME)(CPULoongArchState *env, \
}
#define VSSRLRNI_Q(NAME, sh) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t imm) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
{ \
Int128 shft_res1, shft_res2, mask, r1, r2; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
\
if (imm == 0) { \
shft_res1 = Vj->Q(0); \
@@ -1777,13 +1737,12 @@ VSSRLRNI(vssrlrni_w_d, 64, W, D)
VSSRLRNI_Q(vssrlrni_d_q, 63)
#define VSSRARNI(NAME, BIT, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t imm) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
{ \
int i; \
VReg temp; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
\
for (i = 0; i < LSX_LEN/BIT; i++) { \
temp.E1(i) = do_ssrarns_ ## E1(Vj->E2(i), imm, BIT/2 -1); \
@@ -1792,12 +1751,11 @@ void HELPER(NAME)(CPULoongArchState *env,
*Vd = temp; \
}
-void HELPER(vssrarni_d_q)(CPULoongArchState *env,
- uint32_t vd, uint32_t vj, uint32_t imm)
+void HELPER(vssrarni_d_q)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
Int128 shft_res1, shft_res2, mask1, mask2, r1, r2;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
if (imm == 0) {
shft_res1 = Vj->Q(0);
@@ -1835,13 +1793,12 @@ VSSRARNI(vssrarni_h_w, 32, H, W)
VSSRARNI(vssrarni_w_d, 64, W, D)
#define VSSRLRNUI(NAME, BIT, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t imm) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
{ \
int i; \
VReg temp; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
\
for (i = 0; i < LSX_LEN/BIT; i++) { \
temp.E1(i) = do_ssrlrnu_ ## E1(Vj->E2(i), imm, BIT/2); \
@@ -1856,13 +1813,12 @@ VSSRLRNUI(vssrlrni_wu_d, 64, W, D)
VSSRLRNI_Q(vssrlrni_du_q, 64)
#define VSSRARNUI(NAME, BIT, E1, E2) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t imm) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
{ \
int i; \
VReg temp; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
\
for (i = 0; i < LSX_LEN/BIT; i++) { \
temp.E1(i) = do_ssrarnu_ ## E1(Vj->E2(i), imm, BIT/2); \
@@ -1871,12 +1827,11 @@ void HELPER(NAME)(CPULoongArchState *env, \
*Vd = temp; \
}
-void HELPER(vssrarni_du_q)(CPULoongArchState *env,
- uint32_t vd, uint32_t vj, uint32_t imm)
+void HELPER(vssrarni_du_q)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
Int128 shft_res1, shft_res2, mask1, mask2, r1, r2;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
if (imm == 0) {
shft_res1 = Vj->Q(0);
@@ -1920,17 +1875,17 @@ VSSRARNUI(vssrarni_bu_h, 16, B, H)
VSSRARNUI(vssrarni_hu_w, 32, H, W)
VSSRARNUI(vssrarni_wu_d, 64, W, D)
-#define DO_2OP(NAME, BIT, E, DO_OP) \
-void HELPER(NAME)(CPULoongArchState *env, uint32_t vd, uint32_t vj) \
-{ \
- int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) \
- { \
- Vd->E(i) = DO_OP(Vj->E(i)); \
- } \
+#define DO_2OP(NAME, BIT, E, DO_OP) \
+void HELPER(NAME)(void *vd, void *vj, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ \
+ for (i = 0; i < LSX_LEN/BIT; i++) \
+ { \
+ Vd->E(i) = DO_OP(Vj->E(i)); \
+ } \
}
#define DO_CLO_B(N) (clz32(~N & 0xff) - 24)
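A quick check of the clz arithmetic, using clz32() from qemu/host-utils.h: clz32() counts leading zeros over all 32 bits, so a byte promoted to uint32_t always contributes 24 bits of zero padding, and inverting first turns 'count leading ones' into 'count leading zeros':

    /* N = 0xf0: ~N & 0xff = 0x0f, clz32(0x0f) = 28, 28 - 24 = 4 leading ones */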
@@ -1951,17 +1906,17 @@ DO_2OP(vclz_h, 16, UH, DO_CLZ_H)
DO_2OP(vclz_w, 32, UW, DO_CLZ_W)
DO_2OP(vclz_d, 64, UD, DO_CLZ_D)
-#define VPCNT(NAME, BIT, E, FN) \
-void HELPER(NAME)(CPULoongArchState *env, uint32_t vd, uint32_t vj) \
-{ \
- int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) \
- { \
- Vd->E(i) = FN(Vj->E(i)); \
- } \
+#define VPCNT(NAME, BIT, E, FN) \
+void HELPER(NAME)(void *vd, void *vj, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ \
+ for (i = 0; i < LSX_LEN/BIT; i++) \
+ { \
+ Vd->E(i) = FN(Vj->E(i)); \
+ } \
}
VPCNT(vpcnt_b, 8, UB, ctpop8)
@@ -2024,42 +1979,40 @@ DO_BITI(vbitrevi_h, 16, UH, DO_BITREV)
DO_BITI(vbitrevi_w, 32, UW, DO_BITREV)
DO_BITI(vbitrevi_d, 64, UD, DO_BITREV)
-#define VFRSTP(NAME, BIT, MASK, E) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
-{ \
- int i, m; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- if (Vj->E(i) < 0) { \
- break; \
- } \
- } \
- m = Vk->E(0) & MASK; \
- Vd->E(m) = i; \
+#define VFRSTP(NAME, BIT, MASK, E) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i, m; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ \
+ for (i = 0; i < LSX_LEN/BIT; i++) { \
+ if (Vj->E(i) < 0) { \
+ break; \
+ } \
+ } \
+ m = Vk->E(0) & MASK; \
+ Vd->E(m) = i; \
}
VFRSTP(vfrstp_b, 8, 0xf, B)
VFRSTP(vfrstp_h, 16, 0x7, H)
-#define VFRSTPI(NAME, BIT, E) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t imm) \
-{ \
- int i, m; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- if (Vj->E(i) < 0) { \
- break; \
- } \
- } \
- m = imm % (LSX_LEN/BIT); \
- Vd->E(m) = i; \
+#define VFRSTPI(NAME, BIT, E) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i, m; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ \
+ for (i = 0; i < LSX_LEN/BIT; i++) { \
+ if (Vj->E(i) < 0) { \
+ break; \
+ } \
+ } \
+ m = imm % (LSX_LEN/BIT); \
+ Vd->E(m) = i; \
}
VFRSTPI(vfrstpi_b, 8, B)
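A worked example of the semantics, with made-up lane values: vfrstpi_b records the index of the first negative byte of vj into a byte of vd selected by the immediate:

    /* Suppose Vj->B = { 5, 7, -1, ... } and imm = 3:
     *   the scan breaks at i = 2, the first lane with Vj->B(i) < 0;
     *   m = 3 % 16 = 3, so the helper stores Vd->B(3) = 2.
     * If no lane is negative, i falls through as 16, the lane count.
     */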
@@ -2097,13 +2050,13 @@ static inline void vec_clear_cause(CPULoongArchState *env)
}
#define DO_3OP_F(NAME, BIT, E, FN) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, \
+ CPULoongArchState *env, uint32_t desc) \
{ \
int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
\
vec_clear_cause(env); \
for (i = 0; i < LSX_LEN/BIT; i++) { \
@@ -2130,14 +2083,14 @@ DO_3OP_F(vfmina_s, 32, UW, float32_minnummag)
DO_3OP_F(vfmina_d, 64, UD, float64_minnummag)
#define DO_4OP_F(NAME, BIT, E, FN, flags) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk, uint32_t va) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, void *va, \
+ CPULoongArchState *env, uint32_t desc) \
{ \
int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
- VReg *Va = &(env->fpr[va].vreg); \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ VReg *Va = (VReg *)va; \
\
vec_clear_cause(env); \
for (i = 0; i < LSX_LEN/BIT; i++) { \
@@ -2157,17 +2110,17 @@ DO_4OP_F(vfnmsub_s, 32, UW, float32_muladd,
DO_4OP_F(vfnmsub_d, 64, UD, float64_muladd,
float_muladd_negate_c | float_muladd_negate_result)
-#define DO_2OP_F(NAME, BIT, E, FN) \
-void HELPER(NAME)(CPULoongArchState *env, uint32_t vd, uint32_t vj) \
-{ \
- int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- \
- vec_clear_cause(env); \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E(i) = FN(env, Vj->E(i)); \
- } \
+#define DO_2OP_F(NAME, BIT, E, FN) \
+void HELPER(NAME)(void *vd, void *vj, CPULoongArchState *env, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ \
+ vec_clear_cause(env); \
+ for (i = 0; i < LSX_LEN/BIT; i++) { \
+ Vd->E(i) = FN(env, Vj->E(i)); \
+ } \
}
#define FLOGB(BIT, T) \
@@ -2188,16 +2141,16 @@ static T do_flogb_## BIT(CPULoongArchState *env, T fj) \
FLOGB(32, uint32_t)
FLOGB(64, uint64_t)
-#define FCLASS(NAME, BIT, E, FN) \
-void HELPER(NAME)(CPULoongArchState *env, uint32_t vd, uint32_t vj) \
-{ \
- int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E(i) = FN(env, Vj->E(i)); \
- } \
+#define FCLASS(NAME, BIT, E, FN) \
+void HELPER(NAME)(void *vd, void *vj, CPULoongArchState *env, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ \
+ for (i = 0; i < LSX_LEN/BIT; i++) { \
+ Vd->E(i) = FN(env, Vj->E(i)); \
+ } \
}
FCLASS(vfclass_s, 32, UW, helper_fclass_s)
@@ -2267,12 +2220,13 @@ static uint32_t float64_cvt_float32(uint64_t d, float_status *status)
return float64_to_float32(d, status);
}
-void HELPER(vfcvtl_s_h)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+void HELPER(vfcvtl_s_h)(void *vd, void *vj,
+ CPULoongArchState *env, uint32_t desc)
{
int i;
VReg temp;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
vec_clear_cause(env);
for (i = 0; i < LSX_LEN/32; i++) {
@@ -2282,12 +2236,13 @@ void HELPER(vfcvtl_s_h)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
*Vd = temp;
}
-void HELPER(vfcvtl_d_s)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+void HELPER(vfcvtl_d_s)(void *vd, void *vj,
+ CPULoongArchState *env, uint32_t desc)
{
int i;
VReg temp;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
vec_clear_cause(env);
for (i = 0; i < LSX_LEN/64; i++) {
@@ -2297,12 +2252,13 @@ void HELPER(vfcvtl_d_s)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
*Vd = temp;
}
-void HELPER(vfcvth_s_h)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+void HELPER(vfcvth_s_h)(void *vd, void *vj,
+ CPULoongArchState *env, uint32_t desc)
{
int i;
VReg temp;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
vec_clear_cause(env);
for (i = 0; i < LSX_LEN/32; i++) {
@@ -2312,12 +2268,13 @@ void HELPER(vfcvth_s_h)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
*Vd = temp;
}
-void HELPER(vfcvth_d_s)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+void HELPER(vfcvth_d_s)(void *vd, void *vj,
+ CPULoongArchState *env, uint32_t desc)
{
int i;
VReg temp;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
vec_clear_cause(env);
for (i = 0; i < LSX_LEN/64; i++) {
@@ -2327,14 +2284,14 @@ void HELPER(vfcvth_d_s)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
*Vd = temp;
}
-void HELPER(vfcvt_h_s)(CPULoongArchState *env,
- uint32_t vd, uint32_t vj, uint32_t vk)
+void HELPER(vfcvt_h_s)(void *vd, void *vj, void *vk,
+ CPULoongArchState *env, uint32_t desc)
{
int i;
VReg temp;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
- VReg *Vk = &(env->fpr[vk].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
+ VReg *Vk = (VReg *)vk;
vec_clear_cause(env);
for(i = 0; i < LSX_LEN/32; i++) {
@@ -2345,14 +2302,14 @@ void HELPER(vfcvt_h_s)(CPULoongArchState *env,
*Vd = temp;
}
-void HELPER(vfcvt_s_d)(CPULoongArchState *env,
- uint32_t vd, uint32_t vj, uint32_t vk)
+void HELPER(vfcvt_s_d)(void *vd, void *vj, void *vk,
+ CPULoongArchState *env, uint32_t desc)
{
int i;
VReg temp;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
- VReg *Vk = &(env->fpr[vk].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
+ VReg *Vk = (VReg *)vk;
vec_clear_cause(env);
for(i = 0; i < LSX_LEN/64; i++) {
@@ -2363,24 +2320,26 @@ void HELPER(vfcvt_s_d)(CPULoongArchState *env,
*Vd = temp;
}
-void HELPER(vfrint_s)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+void HELPER(vfrint_s)(void *vd, void *vj,
+ CPULoongArchState *env, uint32_t desc)
{
int i;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
vec_clear_cause(env);
for (i = 0; i < 4; i++) {
Vd->W(i) = float32_round_to_int(Vj->UW(i), &env->fp_status);
vec_update_fcsr0(env, GETPC());
}
}
-void HELPER(vfrint_d)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+
+void HELPER(vfrint_d)(void *vd, void *vj,
+ CPULoongArchState *env, uint32_t desc)
{
int i;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
vec_clear_cause(env);
for (i = 0; i < 2; i++) {
@@ -2389,21 +2348,21 @@ void HELPER(vfrint_d)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
}
}
-#define FCVT_2OP(NAME, BIT, E, MODE) \
-void HELPER(NAME)(CPULoongArchState *env, uint32_t vd, uint32_t vj) \
-{ \
- int i; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- \
- vec_clear_cause(env); \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- FloatRoundMode old_mode = get_float_rounding_mode(&env->fp_status); \
- set_float_rounding_mode(MODE, &env->fp_status); \
- Vd->E(i) = float## BIT ## _round_to_int(Vj->E(i), &env->fp_status); \
- set_float_rounding_mode(old_mode, &env->fp_status); \
- vec_update_fcsr0(env, GETPC()); \
- } \
+#define FCVT_2OP(NAME, BIT, E, MODE) \
+void HELPER(NAME)(void *vd, void *vj, CPULoongArchState *env, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ \
+ vec_clear_cause(env); \
+ for (i = 0; i < LSX_LEN/BIT; i++) { \
+ FloatRoundMode old_mode = get_float_rounding_mode(&env->fp_status); \
+ set_float_rounding_mode(MODE, &env->fp_status); \
+ Vd->E(i) = float## BIT ## _round_to_int(Vj->E(i), &env->fp_status); \
+ set_float_rounding_mode(old_mode, &env->fp_status); \
+ vec_update_fcsr0(env, GETPC()); \
+ } \
}
FCVT_2OP(vfrintrne_s, 32, UW, float_round_nearest_even)
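The save/set/restore sequence in FCVT_2OP is the usual softfloat idiom for forcing a rounding mode without disturbing later FP operations; stripped of the vector loop it is just:

    FloatRoundMode old_mode = get_float_rounding_mode(&env->fp_status);
    set_float_rounding_mode(float_round_nearest_even, &env->fp_status);
    /* rounded operation(s) here */
    set_float_rounding_mode(old_mode, &env->fp_status);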
@@ -2482,22 +2441,22 @@ FTINT(rp_w_d, float64, int32, uint64_t, uint32_t, float_round_up)
FTINT(rz_w_d, float64, int32, uint64_t, uint32_t, float_round_to_zero)
FTINT(rne_w_d, float64, int32, uint64_t, uint32_t, float_round_nearest_even)
-#define FTINT_W_D(NAME, FN) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
- \
- vec_clear_cause(env); \
- for (i = 0; i < 2; i++) { \
- temp.W(i + 2) = FN(env, Vj->UD(i)); \
- temp.W(i) = FN(env, Vk->UD(i)); \
- } \
- *Vd = temp; \
+#define FTINT_W_D(NAME, FN) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, \
+ CPULoongArchState *env, uint32_t desc) \
+{ \
+ int i; \
+ VReg temp; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ \
+ vec_clear_cause(env); \
+ for (i = 0; i < 2; i++) { \
+ temp.W(i + 2) = FN(env, Vj->UD(i)); \
+ temp.W(i) = FN(env, Vk->UD(i)); \
+ } \
+ *Vd = temp; \
}
FTINT_W_D(vftint_w_d, do_float64_to_int32)
@@ -2515,19 +2474,19 @@ FTINT(rph_l_s, float32, int64, uint32_t, uint64_t, float_round_up)
FTINT(rzh_l_s, float32, int64, uint32_t, uint64_t, float_round_to_zero)
FTINT(rneh_l_s, float32, int64, uint32_t, uint64_t, float_round_nearest_even)
-#define FTINTL_L_S(NAME, FN) \
-void HELPER(NAME)(CPULoongArchState *env, uint32_t vd, uint32_t vj) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- \
- vec_clear_cause(env); \
- for (i = 0; i < 2; i++) { \
- temp.D(i) = FN(env, Vj->UW(i)); \
- } \
- *Vd = temp; \
+#define FTINTL_L_S(NAME, FN) \
+void HELPER(NAME)(void *vd, void *vj, CPULoongArchState *env, uint32_t desc) \
+{ \
+ int i; \
+ VReg temp; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ \
+ vec_clear_cause(env); \
+ for (i = 0; i < 2; i++) { \
+ temp.D(i) = FN(env, Vj->UW(i)); \
+ } \
+ *Vd = temp; \
}
FTINTL_L_S(vftintl_l_s, do_float32_to_int64)
@@ -2536,19 +2495,19 @@ FTINTL_L_S(vftintrpl_l_s, do_ftintrpl_l_s)
FTINTL_L_S(vftintrzl_l_s, do_ftintrzl_l_s)
FTINTL_L_S(vftintrnel_l_s, do_ftintrnel_l_s)
-#define FTINTH_L_S(NAME, FN) \
-void HELPER(NAME)(CPULoongArchState *env, uint32_t vd, uint32_t vj) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- \
- vec_clear_cause(env); \
- for (i = 0; i < 2; i++) { \
- temp.D(i) = FN(env, Vj->UW(i + 2)); \
- } \
- *Vd = temp; \
+#define FTINTH_L_S(NAME, FN) \
+void HELPER(NAME)(void *vd, void *vj, CPULoongArchState *env, uint32_t desc) \
+{ \
+ int i; \
+ VReg temp; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ \
+ vec_clear_cause(env); \
+ for (i = 0; i < 2; i++) { \
+ temp.D(i) = FN(env, Vj->UW(i + 2)); \
+ } \
+ *Vd = temp; \
}
FTINTH_L_S(vftinth_l_s, do_float32_to_int64)
@@ -2577,12 +2536,13 @@ DO_2OP_F(vffint_d_l, 64, D, do_ffint_d_l)
DO_2OP_F(vffint_s_wu, 32, UW, do_ffint_s_wu)
DO_2OP_F(vffint_d_lu, 64, UD, do_ffint_d_lu)
-void HELPER(vffintl_d_w)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+void HELPER(vffintl_d_w)(void *vd, void *vj,
+ CPULoongArchState *env, uint32_t desc)
{
int i;
VReg temp;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
vec_clear_cause(env);
for (i = 0; i < 2; i++) {
@@ -2592,12 +2552,13 @@ void HELPER(vffintl_d_w)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
*Vd = temp;
}
-void HELPER(vffinth_d_w)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
+void HELPER(vffinth_d_w)(void *vd, void *vj,
+ CPULoongArchState *env, uint32_t desc)
{
int i;
VReg temp;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
vec_clear_cause(env);
for (i = 0; i < 2; i++) {
@@ -2607,14 +2568,14 @@ void HELPER(vffinth_d_w)(CPULoongArchState *env, uint32_t vd, uint32_t vj)
*Vd = temp;
}
-void HELPER(vffint_s_l)(CPULoongArchState *env,
- uint32_t vd, uint32_t vj, uint32_t vk)
+void HELPER(vffint_s_l)(void *vd, void *vj, void *vk,
+ CPULoongArchState *env, uint32_t desc)
{
int i;
VReg temp;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
- VReg *Vk = &(env->fpr[vk].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
+ VReg *Vk = (VReg *)vk;
vec_clear_cause(env);
for (i = 0; i < 2; i++) {
@@ -2768,21 +2729,20 @@ SETALLNEZ(vsetallnez_h, MO_16)
SETALLNEZ(vsetallnez_w, MO_32)
SETALLNEZ(vsetallnez_d, MO_64)
-#define VPACKEV(NAME, BIT, E) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- temp.E(2 * i + 1) = Vj->E(2 * i); \
- temp.E(2 *i) = Vk->E(2 * i); \
- } \
- *Vd = temp; \
+#define VPACKEV(NAME, BIT, E) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i; \
+ VReg temp; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ \
+ for (i = 0; i < LSX_LEN/BIT; i++) { \
+ temp.E(2 * i + 1) = Vj->E(2 * i); \
+ temp.E(2 * i) = Vk->E(2 * i); \
+ } \
+ *Vd = temp; \
}
VPACKEV(vpackev_b, 16, B)
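The resulting lane layout for vpackev_b (BIT = 16, so eight iterations over byte pairs): the even-numbered bytes of the two sources interleave into the destination:

    /* temp.B(2*i)   = Vk->B(2*i)
     * temp.B(2*i+1) = Vj->B(2*i)
     * => Vd = { Vk.B0, Vj.B0, Vk.B2, Vj.B2, ..., Vk.B14, Vj.B14 }
     */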
@@ -2790,21 +2750,20 @@ VPACKEV(vpackev_h, 32, H)
VPACKEV(vpackev_w, 64, W)
VPACKEV(vpackev_d, 128, D)
-#define VPACKOD(NAME, BIT, E) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- temp.E(2 * i + 1) = Vj->E(2 * i + 1); \
- temp.E(2 * i) = Vk->E(2 * i + 1); \
- } \
- *Vd = temp; \
+#define VPACKOD(NAME, BIT, E) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i; \
+ VReg temp; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ \
+ for (i = 0; i < LSX_LEN/BIT; i++) { \
+ temp.E(2 * i + 1) = Vj->E(2 * i + 1); \
+ temp.E(2 * i) = Vk->E(2 * i + 1); \
+ } \
+ *Vd = temp; \
}
VPACKOD(vpackod_b, 16, B)
@@ -2812,21 +2771,20 @@ VPACKOD(vpackod_h, 32, H)
VPACKOD(vpackod_w, 64, W)
VPACKOD(vpackod_d, 128, D)
-#define VPICKEV(NAME, BIT, E) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- temp.E(i + LSX_LEN/BIT) = Vj->E(2 * i); \
- temp.E(i) = Vk->E(2 * i); \
- } \
- *Vd = temp; \
+#define VPICKEV(NAME, BIT, E) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i; \
+ VReg temp; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ \
+ for (i = 0; i < LSX_LEN/BIT; i++) { \
+ temp.E(i + LSX_LEN/BIT) = Vj->E(2 * i); \
+ temp.E(i) = Vk->E(2 * i); \
+ } \
+ *Vd = temp; \
}
VPICKEV(vpickev_b, 16, B)
@@ -2834,21 +2792,20 @@ VPICKEV(vpickev_h, 32, H)
VPICKEV(vpickev_w, 64, W)
VPICKEV(vpickev_d, 128, D)
-#define VPICKOD(NAME, BIT, E) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- temp.E(i + LSX_LEN/BIT) = Vj->E(2 * i + 1); \
- temp.E(i) = Vk->E(2 * i + 1); \
- } \
- *Vd = temp; \
+#define VPICKOD(NAME, BIT, E) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i; \
+ VReg temp; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ \
+ for (i = 0; i < LSX_LEN/BIT; i++) { \
+ temp.E(i + LSX_LEN/BIT) = Vj->E(2 * i + 1); \
+ temp.E(i) = Vk->E(2 * i + 1); \
+ } \
+ *Vd = temp; \
}
VPICKOD(vpickod_b, 16, B)
@@ -2856,21 +2813,20 @@ VPICKOD(vpickod_h, 32, H)
VPICKOD(vpickod_w, 64, W)
VPICKOD(vpickod_d, 128, D)
-#define VILVL(NAME, BIT, E) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- temp.E(2 * i + 1) = Vj->E(i); \
- temp.E(2 * i) = Vk->E(i); \
- } \
- *Vd = temp; \
+#define VILVL(NAME, BIT, E) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i; \
+ VReg temp; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ \
+ for (i = 0; i < LSX_LEN/BIT; i++) { \
+ temp.E(2 * i + 1) = Vj->E(i); \
+ temp.E(2 * i) = Vk->E(i); \
+ } \
+ *Vd = temp; \
}
VILVL(vilvl_b, 16, B)
@@ -2878,21 +2834,20 @@ VILVL(vilvl_h, 32, H)
VILVL(vilvl_w, 64, W)
VILVL(vilvl_d, 128, D)
-#define VILVH(NAME, BIT, E) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- temp.E(2 * i + 1) = Vj->E(i + LSX_LEN/BIT); \
- temp.E(2 * i) = Vk->E(i + LSX_LEN/BIT); \
- } \
- *Vd = temp; \
+#define VILVH(NAME, BIT, E) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i; \
+ VReg temp; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ \
+ for (i = 0; i < LSX_LEN/BIT; i++) { \
+ temp.E(2 * i + 1) = Vj->E(i + LSX_LEN/BIT); \
+ temp.E(2 * i) = Vk->E(i + LSX_LEN/BIT); \
+ } \
+ *Vd = temp; \
}
VILVH(vilvh_b, 16, B)
@@ -2900,15 +2855,14 @@ VILVH(vilvh_h, 32, H)
VILVH(vilvh_w, 64, W)
VILVH(vilvh_d, 128, D)
-void HELPER(vshuf_b)(CPULoongArchState *env,
- uint32_t vd, uint32_t vj, uint32_t vk, uint32_t va)
+void HELPER(vshuf_b)(void *vd, void *vj, void *vk, void *va, uint32_t desc)
{
int i, m;
VReg temp;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
- VReg *Vk = &(env->fpr[vk].vreg);
- VReg *Va = &(env->fpr[va].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
+ VReg *Vk = (VReg *)vk;
+ VReg *Va = (VReg *)va;
m = LSX_LEN/8;
for (i = 0; i < m ; i++) {
@@ -2918,53 +2872,50 @@ void HELPER(vshuf_b)(CPULoongArchState *env,
*Vd = temp;
}
-#define VSHUF(NAME, BIT, E) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t vk) \
-{ \
- int i, m; \
- VReg temp; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- VReg *Vk = &(env->fpr[vk].vreg); \
- \
- m = LSX_LEN/BIT; \
- for (i = 0; i < m; i++) { \
- uint64_t k = ((uint8_t) Vd->E(i)) % (2 * m); \
- temp.E(i) = k < m ? Vk->E(k) : Vj->E(k - m); \
- } \
- *Vd = temp; \
+#define VSHUF(NAME, BIT, E) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i, m; \
+ VReg temp; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ \
+ m = LSX_LEN/BIT; \
+ for (i = 0; i < m; i++) { \
+ uint64_t k = ((uint8_t) Vd->E(i)) % (2 * m); \
+ temp.E(i) = k < m ? Vk->E(k) : Vj->E(k - m); \
+ } \
+ *Vd = temp; \
}
VSHUF(vshuf_h, 16, H)
VSHUF(vshuf_w, 32, W)
VSHUF(vshuf_d, 64, D)
-#define VSHUF4I(NAME, BIT, E) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t imm) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- temp.E(i) = Vj->E(((i) & 0xfc) + (((imm) >> \
- (2 * ((i) & 0x03))) & 0x03)); \
- } \
- *Vd = temp; \
+#define VSHUF4I(NAME, BIT, E) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i; \
+ VReg temp; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ \
+ for (i = 0; i < LSX_LEN/BIT; i++) { \
+ temp.E(i) = Vj->E(((i) & 0xfc) + (((imm) >> \
+ (2 * ((i) & 0x03))) & 0x03)); \
+ } \
+ *Vd = temp; \
}
VSHUF4I(vshuf4i_b, 8, B)
VSHUF4I(vshuf4i_h, 16, H)
VSHUF4I(vshuf4i_w, 32, W)
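The index expression decodes imm as four 2-bit lane selectors applied within each group of four elements. For vshuf4i_w there are only four word lanes, so (i) & 0xfc is always 0 and the loop reduces to the equivalent of:

    temp.W(0) = Vj->W(imm        & 0x3);
    temp.W(1) = Vj->W((imm >> 2) & 0x3);
    temp.W(2) = Vj->W((imm >> 4) & 0x3);
    temp.W(3) = Vj->W((imm >> 6) & 0x3);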
-void HELPER(vshuf4i_d)(CPULoongArchState *env,
- uint32_t vd, uint32_t vj, uint32_t imm)
+void HELPER(vshuf4i_d)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
VReg temp;
temp.D(0) = (imm & 2 ? Vj : Vd)->D(imm & 1);
@@ -2972,12 +2923,11 @@ void HELPER(vshuf4i_d)(CPULoongArchState *env,
*Vd = temp;
}
-void HELPER(vpermi_w)(CPULoongArchState *env,
- uint32_t vd, uint32_t vj, uint32_t imm)
+void HELPER(vpermi_w)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
VReg temp;
- VReg *Vd = &(env->fpr[vd].vreg);
- VReg *Vj = &(env->fpr[vj].vreg);
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
temp.W(0) = Vj->W(imm & 0x3);
temp.W(1) = Vj->W((imm >> 2) & 0x3);
@@ -2986,17 +2936,16 @@ void HELPER(vpermi_w)(CPULoongArchState *env,
*Vd = temp;
}
-#define VEXTRINS(NAME, BIT, E, MASK) \
-void HELPER(NAME)(CPULoongArchState *env, \
- uint32_t vd, uint32_t vj, uint32_t imm) \
-{ \
- int ins, extr; \
- VReg *Vd = &(env->fpr[vd].vreg); \
- VReg *Vj = &(env->fpr[vj].vreg); \
- \
- ins = (imm >> 4) & MASK; \
- extr = imm & MASK; \
- Vd->E(ins) = Vj->E(extr); \
+#define VEXTRINS(NAME, BIT, E, MASK) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int ins, extr; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ \
+ ins = (imm >> 4) & MASK; \
+ extr = imm & MASK; \
+ Vd->E(ins) = Vj->E(extr); \
}
VEXTRINS(vextrins_b, 8, B, 0xf)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 86a0d4d6b9..5653a556bf 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -4,53 +4,90 @@
* Copyright (c) 2022-2023 Loongson Technology Corporation Limited
*/
-static bool gen_vvvv(DisasContext *ctx, arg_vvvv *a,
- void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32,
- TCGv_i32, TCGv_i32))
+static bool gen_vvvv(DisasContext *ctx, arg_vvvv *a, int oprsz,
+ gen_helper_gvec_4 *fn)
{
- TCGv_i32 vd = tcg_constant_i32(a->vd);
- TCGv_i32 vj = tcg_constant_i32(a->vj);
- TCGv_i32 vk = tcg_constant_i32(a->vk);
- TCGv_i32 va = tcg_constant_i32(a->va);
-
CHECK_VEC;
- func(cpu_env, vd, vj, vk, va);
+
+ tcg_gen_gvec_4_ool(vec_full_offset(a->vd),
+ vec_full_offset(a->vj),
+ vec_full_offset(a->vk),
+ vec_full_offset(a->va),
+ oprsz, ctx->vl / 8, oprsz, fn);
return true;
}
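For readers new to the gvec API: the tail of tcg_gen_gvec_4_ool() is (oprsz, maxsz, data, fn). oprsz is the number of bytes the operation touches (16 for LSX), maxsz = ctx->vl / 8 is the full register width in bytes, and gvec clears the bytes in [oprsz, maxsz) after the call. The int32 data argument (oprsz again here) is packed into desc, which a helper can unpack with the tcg-gvec-desc.h accessors:

    uint32_t opr  = simd_oprsz(desc);   /* bytes operated on */
    uint32_t max  = simd_maxsz(desc);   /* full vector width in bytes */
    int32_t  data = simd_data(desc);    /* per-call payload */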
-static bool gen_vvv(DisasContext *ctx, arg_vvv *a,
- void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32))
+static bool gen_vvv(DisasContext *ctx, arg_vvv *a, int oprsz,
+ gen_helper_gvec_3 *fn)
{
- TCGv_i32 vd = tcg_constant_i32(a->vd);
- TCGv_i32 vj = tcg_constant_i32(a->vj);
- TCGv_i32 vk = tcg_constant_i32(a->vk);
+ CHECK_VEC;
+
+ tcg_gen_gvec_3_ool(vec_full_offset(a->vd),
+ vec_full_offset(a->vj),
+ vec_full_offset(a->vk),
+ oprsz, ctx->vl / 8, oprsz, fn);
+ return true;
+}
+
+static bool gen_vv(DisasContext *ctx, arg_vv *a, int oprsz,
+ gen_helper_gvec_2 *fn)
+{
CHECK_VEC;
- func(cpu_env, vd, vj, vk);
+ tcg_gen_gvec_2_ool(vec_full_offset(a->vd),
+ vec_full_offset(a->vj),
+ oprsz, ctx->vl / 8, oprsz, fn);
return true;
}
-static bool gen_vv(DisasContext *ctx, arg_vv *a,
- void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32))
+static bool gen_vvvv_f(DisasContext *ctx, arg_vvvv *a, int oprsz,
+ gen_helper_gvec_4_ptr *fn)
{
- TCGv_i32 vd = tcg_constant_i32(a->vd);
- TCGv_i32 vj = tcg_constant_i32(a->vj);
+ CHECK_VEC;
+
+ tcg_gen_gvec_4_ptr(vec_full_offset(a->vd),
+ vec_full_offset(a->vj),
+ vec_full_offset(a->vk),
+ vec_full_offset(a->va),
+ cpu_env,
+ oprsz, ctx->vl / 8, oprsz, fn);
+ return true;
+}
+
+static bool gen_vvv_f(DisasContext *ctx, arg_vvv *a, int oprsz,
+ gen_helper_gvec_3_ptr *fn)
+{
CHECK_VEC;
- func(cpu_env, vd, vj);
+
+ tcg_gen_gvec_3_ptr(vec_full_offset(a->vd),
+ vec_full_offset(a->vj),
+ vec_full_offset(a->vk),
+ cpu_env,
+ oprsz, ctx->vl / 8, oprsz, fn);
return true;
}
-static bool gen_vv_i(DisasContext *ctx, arg_vv_i *a,
- void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32))
+static bool gen_vv_f(DisasContext *ctx, arg_vv *a, int oprsz,
+ gen_helper_gvec_2_ptr *fn)
{
- TCGv_i32 vd = tcg_constant_i32(a->vd);
- TCGv_i32 vj = tcg_constant_i32(a->vj);
- TCGv_i32 imm = tcg_constant_i32(a->imm);
+ CHECK_VEC;
+ tcg_gen_gvec_2_ptr(vec_full_offset(a->vd),
+ vec_full_offset(a->vj),
+ cpu_env,
+ oprsz, ctx->vl / 8, oprsz, fn);
+ return true;
+}
+
+static bool gen_vv_i(DisasContext *ctx, arg_vv_i *a, int oprsz,
+ gen_helper_gvec_2i *fn)
+{
CHECK_VEC;
- func(cpu_env, vd, vj, imm);
+
+ tcg_gen_gvec_2i_ool(vec_full_offset(a->vd),
+ vec_full_offset(a->vj),
+ tcg_constant_i64(a->imm),
+ oprsz, ctx->vl / 8, oprsz, fn);
return true;
}
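With the generators parameterized, each TRANS() line below only names the generator, the operand size in bytes, and the out-of-line helper, e.g.:

    TRANS(vhaddw_h_b, LSX, gen_vvv, 16, gen_helper_vhaddw_h_b)

presumably letting later patches in the series reuse the same generators for 32-byte LASX operations by passing a different size.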
@@ -199,22 +236,22 @@ TRANS(vssub_hu, LSX, gvec_vvv, 16, MO_16, tcg_gen_gvec_ussub)
TRANS(vssub_wu, LSX, gvec_vvv, 16, MO_32, tcg_gen_gvec_ussub)
TRANS(vssub_du, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_ussub)
-TRANS(vhaddw_h_b, LSX, gen_vvv, gen_helper_vhaddw_h_b)
-TRANS(vhaddw_w_h, LSX, gen_vvv, gen_helper_vhaddw_w_h)
-TRANS(vhaddw_d_w, LSX, gen_vvv, gen_helper_vhaddw_d_w)
-TRANS(vhaddw_q_d, LSX, gen_vvv, gen_helper_vhaddw_q_d)
-TRANS(vhaddw_hu_bu, LSX, gen_vvv, gen_helper_vhaddw_hu_bu)
-TRANS(vhaddw_wu_hu, LSX, gen_vvv, gen_helper_vhaddw_wu_hu)
-TRANS(vhaddw_du_wu, LSX, gen_vvv, gen_helper_vhaddw_du_wu)
-TRANS(vhaddw_qu_du, LSX, gen_vvv, gen_helper_vhaddw_qu_du)
-TRANS(vhsubw_h_b, LSX, gen_vvv, gen_helper_vhsubw_h_b)
-TRANS(vhsubw_w_h, LSX, gen_vvv, gen_helper_vhsubw_w_h)
-TRANS(vhsubw_d_w, LSX, gen_vvv, gen_helper_vhsubw_d_w)
-TRANS(vhsubw_q_d, LSX, gen_vvv, gen_helper_vhsubw_q_d)
-TRANS(vhsubw_hu_bu, LSX, gen_vvv, gen_helper_vhsubw_hu_bu)
-TRANS(vhsubw_wu_hu, LSX, gen_vvv, gen_helper_vhsubw_wu_hu)
-TRANS(vhsubw_du_wu, LSX, gen_vvv, gen_helper_vhsubw_du_wu)
-TRANS(vhsubw_qu_du, LSX, gen_vvv, gen_helper_vhsubw_qu_du)
+TRANS(vhaddw_h_b, LSX, gen_vvv, 16, gen_helper_vhaddw_h_b)
+TRANS(vhaddw_w_h, LSX, gen_vvv, 16, gen_helper_vhaddw_w_h)
+TRANS(vhaddw_d_w, LSX, gen_vvv, 16, gen_helper_vhaddw_d_w)
+TRANS(vhaddw_q_d, LSX, gen_vvv, 16, gen_helper_vhaddw_q_d)
+TRANS(vhaddw_hu_bu, LSX, gen_vvv, 16, gen_helper_vhaddw_hu_bu)
+TRANS(vhaddw_wu_hu, LSX, gen_vvv, 16, gen_helper_vhaddw_wu_hu)
+TRANS(vhaddw_du_wu, LSX, gen_vvv, 16, gen_helper_vhaddw_du_wu)
+TRANS(vhaddw_qu_du, LSX, gen_vvv, 16, gen_helper_vhaddw_qu_du)
+TRANS(vhsubw_h_b, LSX, gen_vvv, 16, gen_helper_vhsubw_h_b)
+TRANS(vhsubw_w_h, LSX, gen_vvv, 16, gen_helper_vhsubw_w_h)
+TRANS(vhsubw_d_w, LSX, gen_vvv, 16, gen_helper_vhsubw_d_w)
+TRANS(vhsubw_q_d, LSX, gen_vvv, 16, gen_helper_vhsubw_q_d)
+TRANS(vhsubw_hu_bu, LSX, gen_vvv, 16, gen_helper_vhsubw_hu_bu)
+TRANS(vhsubw_wu_hu, LSX, gen_vvv, 16, gen_helper_vhsubw_wu_hu)
+TRANS(vhsubw_du_wu, LSX, gen_vvv, 16, gen_helper_vhsubw_du_wu)
+TRANS(vhsubw_qu_du, LSX, gen_vvv, 16, gen_helper_vhsubw_qu_du)
static void gen_vaddwev_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
@@ -2726,22 +2763,22 @@ TRANS(vmaddwod_h_bu_b, LSX, gvec_vvv, 16, MO_8, do_vmaddwod_u_s)
TRANS(vmaddwod_w_hu_h, LSX, gvec_vvv, 16, MO_16, do_vmaddwod_u_s)
TRANS(vmaddwod_d_wu_w, LSX, gvec_vvv, 16, MO_32, do_vmaddwod_u_s)
-TRANS(vdiv_b, LSX, gen_vvv, gen_helper_vdiv_b)
-TRANS(vdiv_h, LSX, gen_vvv, gen_helper_vdiv_h)
-TRANS(vdiv_w, LSX, gen_vvv, gen_helper_vdiv_w)
-TRANS(vdiv_d, LSX, gen_vvv, gen_helper_vdiv_d)
-TRANS(vdiv_bu, LSX, gen_vvv, gen_helper_vdiv_bu)
-TRANS(vdiv_hu, LSX, gen_vvv, gen_helper_vdiv_hu)
-TRANS(vdiv_wu, LSX, gen_vvv, gen_helper_vdiv_wu)
-TRANS(vdiv_du, LSX, gen_vvv, gen_helper_vdiv_du)
-TRANS(vmod_b, LSX, gen_vvv, gen_helper_vmod_b)
-TRANS(vmod_h, LSX, gen_vvv, gen_helper_vmod_h)
-TRANS(vmod_w, LSX, gen_vvv, gen_helper_vmod_w)
-TRANS(vmod_d, LSX, gen_vvv, gen_helper_vmod_d)
-TRANS(vmod_bu, LSX, gen_vvv, gen_helper_vmod_bu)
-TRANS(vmod_hu, LSX, gen_vvv, gen_helper_vmod_hu)
-TRANS(vmod_wu, LSX, gen_vvv, gen_helper_vmod_wu)
-TRANS(vmod_du, LSX, gen_vvv, gen_helper_vmod_du)
+TRANS(vdiv_b, LSX, gen_vvv, 16, gen_helper_vdiv_b)
+TRANS(vdiv_h, LSX, gen_vvv, 16, gen_helper_vdiv_h)
+TRANS(vdiv_w, LSX, gen_vvv, 16, gen_helper_vdiv_w)
+TRANS(vdiv_d, LSX, gen_vvv, 16, gen_helper_vdiv_d)
+TRANS(vdiv_bu, LSX, gen_vvv, 16, gen_helper_vdiv_bu)
+TRANS(vdiv_hu, LSX, gen_vvv, 16, gen_helper_vdiv_hu)
+TRANS(vdiv_wu, LSX, gen_vvv, 16, gen_helper_vdiv_wu)
+TRANS(vdiv_du, LSX, gen_vvv, 16, gen_helper_vdiv_du)
+TRANS(vmod_b, LSX, gen_vvv, 16, gen_helper_vmod_b)
+TRANS(vmod_h, LSX, gen_vvv, 16, gen_helper_vmod_h)
+TRANS(vmod_w, LSX, gen_vvv, 16, gen_helper_vmod_w)
+TRANS(vmod_d, LSX, gen_vvv, 16, gen_helper_vmod_d)
+TRANS(vmod_bu, LSX, gen_vvv, 16, gen_helper_vmod_bu)
+TRANS(vmod_hu, LSX, gen_vvv, 16, gen_helper_vmod_hu)
+TRANS(vmod_wu, LSX, gen_vvv, 16, gen_helper_vmod_wu)
+TRANS(vmod_du, LSX, gen_vvv, 16, gen_helper_vmod_du)
static void gen_vsat_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec max)
{
@@ -2844,14 +2881,14 @@ TRANS(vsat_hu, LSX, gvec_vv_i, 16, MO_16, do_vsat_u)
TRANS(vsat_wu, LSX, gvec_vv_i, 16, MO_32, do_vsat_u)
TRANS(vsat_du, LSX, gvec_vv_i, 16, MO_64, do_vsat_u)
-TRANS(vexth_h_b, LSX, gen_vv, gen_helper_vexth_h_b)
-TRANS(vexth_w_h, LSX, gen_vv, gen_helper_vexth_w_h)
-TRANS(vexth_d_w, LSX, gen_vv, gen_helper_vexth_d_w)
-TRANS(vexth_q_d, LSX, gen_vv, gen_helper_vexth_q_d)
-TRANS(vexth_hu_bu, LSX, gen_vv, gen_helper_vexth_hu_bu)
-TRANS(vexth_wu_hu, LSX, gen_vv, gen_helper_vexth_wu_hu)
-TRANS(vexth_du_wu, LSX, gen_vv, gen_helper_vexth_du_wu)
-TRANS(vexth_qu_du, LSX, gen_vv, gen_helper_vexth_qu_du)
+TRANS(vexth_h_b, LSX, gen_vv, 16, gen_helper_vexth_h_b)
+TRANS(vexth_w_h, LSX, gen_vv, 16, gen_helper_vexth_w_h)
+TRANS(vexth_d_w, LSX, gen_vv, 16, gen_helper_vexth_d_w)
+TRANS(vexth_q_d, LSX, gen_vv, 16, gen_helper_vexth_q_d)
+TRANS(vexth_hu_bu, LSX, gen_vv, 16, gen_helper_vexth_hu_bu)
+TRANS(vexth_wu_hu, LSX, gen_vv, 16, gen_helper_vexth_wu_hu)
+TRANS(vexth_du_wu, LSX, gen_vv, 16, gen_helper_vexth_du_wu)
+TRANS(vexth_qu_du, LSX, gen_vv, 16, gen_helper_vexth_qu_du)
static void gen_vsigncov(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
@@ -2906,12 +2943,12 @@ TRANS(vsigncov_h, LSX, gvec_vvv, 16, MO_16, do_vsigncov)
TRANS(vsigncov_w, LSX, gvec_vvv, 16, MO_32, do_vsigncov)
TRANS(vsigncov_d, LSX, gvec_vvv, 16, MO_64, do_vsigncov)
-TRANS(vmskltz_b, LSX, gen_vv, gen_helper_vmskltz_b)
-TRANS(vmskltz_h, LSX, gen_vv, gen_helper_vmskltz_h)
-TRANS(vmskltz_w, LSX, gen_vv, gen_helper_vmskltz_w)
-TRANS(vmskltz_d, LSX, gen_vv, gen_helper_vmskltz_d)
-TRANS(vmskgez_b, LSX, gen_vv, gen_helper_vmskgez_b)
-TRANS(vmsknz_b, LSX, gen_vv, gen_helper_vmsknz_b)
+TRANS(vmskltz_b, LSX, gen_vv, 16, gen_helper_vmskltz_b)
+TRANS(vmskltz_h, LSX, gen_vv, 16, gen_helper_vmskltz_h)
+TRANS(vmskltz_w, LSX, gen_vv, 16, gen_helper_vmskltz_w)
+TRANS(vmskltz_d, LSX, gen_vv, 16, gen_helper_vmskltz_d)
+TRANS(vmskgez_b, LSX, gen_vv, 16, gen_helper_vmskgez_b)
+TRANS(vmsknz_b, LSX, gen_vv, 16, gen_helper_vmsknz_b)
#define EXPAND_BYTE(bit) ((uint64_t)(bit ? 0xff : 0))
@@ -3151,138 +3188,138 @@ TRANS(vrotri_h, LSX, gvec_vv_i, 16, MO_16, tcg_gen_gvec_rotri)
TRANS(vrotri_w, LSX, gvec_vv_i, 16, MO_32, tcg_gen_gvec_rotri)
TRANS(vrotri_d, LSX, gvec_vv_i, 16, MO_64, tcg_gen_gvec_rotri)
-TRANS(vsllwil_h_b, LSX, gen_vv_i, gen_helper_vsllwil_h_b)
-TRANS(vsllwil_w_h, LSX, gen_vv_i, gen_helper_vsllwil_w_h)
-TRANS(vsllwil_d_w, LSX, gen_vv_i, gen_helper_vsllwil_d_w)
-TRANS(vextl_q_d, LSX, gen_vv, gen_helper_vextl_q_d)
-TRANS(vsllwil_hu_bu, LSX, gen_vv_i, gen_helper_vsllwil_hu_bu)
-TRANS(vsllwil_wu_hu, LSX, gen_vv_i, gen_helper_vsllwil_wu_hu)
-TRANS(vsllwil_du_wu, LSX, gen_vv_i, gen_helper_vsllwil_du_wu)
-TRANS(vextl_qu_du, LSX, gen_vv, gen_helper_vextl_qu_du)
-
-TRANS(vsrlr_b, LSX, gen_vvv, gen_helper_vsrlr_b)
-TRANS(vsrlr_h, LSX, gen_vvv, gen_helper_vsrlr_h)
-TRANS(vsrlr_w, LSX, gen_vvv, gen_helper_vsrlr_w)
-TRANS(vsrlr_d, LSX, gen_vvv, gen_helper_vsrlr_d)
-TRANS(vsrlri_b, LSX, gen_vv_i, gen_helper_vsrlri_b)
-TRANS(vsrlri_h, LSX, gen_vv_i, gen_helper_vsrlri_h)
-TRANS(vsrlri_w, LSX, gen_vv_i, gen_helper_vsrlri_w)
-TRANS(vsrlri_d, LSX, gen_vv_i, gen_helper_vsrlri_d)
-
-TRANS(vsrar_b, LSX, gen_vvv, gen_helper_vsrar_b)
-TRANS(vsrar_h, LSX, gen_vvv, gen_helper_vsrar_h)
-TRANS(vsrar_w, LSX, gen_vvv, gen_helper_vsrar_w)
-TRANS(vsrar_d, LSX, gen_vvv, gen_helper_vsrar_d)
-TRANS(vsrari_b, LSX, gen_vv_i, gen_helper_vsrari_b)
-TRANS(vsrari_h, LSX, gen_vv_i, gen_helper_vsrari_h)
-TRANS(vsrari_w, LSX, gen_vv_i, gen_helper_vsrari_w)
-TRANS(vsrari_d, LSX, gen_vv_i, gen_helper_vsrari_d)
-
-TRANS(vsrln_b_h, LSX, gen_vvv, gen_helper_vsrln_b_h)
-TRANS(vsrln_h_w, LSX, gen_vvv, gen_helper_vsrln_h_w)
-TRANS(vsrln_w_d, LSX, gen_vvv, gen_helper_vsrln_w_d)
-TRANS(vsran_b_h, LSX, gen_vvv, gen_helper_vsran_b_h)
-TRANS(vsran_h_w, LSX, gen_vvv, gen_helper_vsran_h_w)
-TRANS(vsran_w_d, LSX, gen_vvv, gen_helper_vsran_w_d)
-
-TRANS(vsrlni_b_h, LSX, gen_vv_i, gen_helper_vsrlni_b_h)
-TRANS(vsrlni_h_w, LSX, gen_vv_i, gen_helper_vsrlni_h_w)
-TRANS(vsrlni_w_d, LSX, gen_vv_i, gen_helper_vsrlni_w_d)
-TRANS(vsrlni_d_q, LSX, gen_vv_i, gen_helper_vsrlni_d_q)
-TRANS(vsrani_b_h, LSX, gen_vv_i, gen_helper_vsrani_b_h)
-TRANS(vsrani_h_w, LSX, gen_vv_i, gen_helper_vsrani_h_w)
-TRANS(vsrani_w_d, LSX, gen_vv_i, gen_helper_vsrani_w_d)
-TRANS(vsrani_d_q, LSX, gen_vv_i, gen_helper_vsrani_d_q)
-
-TRANS(vsrlrn_b_h, LSX, gen_vvv, gen_helper_vsrlrn_b_h)
-TRANS(vsrlrn_h_w, LSX, gen_vvv, gen_helper_vsrlrn_h_w)
-TRANS(vsrlrn_w_d, LSX, gen_vvv, gen_helper_vsrlrn_w_d)
-TRANS(vsrarn_b_h, LSX, gen_vvv, gen_helper_vsrarn_b_h)
-TRANS(vsrarn_h_w, LSX, gen_vvv, gen_helper_vsrarn_h_w)
-TRANS(vsrarn_w_d, LSX, gen_vvv, gen_helper_vsrarn_w_d)
-
-TRANS(vsrlrni_b_h, LSX, gen_vv_i, gen_helper_vsrlrni_b_h)
-TRANS(vsrlrni_h_w, LSX, gen_vv_i, gen_helper_vsrlrni_h_w)
-TRANS(vsrlrni_w_d, LSX, gen_vv_i, gen_helper_vsrlrni_w_d)
-TRANS(vsrlrni_d_q, LSX, gen_vv_i, gen_helper_vsrlrni_d_q)
-TRANS(vsrarni_b_h, LSX, gen_vv_i, gen_helper_vsrarni_b_h)
-TRANS(vsrarni_h_w, LSX, gen_vv_i, gen_helper_vsrarni_h_w)
-TRANS(vsrarni_w_d, LSX, gen_vv_i, gen_helper_vsrarni_w_d)
-TRANS(vsrarni_d_q, LSX, gen_vv_i, gen_helper_vsrarni_d_q)
-
-TRANS(vssrln_b_h, LSX, gen_vvv, gen_helper_vssrln_b_h)
-TRANS(vssrln_h_w, LSX, gen_vvv, gen_helper_vssrln_h_w)
-TRANS(vssrln_w_d, LSX, gen_vvv, gen_helper_vssrln_w_d)
-TRANS(vssran_b_h, LSX, gen_vvv, gen_helper_vssran_b_h)
-TRANS(vssran_h_w, LSX, gen_vvv, gen_helper_vssran_h_w)
-TRANS(vssran_w_d, LSX, gen_vvv, gen_helper_vssran_w_d)
-TRANS(vssrln_bu_h, LSX, gen_vvv, gen_helper_vssrln_bu_h)
-TRANS(vssrln_hu_w, LSX, gen_vvv, gen_helper_vssrln_hu_w)
-TRANS(vssrln_wu_d, LSX, gen_vvv, gen_helper_vssrln_wu_d)
-TRANS(vssran_bu_h, LSX, gen_vvv, gen_helper_vssran_bu_h)
-TRANS(vssran_hu_w, LSX, gen_vvv, gen_helper_vssran_hu_w)
-TRANS(vssran_wu_d, LSX, gen_vvv, gen_helper_vssran_wu_d)
-
-TRANS(vssrlni_b_h, LSX, gen_vv_i, gen_helper_vssrlni_b_h)
-TRANS(vssrlni_h_w, LSX, gen_vv_i, gen_helper_vssrlni_h_w)
-TRANS(vssrlni_w_d, LSX, gen_vv_i, gen_helper_vssrlni_w_d)
-TRANS(vssrlni_d_q, LSX, gen_vv_i, gen_helper_vssrlni_d_q)
-TRANS(vssrani_b_h, LSX, gen_vv_i, gen_helper_vssrani_b_h)
-TRANS(vssrani_h_w, LSX, gen_vv_i, gen_helper_vssrani_h_w)
-TRANS(vssrani_w_d, LSX, gen_vv_i, gen_helper_vssrani_w_d)
-TRANS(vssrani_d_q, LSX, gen_vv_i, gen_helper_vssrani_d_q)
-TRANS(vssrlni_bu_h, LSX, gen_vv_i, gen_helper_vssrlni_bu_h)
-TRANS(vssrlni_hu_w, LSX, gen_vv_i, gen_helper_vssrlni_hu_w)
-TRANS(vssrlni_wu_d, LSX, gen_vv_i, gen_helper_vssrlni_wu_d)
-TRANS(vssrlni_du_q, LSX, gen_vv_i, gen_helper_vssrlni_du_q)
-TRANS(vssrani_bu_h, LSX, gen_vv_i, gen_helper_vssrani_bu_h)
-TRANS(vssrani_hu_w, LSX, gen_vv_i, gen_helper_vssrani_hu_w)
-TRANS(vssrani_wu_d, LSX, gen_vv_i, gen_helper_vssrani_wu_d)
-TRANS(vssrani_du_q, LSX, gen_vv_i, gen_helper_vssrani_du_q)
-
-TRANS(vssrlrn_b_h, LSX, gen_vvv, gen_helper_vssrlrn_b_h)
-TRANS(vssrlrn_h_w, LSX, gen_vvv, gen_helper_vssrlrn_h_w)
-TRANS(vssrlrn_w_d, LSX, gen_vvv, gen_helper_vssrlrn_w_d)
-TRANS(vssrarn_b_h, LSX, gen_vvv, gen_helper_vssrarn_b_h)
-TRANS(vssrarn_h_w, LSX, gen_vvv, gen_helper_vssrarn_h_w)
-TRANS(vssrarn_w_d, LSX, gen_vvv, gen_helper_vssrarn_w_d)
-TRANS(vssrlrn_bu_h, LSX, gen_vvv, gen_helper_vssrlrn_bu_h)
-TRANS(vssrlrn_hu_w, LSX, gen_vvv, gen_helper_vssrlrn_hu_w)
-TRANS(vssrlrn_wu_d, LSX, gen_vvv, gen_helper_vssrlrn_wu_d)
-TRANS(vssrarn_bu_h, LSX, gen_vvv, gen_helper_vssrarn_bu_h)
-TRANS(vssrarn_hu_w, LSX, gen_vvv, gen_helper_vssrarn_hu_w)
-TRANS(vssrarn_wu_d, LSX, gen_vvv, gen_helper_vssrarn_wu_d)
-
-TRANS(vssrlrni_b_h, LSX, gen_vv_i, gen_helper_vssrlrni_b_h)
-TRANS(vssrlrni_h_w, LSX, gen_vv_i, gen_helper_vssrlrni_h_w)
-TRANS(vssrlrni_w_d, LSX, gen_vv_i, gen_helper_vssrlrni_w_d)
-TRANS(vssrlrni_d_q, LSX, gen_vv_i, gen_helper_vssrlrni_d_q)
-TRANS(vssrarni_b_h, LSX, gen_vv_i, gen_helper_vssrarni_b_h)
-TRANS(vssrarni_h_w, LSX, gen_vv_i, gen_helper_vssrarni_h_w)
-TRANS(vssrarni_w_d, LSX, gen_vv_i, gen_helper_vssrarni_w_d)
-TRANS(vssrarni_d_q, LSX, gen_vv_i, gen_helper_vssrarni_d_q)
-TRANS(vssrlrni_bu_h, LSX, gen_vv_i, gen_helper_vssrlrni_bu_h)
-TRANS(vssrlrni_hu_w, LSX, gen_vv_i, gen_helper_vssrlrni_hu_w)
-TRANS(vssrlrni_wu_d, LSX, gen_vv_i, gen_helper_vssrlrni_wu_d)
-TRANS(vssrlrni_du_q, LSX, gen_vv_i, gen_helper_vssrlrni_du_q)
-TRANS(vssrarni_bu_h, LSX, gen_vv_i, gen_helper_vssrarni_bu_h)
-TRANS(vssrarni_hu_w, LSX, gen_vv_i, gen_helper_vssrarni_hu_w)
-TRANS(vssrarni_wu_d, LSX, gen_vv_i, gen_helper_vssrarni_wu_d)
-TRANS(vssrarni_du_q, LSX, gen_vv_i, gen_helper_vssrarni_du_q)
-
-TRANS(vclo_b, LSX, gen_vv, gen_helper_vclo_b)
-TRANS(vclo_h, LSX, gen_vv, gen_helper_vclo_h)
-TRANS(vclo_w, LSX, gen_vv, gen_helper_vclo_w)
-TRANS(vclo_d, LSX, gen_vv, gen_helper_vclo_d)
-TRANS(vclz_b, LSX, gen_vv, gen_helper_vclz_b)
-TRANS(vclz_h, LSX, gen_vv, gen_helper_vclz_h)
-TRANS(vclz_w, LSX, gen_vv, gen_helper_vclz_w)
-TRANS(vclz_d, LSX, gen_vv, gen_helper_vclz_d)
-
-TRANS(vpcnt_b, LSX, gen_vv, gen_helper_vpcnt_b)
-TRANS(vpcnt_h, LSX, gen_vv, gen_helper_vpcnt_h)
-TRANS(vpcnt_w, LSX, gen_vv, gen_helper_vpcnt_w)
-TRANS(vpcnt_d, LSX, gen_vv, gen_helper_vpcnt_d)
+TRANS(vsllwil_h_b, LSX, gen_vv_i, 16, gen_helper_vsllwil_h_b)
+TRANS(vsllwil_w_h, LSX, gen_vv_i, 16, gen_helper_vsllwil_w_h)
+TRANS(vsllwil_d_w, LSX, gen_vv_i, 16, gen_helper_vsllwil_d_w)
+TRANS(vextl_q_d, LSX, gen_vv, 16, gen_helper_vextl_q_d)
+TRANS(vsllwil_hu_bu, LSX, gen_vv_i, 16, gen_helper_vsllwil_hu_bu)
+TRANS(vsllwil_wu_hu, LSX, gen_vv_i, 16, gen_helper_vsllwil_wu_hu)
+TRANS(vsllwil_du_wu, LSX, gen_vv_i, 16, gen_helper_vsllwil_du_wu)
+TRANS(vextl_qu_du, LSX, gen_vv, 16, gen_helper_vextl_qu_du)
+
+TRANS(vsrlr_b, LSX, gen_vvv, 16, gen_helper_vsrlr_b)
+TRANS(vsrlr_h, LSX, gen_vvv, 16, gen_helper_vsrlr_h)
+TRANS(vsrlr_w, LSX, gen_vvv, 16, gen_helper_vsrlr_w)
+TRANS(vsrlr_d, LSX, gen_vvv, 16, gen_helper_vsrlr_d)
+TRANS(vsrlri_b, LSX, gen_vv_i, 16, gen_helper_vsrlri_b)
+TRANS(vsrlri_h, LSX, gen_vv_i, 16, gen_helper_vsrlri_h)
+TRANS(vsrlri_w, LSX, gen_vv_i, 16, gen_helper_vsrlri_w)
+TRANS(vsrlri_d, LSX, gen_vv_i, 16, gen_helper_vsrlri_d)
+
+TRANS(vsrar_b, LSX, gen_vvv, 16, gen_helper_vsrar_b)
+TRANS(vsrar_h, LSX, gen_vvv, 16, gen_helper_vsrar_h)
+TRANS(vsrar_w, LSX, gen_vvv, 16, gen_helper_vsrar_w)
+TRANS(vsrar_d, LSX, gen_vvv, 16, gen_helper_vsrar_d)
+TRANS(vsrari_b, LSX, gen_vv_i, 16, gen_helper_vsrari_b)
+TRANS(vsrari_h, LSX, gen_vv_i, 16, gen_helper_vsrari_h)
+TRANS(vsrari_w, LSX, gen_vv_i, 16, gen_helper_vsrari_w)
+TRANS(vsrari_d, LSX, gen_vv_i, 16, gen_helper_vsrari_d)
+
+TRANS(vsrln_b_h, LSX, gen_vvv, 16, gen_helper_vsrln_b_h)
+TRANS(vsrln_h_w, LSX, gen_vvv, 16, gen_helper_vsrln_h_w)
+TRANS(vsrln_w_d, LSX, gen_vvv, 16, gen_helper_vsrln_w_d)
+TRANS(vsran_b_h, LSX, gen_vvv, 16, gen_helper_vsran_b_h)
+TRANS(vsran_h_w, LSX, gen_vvv, 16, gen_helper_vsran_h_w)
+TRANS(vsran_w_d, LSX, gen_vvv, 16, gen_helper_vsran_w_d)
+
+TRANS(vsrlni_b_h, LSX, gen_vv_i, 16, gen_helper_vsrlni_b_h)
+TRANS(vsrlni_h_w, LSX, gen_vv_i, 16, gen_helper_vsrlni_h_w)
+TRANS(vsrlni_w_d, LSX, gen_vv_i, 16, gen_helper_vsrlni_w_d)
+TRANS(vsrlni_d_q, LSX, gen_vv_i, 16, gen_helper_vsrlni_d_q)
+TRANS(vsrani_b_h, LSX, gen_vv_i, 16, gen_helper_vsrani_b_h)
+TRANS(vsrani_h_w, LSX, gen_vv_i, 16, gen_helper_vsrani_h_w)
+TRANS(vsrani_w_d, LSX, gen_vv_i, 16, gen_helper_vsrani_w_d)
+TRANS(vsrani_d_q, LSX, gen_vv_i, 16, gen_helper_vsrani_d_q)
+
+TRANS(vsrlrn_b_h, LSX, gen_vvv, 16, gen_helper_vsrlrn_b_h)
+TRANS(vsrlrn_h_w, LSX, gen_vvv, 16, gen_helper_vsrlrn_h_w)
+TRANS(vsrlrn_w_d, LSX, gen_vvv, 16, gen_helper_vsrlrn_w_d)
+TRANS(vsrarn_b_h, LSX, gen_vvv, 16, gen_helper_vsrarn_b_h)
+TRANS(vsrarn_h_w, LSX, gen_vvv, 16, gen_helper_vsrarn_h_w)
+TRANS(vsrarn_w_d, LSX, gen_vvv, 16, gen_helper_vsrarn_w_d)
+
+TRANS(vsrlrni_b_h, LSX, gen_vv_i, 16, gen_helper_vsrlrni_b_h)
+TRANS(vsrlrni_h_w, LSX, gen_vv_i, 16, gen_helper_vsrlrni_h_w)
+TRANS(vsrlrni_w_d, LSX, gen_vv_i, 16, gen_helper_vsrlrni_w_d)
+TRANS(vsrlrni_d_q, LSX, gen_vv_i, 16, gen_helper_vsrlrni_d_q)
+TRANS(vsrarni_b_h, LSX, gen_vv_i, 16, gen_helper_vsrarni_b_h)
+TRANS(vsrarni_h_w, LSX, gen_vv_i, 16, gen_helper_vsrarni_h_w)
+TRANS(vsrarni_w_d, LSX, gen_vv_i, 16, gen_helper_vsrarni_w_d)
+TRANS(vsrarni_d_q, LSX, gen_vv_i, 16, gen_helper_vsrarni_d_q)
+
+TRANS(vssrln_b_h, LSX, gen_vvv, 16, gen_helper_vssrln_b_h)
+TRANS(vssrln_h_w, LSX, gen_vvv, 16, gen_helper_vssrln_h_w)
+TRANS(vssrln_w_d, LSX, gen_vvv, 16, gen_helper_vssrln_w_d)
+TRANS(vssran_b_h, LSX, gen_vvv, 16, gen_helper_vssran_b_h)
+TRANS(vssran_h_w, LSX, gen_vvv, 16, gen_helper_vssran_h_w)
+TRANS(vssran_w_d, LSX, gen_vvv, 16, gen_helper_vssran_w_d)
+TRANS(vssrln_bu_h, LSX, gen_vvv, 16, gen_helper_vssrln_bu_h)
+TRANS(vssrln_hu_w, LSX, gen_vvv, 16, gen_helper_vssrln_hu_w)
+TRANS(vssrln_wu_d, LSX, gen_vvv, 16, gen_helper_vssrln_wu_d)
+TRANS(vssran_bu_h, LSX, gen_vvv, 16, gen_helper_vssran_bu_h)
+TRANS(vssran_hu_w, LSX, gen_vvv, 16, gen_helper_vssran_hu_w)
+TRANS(vssran_wu_d, LSX, gen_vvv, 16, gen_helper_vssran_wu_d)
+
+TRANS(vssrlni_b_h, LSX, gen_vv_i, 16, gen_helper_vssrlni_b_h)
+TRANS(vssrlni_h_w, LSX, gen_vv_i, 16, gen_helper_vssrlni_h_w)
+TRANS(vssrlni_w_d, LSX, gen_vv_i, 16, gen_helper_vssrlni_w_d)
+TRANS(vssrlni_d_q, LSX, gen_vv_i, 16, gen_helper_vssrlni_d_q)
+TRANS(vssrani_b_h, LSX, gen_vv_i, 16, gen_helper_vssrani_b_h)
+TRANS(vssrani_h_w, LSX, gen_vv_i, 16, gen_helper_vssrani_h_w)
+TRANS(vssrani_w_d, LSX, gen_vv_i, 16, gen_helper_vssrani_w_d)
+TRANS(vssrani_d_q, LSX, gen_vv_i, 16, gen_helper_vssrani_d_q)
+TRANS(vssrlni_bu_h, LSX, gen_vv_i, 16, gen_helper_vssrlni_bu_h)
+TRANS(vssrlni_hu_w, LSX, gen_vv_i, 16, gen_helper_vssrlni_hu_w)
+TRANS(vssrlni_wu_d, LSX, gen_vv_i, 16, gen_helper_vssrlni_wu_d)
+TRANS(vssrlni_du_q, LSX, gen_vv_i, 16, gen_helper_vssrlni_du_q)
+TRANS(vssrani_bu_h, LSX, gen_vv_i, 16, gen_helper_vssrani_bu_h)
+TRANS(vssrani_hu_w, LSX, gen_vv_i, 16, gen_helper_vssrani_hu_w)
+TRANS(vssrani_wu_d, LSX, gen_vv_i, 16, gen_helper_vssrani_wu_d)
+TRANS(vssrani_du_q, LSX, gen_vv_i, 16, gen_helper_vssrani_du_q)
+
+TRANS(vssrlrn_b_h, LSX, gen_vvv, 16, gen_helper_vssrlrn_b_h)
+TRANS(vssrlrn_h_w, LSX, gen_vvv, 16, gen_helper_vssrlrn_h_w)
+TRANS(vssrlrn_w_d, LSX, gen_vvv, 16, gen_helper_vssrlrn_w_d)
+TRANS(vssrarn_b_h, LSX, gen_vvv, 16, gen_helper_vssrarn_b_h)
+TRANS(vssrarn_h_w, LSX, gen_vvv, 16, gen_helper_vssrarn_h_w)
+TRANS(vssrarn_w_d, LSX, gen_vvv, 16, gen_helper_vssrarn_w_d)
+TRANS(vssrlrn_bu_h, LSX, gen_vvv, 16, gen_helper_vssrlrn_bu_h)
+TRANS(vssrlrn_hu_w, LSX, gen_vvv, 16, gen_helper_vssrlrn_hu_w)
+TRANS(vssrlrn_wu_d, LSX, gen_vvv, 16, gen_helper_vssrlrn_wu_d)
+TRANS(vssrarn_bu_h, LSX, gen_vvv, 16, gen_helper_vssrarn_bu_h)
+TRANS(vssrarn_hu_w, LSX, gen_vvv, 16, gen_helper_vssrarn_hu_w)
+TRANS(vssrarn_wu_d, LSX, gen_vvv, 16, gen_helper_vssrarn_wu_d)
+
+TRANS(vssrlrni_b_h, LSX, gen_vv_i, 16, gen_helper_vssrlrni_b_h)
+TRANS(vssrlrni_h_w, LSX, gen_vv_i, 16, gen_helper_vssrlrni_h_w)
+TRANS(vssrlrni_w_d, LSX, gen_vv_i, 16, gen_helper_vssrlrni_w_d)
+TRANS(vssrlrni_d_q, LSX, gen_vv_i, 16, gen_helper_vssrlrni_d_q)
+TRANS(vssrarni_b_h, LSX, gen_vv_i, 16, gen_helper_vssrarni_b_h)
+TRANS(vssrarni_h_w, LSX, gen_vv_i, 16, gen_helper_vssrarni_h_w)
+TRANS(vssrarni_w_d, LSX, gen_vv_i, 16, gen_helper_vssrarni_w_d)
+TRANS(vssrarni_d_q, LSX, gen_vv_i, 16, gen_helper_vssrarni_d_q)
+TRANS(vssrlrni_bu_h, LSX, gen_vv_i, 16, gen_helper_vssrlrni_bu_h)
+TRANS(vssrlrni_hu_w, LSX, gen_vv_i, 16, gen_helper_vssrlrni_hu_w)
+TRANS(vssrlrni_wu_d, LSX, gen_vv_i, 16, gen_helper_vssrlrni_wu_d)
+TRANS(vssrlrni_du_q, LSX, gen_vv_i, 16, gen_helper_vssrlrni_du_q)
+TRANS(vssrarni_bu_h, LSX, gen_vv_i, 16, gen_helper_vssrarni_bu_h)
+TRANS(vssrarni_hu_w, LSX, gen_vv_i, 16, gen_helper_vssrarni_hu_w)
+TRANS(vssrarni_wu_d, LSX, gen_vv_i, 16, gen_helper_vssrarni_wu_d)
+TRANS(vssrarni_du_q, LSX, gen_vv_i, 16, gen_helper_vssrarni_du_q)
+
+TRANS(vclo_b, LSX, gen_vv, 16, gen_helper_vclo_b)
+TRANS(vclo_h, LSX, gen_vv, 16, gen_helper_vclo_h)
+TRANS(vclo_w, LSX, gen_vv, 16, gen_helper_vclo_w)
+TRANS(vclo_d, LSX, gen_vv, 16, gen_helper_vclo_d)
+TRANS(vclz_b, LSX, gen_vv, 16, gen_helper_vclz_b)
+TRANS(vclz_h, LSX, gen_vv, 16, gen_helper_vclz_h)
+TRANS(vclz_w, LSX, gen_vv, 16, gen_helper_vclz_w)
+TRANS(vclz_d, LSX, gen_vv, 16, gen_helper_vclz_d)
+
+TRANS(vpcnt_b, LSX, gen_vv, 16, gen_helper_vpcnt_b)
+TRANS(vpcnt_h, LSX, gen_vv, 16, gen_helper_vpcnt_h)
+TRANS(vpcnt_w, LSX, gen_vv, 16, gen_helper_vpcnt_w)
+TRANS(vpcnt_d, LSX, gen_vv, 16, gen_helper_vpcnt_d)
static void do_vbit(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b,
void (*func)(unsigned, TCGv_vec, TCGv_vec, TCGv_vec))
@@ -3589,107 +3626,107 @@ TRANS(vbitrevi_h, LSX, gvec_vv_i, 16, MO_16, do_vbitrevi)
TRANS(vbitrevi_w, LSX, gvec_vv_i, 16, MO_32, do_vbitrevi)
TRANS(vbitrevi_d, LSX, gvec_vv_i, 16, MO_64, do_vbitrevi)
-TRANS(vfrstp_b, LSX, gen_vvv, gen_helper_vfrstp_b)
-TRANS(vfrstp_h, LSX, gen_vvv, gen_helper_vfrstp_h)
-TRANS(vfrstpi_b, LSX, gen_vv_i, gen_helper_vfrstpi_b)
-TRANS(vfrstpi_h, LSX, gen_vv_i, gen_helper_vfrstpi_h)
-
-TRANS(vfadd_s, LSX, gen_vvv, gen_helper_vfadd_s)
-TRANS(vfadd_d, LSX, gen_vvv, gen_helper_vfadd_d)
-TRANS(vfsub_s, LSX, gen_vvv, gen_helper_vfsub_s)
-TRANS(vfsub_d, LSX, gen_vvv, gen_helper_vfsub_d)
-TRANS(vfmul_s, LSX, gen_vvv, gen_helper_vfmul_s)
-TRANS(vfmul_d, LSX, gen_vvv, gen_helper_vfmul_d)
-TRANS(vfdiv_s, LSX, gen_vvv, gen_helper_vfdiv_s)
-TRANS(vfdiv_d, LSX, gen_vvv, gen_helper_vfdiv_d)
-
-TRANS(vfmadd_s, LSX, gen_vvvv, gen_helper_vfmadd_s)
-TRANS(vfmadd_d, LSX, gen_vvvv, gen_helper_vfmadd_d)
-TRANS(vfmsub_s, LSX, gen_vvvv, gen_helper_vfmsub_s)
-TRANS(vfmsub_d, LSX, gen_vvvv, gen_helper_vfmsub_d)
-TRANS(vfnmadd_s, LSX, gen_vvvv, gen_helper_vfnmadd_s)
-TRANS(vfnmadd_d, LSX, gen_vvvv, gen_helper_vfnmadd_d)
-TRANS(vfnmsub_s, LSX, gen_vvvv, gen_helper_vfnmsub_s)
-TRANS(vfnmsub_d, LSX, gen_vvvv, gen_helper_vfnmsub_d)
-
-TRANS(vfmax_s, LSX, gen_vvv, gen_helper_vfmax_s)
-TRANS(vfmax_d, LSX, gen_vvv, gen_helper_vfmax_d)
-TRANS(vfmin_s, LSX, gen_vvv, gen_helper_vfmin_s)
-TRANS(vfmin_d, LSX, gen_vvv, gen_helper_vfmin_d)
-
-TRANS(vfmaxa_s, LSX, gen_vvv, gen_helper_vfmaxa_s)
-TRANS(vfmaxa_d, LSX, gen_vvv, gen_helper_vfmaxa_d)
-TRANS(vfmina_s, LSX, gen_vvv, gen_helper_vfmina_s)
-TRANS(vfmina_d, LSX, gen_vvv, gen_helper_vfmina_d)
-
-TRANS(vflogb_s, LSX, gen_vv, gen_helper_vflogb_s)
-TRANS(vflogb_d, LSX, gen_vv, gen_helper_vflogb_d)
-
-TRANS(vfclass_s, LSX, gen_vv, gen_helper_vfclass_s)
-TRANS(vfclass_d, LSX, gen_vv, gen_helper_vfclass_d)
-
-TRANS(vfsqrt_s, LSX, gen_vv, gen_helper_vfsqrt_s)
-TRANS(vfsqrt_d, LSX, gen_vv, gen_helper_vfsqrt_d)
-TRANS(vfrecip_s, LSX, gen_vv, gen_helper_vfrecip_s)
-TRANS(vfrecip_d, LSX, gen_vv, gen_helper_vfrecip_d)
-TRANS(vfrsqrt_s, LSX, gen_vv, gen_helper_vfrsqrt_s)
-TRANS(vfrsqrt_d, LSX, gen_vv, gen_helper_vfrsqrt_d)
-
-TRANS(vfcvtl_s_h, LSX, gen_vv, gen_helper_vfcvtl_s_h)
-TRANS(vfcvth_s_h, LSX, gen_vv, gen_helper_vfcvth_s_h)
-TRANS(vfcvtl_d_s, LSX, gen_vv, gen_helper_vfcvtl_d_s)
-TRANS(vfcvth_d_s, LSX, gen_vv, gen_helper_vfcvth_d_s)
-TRANS(vfcvt_h_s, LSX, gen_vvv, gen_helper_vfcvt_h_s)
-TRANS(vfcvt_s_d, LSX, gen_vvv, gen_helper_vfcvt_s_d)
-
-TRANS(vfrintrne_s, LSX, gen_vv, gen_helper_vfrintrne_s)
-TRANS(vfrintrne_d, LSX, gen_vv, gen_helper_vfrintrne_d)
-TRANS(vfrintrz_s, LSX, gen_vv, gen_helper_vfrintrz_s)
-TRANS(vfrintrz_d, LSX, gen_vv, gen_helper_vfrintrz_d)
-TRANS(vfrintrp_s, LSX, gen_vv, gen_helper_vfrintrp_s)
-TRANS(vfrintrp_d, LSX, gen_vv, gen_helper_vfrintrp_d)
-TRANS(vfrintrm_s, LSX, gen_vv, gen_helper_vfrintrm_s)
-TRANS(vfrintrm_d, LSX, gen_vv, gen_helper_vfrintrm_d)
-TRANS(vfrint_s, LSX, gen_vv, gen_helper_vfrint_s)
-TRANS(vfrint_d, LSX, gen_vv, gen_helper_vfrint_d)
-
-TRANS(vftintrne_w_s, LSX, gen_vv, gen_helper_vftintrne_w_s)
-TRANS(vftintrne_l_d, LSX, gen_vv, gen_helper_vftintrne_l_d)
-TRANS(vftintrz_w_s, LSX, gen_vv, gen_helper_vftintrz_w_s)
-TRANS(vftintrz_l_d, LSX, gen_vv, gen_helper_vftintrz_l_d)
-TRANS(vftintrp_w_s, LSX, gen_vv, gen_helper_vftintrp_w_s)
-TRANS(vftintrp_l_d, LSX, gen_vv, gen_helper_vftintrp_l_d)
-TRANS(vftintrm_w_s, LSX, gen_vv, gen_helper_vftintrm_w_s)
-TRANS(vftintrm_l_d, LSX, gen_vv, gen_helper_vftintrm_l_d)
-TRANS(vftint_w_s, LSX, gen_vv, gen_helper_vftint_w_s)
-TRANS(vftint_l_d, LSX, gen_vv, gen_helper_vftint_l_d)
-TRANS(vftintrz_wu_s, LSX, gen_vv, gen_helper_vftintrz_wu_s)
-TRANS(vftintrz_lu_d, LSX, gen_vv, gen_helper_vftintrz_lu_d)
-TRANS(vftint_wu_s, LSX, gen_vv, gen_helper_vftint_wu_s)
-TRANS(vftint_lu_d, LSX, gen_vv, gen_helper_vftint_lu_d)
-TRANS(vftintrne_w_d, LSX, gen_vvv, gen_helper_vftintrne_w_d)
-TRANS(vftintrz_w_d, LSX, gen_vvv, gen_helper_vftintrz_w_d)
-TRANS(vftintrp_w_d, LSX, gen_vvv, gen_helper_vftintrp_w_d)
-TRANS(vftintrm_w_d, LSX, gen_vvv, gen_helper_vftintrm_w_d)
-TRANS(vftint_w_d, LSX, gen_vvv, gen_helper_vftint_w_d)
-TRANS(vftintrnel_l_s, LSX, gen_vv, gen_helper_vftintrnel_l_s)
-TRANS(vftintrneh_l_s, LSX, gen_vv, gen_helper_vftintrneh_l_s)
-TRANS(vftintrzl_l_s, LSX, gen_vv, gen_helper_vftintrzl_l_s)
-TRANS(vftintrzh_l_s, LSX, gen_vv, gen_helper_vftintrzh_l_s)
-TRANS(vftintrpl_l_s, LSX, gen_vv, gen_helper_vftintrpl_l_s)
-TRANS(vftintrph_l_s, LSX, gen_vv, gen_helper_vftintrph_l_s)
-TRANS(vftintrml_l_s, LSX, gen_vv, gen_helper_vftintrml_l_s)
-TRANS(vftintrmh_l_s, LSX, gen_vv, gen_helper_vftintrmh_l_s)
-TRANS(vftintl_l_s, LSX, gen_vv, gen_helper_vftintl_l_s)
-TRANS(vftinth_l_s, LSX, gen_vv, gen_helper_vftinth_l_s)
-
-TRANS(vffint_s_w, LSX, gen_vv, gen_helper_vffint_s_w)
-TRANS(vffint_d_l, LSX, gen_vv, gen_helper_vffint_d_l)
-TRANS(vffint_s_wu, LSX, gen_vv, gen_helper_vffint_s_wu)
-TRANS(vffint_d_lu, LSX, gen_vv, gen_helper_vffint_d_lu)
-TRANS(vffintl_d_w, LSX, gen_vv, gen_helper_vffintl_d_w)
-TRANS(vffinth_d_w, LSX, gen_vv, gen_helper_vffinth_d_w)
-TRANS(vffint_s_l, LSX, gen_vvv, gen_helper_vffint_s_l)
+TRANS(vfrstp_b, LSX, gen_vvv, 16, gen_helper_vfrstp_b)
+TRANS(vfrstp_h, LSX, gen_vvv, 16, gen_helper_vfrstp_h)
+TRANS(vfrstpi_b, LSX, gen_vv_i, 16, gen_helper_vfrstpi_b)
+TRANS(vfrstpi_h, LSX, gen_vv_i, 16, gen_helper_vfrstpi_h)
+
+TRANS(vfadd_s, LSX, gen_vvv_f, 16, gen_helper_vfadd_s)
+TRANS(vfadd_d, LSX, gen_vvv_f, 16, gen_helper_vfadd_d)
+TRANS(vfsub_s, LSX, gen_vvv_f, 16, gen_helper_vfsub_s)
+TRANS(vfsub_d, LSX, gen_vvv_f, 16, gen_helper_vfsub_d)
+TRANS(vfmul_s, LSX, gen_vvv_f, 16, gen_helper_vfmul_s)
+TRANS(vfmul_d, LSX, gen_vvv_f, 16, gen_helper_vfmul_d)
+TRANS(vfdiv_s, LSX, gen_vvv_f, 16, gen_helper_vfdiv_s)
+TRANS(vfdiv_d, LSX, gen_vvv_f, 16, gen_helper_vfdiv_d)
+
+TRANS(vfmadd_s, LSX, gen_vvvv_f, 16, gen_helper_vfmadd_s)
+TRANS(vfmadd_d, LSX, gen_vvvv_f, 16, gen_helper_vfmadd_d)
+TRANS(vfmsub_s, LSX, gen_vvvv_f, 16, gen_helper_vfmsub_s)
+TRANS(vfmsub_d, LSX, gen_vvvv_f, 16, gen_helper_vfmsub_d)
+TRANS(vfnmadd_s, LSX, gen_vvvv_f, 16, gen_helper_vfnmadd_s)
+TRANS(vfnmadd_d, LSX, gen_vvvv_f, 16, gen_helper_vfnmadd_d)
+TRANS(vfnmsub_s, LSX, gen_vvvv_f, 16, gen_helper_vfnmsub_s)
+TRANS(vfnmsub_d, LSX, gen_vvvv_f, 16, gen_helper_vfnmsub_d)
+
+TRANS(vfmax_s, LSX, gen_vvv_f, 16, gen_helper_vfmax_s)
+TRANS(vfmax_d, LSX, gen_vvv_f, 16, gen_helper_vfmax_d)
+TRANS(vfmin_s, LSX, gen_vvv_f, 16, gen_helper_vfmin_s)
+TRANS(vfmin_d, LSX, gen_vvv_f, 16, gen_helper_vfmin_d)
+
+TRANS(vfmaxa_s, LSX, gen_vvv_f, 16, gen_helper_vfmaxa_s)
+TRANS(vfmaxa_d, LSX, gen_vvv_f, 16, gen_helper_vfmaxa_d)
+TRANS(vfmina_s, LSX, gen_vvv_f, 16, gen_helper_vfmina_s)
+TRANS(vfmina_d, LSX, gen_vvv_f, 16, gen_helper_vfmina_d)
+
+TRANS(vflogb_s, LSX, gen_vv_f, 16, gen_helper_vflogb_s)
+TRANS(vflogb_d, LSX, gen_vv_f, 16, gen_helper_vflogb_d)
+
+TRANS(vfclass_s, LSX, gen_vv_f, 16, gen_helper_vfclass_s)
+TRANS(vfclass_d, LSX, gen_vv_f, 16, gen_helper_vfclass_d)
+
+TRANS(vfsqrt_s, LSX, gen_vv_f, 16, gen_helper_vfsqrt_s)
+TRANS(vfsqrt_d, LSX, gen_vv_f, 16, gen_helper_vfsqrt_d)
+TRANS(vfrecip_s, LSX, gen_vv_f, 16, gen_helper_vfrecip_s)
+TRANS(vfrecip_d, LSX, gen_vv_f, 16, gen_helper_vfrecip_d)
+TRANS(vfrsqrt_s, LSX, gen_vv_f, 16, gen_helper_vfrsqrt_s)
+TRANS(vfrsqrt_d, LSX, gen_vv_f, 16, gen_helper_vfrsqrt_d)
+
+TRANS(vfcvtl_s_h, LSX, gen_vv_f, 16, gen_helper_vfcvtl_s_h)
+TRANS(vfcvth_s_h, LSX, gen_vv_f, 16, gen_helper_vfcvth_s_h)
+TRANS(vfcvtl_d_s, LSX, gen_vv_f, 16, gen_helper_vfcvtl_d_s)
+TRANS(vfcvth_d_s, LSX, gen_vv_f, 16, gen_helper_vfcvth_d_s)
+TRANS(vfcvt_h_s, LSX, gen_vvv_f, 16, gen_helper_vfcvt_h_s)
+TRANS(vfcvt_s_d, LSX, gen_vvv_f, 16, gen_helper_vfcvt_s_d)
+
+TRANS(vfrintrne_s, LSX, gen_vv_f, 16, gen_helper_vfrintrne_s)
+TRANS(vfrintrne_d, LSX, gen_vv_f, 16, gen_helper_vfrintrne_d)
+TRANS(vfrintrz_s, LSX, gen_vv_f, 16, gen_helper_vfrintrz_s)
+TRANS(vfrintrz_d, LSX, gen_vv_f, 16, gen_helper_vfrintrz_d)
+TRANS(vfrintrp_s, LSX, gen_vv_f, 16, gen_helper_vfrintrp_s)
+TRANS(vfrintrp_d, LSX, gen_vv_f, 16, gen_helper_vfrintrp_d)
+TRANS(vfrintrm_s, LSX, gen_vv_f, 16, gen_helper_vfrintrm_s)
+TRANS(vfrintrm_d, LSX, gen_vv_f, 16, gen_helper_vfrintrm_d)
+TRANS(vfrint_s, LSX, gen_vv_f, 16, gen_helper_vfrint_s)
+TRANS(vfrint_d, LSX, gen_vv_f, 16, gen_helper_vfrint_d)
+
+TRANS(vftintrne_w_s, LSX, gen_vv_f, 16, gen_helper_vftintrne_w_s)
+TRANS(vftintrne_l_d, LSX, gen_vv_f, 16, gen_helper_vftintrne_l_d)
+TRANS(vftintrz_w_s, LSX, gen_vv_f, 16, gen_helper_vftintrz_w_s)
+TRANS(vftintrz_l_d, LSX, gen_vv_f, 16, gen_helper_vftintrz_l_d)
+TRANS(vftintrp_w_s, LSX, gen_vv_f, 16, gen_helper_vftintrp_w_s)
+TRANS(vftintrp_l_d, LSX, gen_vv_f, 16, gen_helper_vftintrp_l_d)
+TRANS(vftintrm_w_s, LSX, gen_vv_f, 16, gen_helper_vftintrm_w_s)
+TRANS(vftintrm_l_d, LSX, gen_vv_f, 16, gen_helper_vftintrm_l_d)
+TRANS(vftint_w_s, LSX, gen_vv_f, 16, gen_helper_vftint_w_s)
+TRANS(vftint_l_d, LSX, gen_vv_f, 16, gen_helper_vftint_l_d)
+TRANS(vftintrz_wu_s, LSX, gen_vv_f, 16, gen_helper_vftintrz_wu_s)
+TRANS(vftintrz_lu_d, LSX, gen_vv_f, 16, gen_helper_vftintrz_lu_d)
+TRANS(vftint_wu_s, LSX, gen_vv_f, 16, gen_helper_vftint_wu_s)
+TRANS(vftint_lu_d, LSX, gen_vv_f, 16, gen_helper_vftint_lu_d)
+TRANS(vftintrne_w_d, LSX, gen_vvv_f, 16, gen_helper_vftintrne_w_d)
+TRANS(vftintrz_w_d, LSX, gen_vvv_f, 16, gen_helper_vftintrz_w_d)
+TRANS(vftintrp_w_d, LSX, gen_vvv_f, 16, gen_helper_vftintrp_w_d)
+TRANS(vftintrm_w_d, LSX, gen_vvv_f, 16, gen_helper_vftintrm_w_d)
+TRANS(vftint_w_d, LSX, gen_vvv_f, 16, gen_helper_vftint_w_d)
+TRANS(vftintrnel_l_s, LSX, gen_vv_f, 16, gen_helper_vftintrnel_l_s)
+TRANS(vftintrneh_l_s, LSX, gen_vv_f, 16, gen_helper_vftintrneh_l_s)
+TRANS(vftintrzl_l_s, LSX, gen_vv_f, 16, gen_helper_vftintrzl_l_s)
+TRANS(vftintrzh_l_s, LSX, gen_vv_f, 16, gen_helper_vftintrzh_l_s)
+TRANS(vftintrpl_l_s, LSX, gen_vv_f, 16, gen_helper_vftintrpl_l_s)
+TRANS(vftintrph_l_s, LSX, gen_vv_f, 16, gen_helper_vftintrph_l_s)
+TRANS(vftintrml_l_s, LSX, gen_vv_f, 16, gen_helper_vftintrml_l_s)
+TRANS(vftintrmh_l_s, LSX, gen_vv_f, 16, gen_helper_vftintrmh_l_s)
+TRANS(vftintl_l_s, LSX, gen_vv_f, 16, gen_helper_vftintl_l_s)
+TRANS(vftinth_l_s, LSX, gen_vv_f, 16, gen_helper_vftinth_l_s)
+
+TRANS(vffint_s_w, LSX, gen_vv_f, 16, gen_helper_vffint_s_w)
+TRANS(vffint_d_l, LSX, gen_vv_f, 16, gen_helper_vffint_d_l)
+TRANS(vffint_s_wu, LSX, gen_vv_f, 16, gen_helper_vffint_s_wu)
+TRANS(vffint_d_lu, LSX, gen_vv_f, 16, gen_helper_vffint_d_lu)
+TRANS(vffintl_d_w, LSX, gen_vv_f, 16, gen_helper_vffintl_d_w)
+TRANS(vffinth_d_w, LSX, gen_vv_f, 16, gen_helper_vffinth_d_w)
+TRANS(vffint_s_l, LSX, gen_vvv_f, 16, gen_helper_vffint_s_l)
static bool do_cmp(DisasContext *ctx, arg_vvv *a, MemOp mop, TCGCond cond)
{
@@ -4335,48 +4372,48 @@ static bool trans_vbsrl_v(DisasContext *ctx, arg_vv_i *a)
return true;
}
-TRANS(vpackev_b, LSX, gen_vvv, gen_helper_vpackev_b)
-TRANS(vpackev_h, LSX, gen_vvv, gen_helper_vpackev_h)
-TRANS(vpackev_w, LSX, gen_vvv, gen_helper_vpackev_w)
-TRANS(vpackev_d, LSX, gen_vvv, gen_helper_vpackev_d)
-TRANS(vpackod_b, LSX, gen_vvv, gen_helper_vpackod_b)
-TRANS(vpackod_h, LSX, gen_vvv, gen_helper_vpackod_h)
-TRANS(vpackod_w, LSX, gen_vvv, gen_helper_vpackod_w)
-TRANS(vpackod_d, LSX, gen_vvv, gen_helper_vpackod_d)
-
-TRANS(vpickev_b, LSX, gen_vvv, gen_helper_vpickev_b)
-TRANS(vpickev_h, LSX, gen_vvv, gen_helper_vpickev_h)
-TRANS(vpickev_w, LSX, gen_vvv, gen_helper_vpickev_w)
-TRANS(vpickev_d, LSX, gen_vvv, gen_helper_vpickev_d)
-TRANS(vpickod_b, LSX, gen_vvv, gen_helper_vpickod_b)
-TRANS(vpickod_h, LSX, gen_vvv, gen_helper_vpickod_h)
-TRANS(vpickod_w, LSX, gen_vvv, gen_helper_vpickod_w)
-TRANS(vpickod_d, LSX, gen_vvv, gen_helper_vpickod_d)
-
-TRANS(vilvl_b, LSX, gen_vvv, gen_helper_vilvl_b)
-TRANS(vilvl_h, LSX, gen_vvv, gen_helper_vilvl_h)
-TRANS(vilvl_w, LSX, gen_vvv, gen_helper_vilvl_w)
-TRANS(vilvl_d, LSX, gen_vvv, gen_helper_vilvl_d)
-TRANS(vilvh_b, LSX, gen_vvv, gen_helper_vilvh_b)
-TRANS(vilvh_h, LSX, gen_vvv, gen_helper_vilvh_h)
-TRANS(vilvh_w, LSX, gen_vvv, gen_helper_vilvh_w)
-TRANS(vilvh_d, LSX, gen_vvv, gen_helper_vilvh_d)
-
-TRANS(vshuf_b, LSX, gen_vvvv, gen_helper_vshuf_b)
-TRANS(vshuf_h, LSX, gen_vvv, gen_helper_vshuf_h)
-TRANS(vshuf_w, LSX, gen_vvv, gen_helper_vshuf_w)
-TRANS(vshuf_d, LSX, gen_vvv, gen_helper_vshuf_d)
-TRANS(vshuf4i_b, LSX, gen_vv_i, gen_helper_vshuf4i_b)
-TRANS(vshuf4i_h, LSX, gen_vv_i, gen_helper_vshuf4i_h)
-TRANS(vshuf4i_w, LSX, gen_vv_i, gen_helper_vshuf4i_w)
-TRANS(vshuf4i_d, LSX, gen_vv_i, gen_helper_vshuf4i_d)
-
-TRANS(vpermi_w, LSX, gen_vv_i, gen_helper_vpermi_w)
-
-TRANS(vextrins_b, LSX, gen_vv_i, gen_helper_vextrins_b)
-TRANS(vextrins_h, LSX, gen_vv_i, gen_helper_vextrins_h)
-TRANS(vextrins_w, LSX, gen_vv_i, gen_helper_vextrins_w)
-TRANS(vextrins_d, LSX, gen_vv_i, gen_helper_vextrins_d)
+TRANS(vpackev_b, LSX, gen_vvv, 16, gen_helper_vpackev_b)
+TRANS(vpackev_h, LSX, gen_vvv, 16, gen_helper_vpackev_h)
+TRANS(vpackev_w, LSX, gen_vvv, 16, gen_helper_vpackev_w)
+TRANS(vpackev_d, LSX, gen_vvv, 16, gen_helper_vpackev_d)
+TRANS(vpackod_b, LSX, gen_vvv, 16, gen_helper_vpackod_b)
+TRANS(vpackod_h, LSX, gen_vvv, 16, gen_helper_vpackod_h)
+TRANS(vpackod_w, LSX, gen_vvv, 16, gen_helper_vpackod_w)
+TRANS(vpackod_d, LSX, gen_vvv, 16, gen_helper_vpackod_d)
+
+TRANS(vpickev_b, LSX, gen_vvv, 16, gen_helper_vpickev_b)
+TRANS(vpickev_h, LSX, gen_vvv, 16, gen_helper_vpickev_h)
+TRANS(vpickev_w, LSX, gen_vvv, 16, gen_helper_vpickev_w)
+TRANS(vpickev_d, LSX, gen_vvv, 16, gen_helper_vpickev_d)
+TRANS(vpickod_b, LSX, gen_vvv, 16, gen_helper_vpickod_b)
+TRANS(vpickod_h, LSX, gen_vvv, 16, gen_helper_vpickod_h)
+TRANS(vpickod_w, LSX, gen_vvv, 16, gen_helper_vpickod_w)
+TRANS(vpickod_d, LSX, gen_vvv, 16, gen_helper_vpickod_d)
+
+TRANS(vilvl_b, LSX, gen_vvv, 16, gen_helper_vilvl_b)
+TRANS(vilvl_h, LSX, gen_vvv, 16, gen_helper_vilvl_h)
+TRANS(vilvl_w, LSX, gen_vvv, 16, gen_helper_vilvl_w)
+TRANS(vilvl_d, LSX, gen_vvv, 16, gen_helper_vilvl_d)
+TRANS(vilvh_b, LSX, gen_vvv, 16, gen_helper_vilvh_b)
+TRANS(vilvh_h, LSX, gen_vvv, 16, gen_helper_vilvh_h)
+TRANS(vilvh_w, LSX, gen_vvv, 16, gen_helper_vilvh_w)
+TRANS(vilvh_d, LSX, gen_vvv, 16, gen_helper_vilvh_d)
+
+TRANS(vshuf_b, LSX, gen_vvvv, 16, gen_helper_vshuf_b)
+TRANS(vshuf_h, LSX, gen_vvv, 16, gen_helper_vshuf_h)
+TRANS(vshuf_w, LSX, gen_vvv, 16, gen_helper_vshuf_w)
+TRANS(vshuf_d, LSX, gen_vvv, 16, gen_helper_vshuf_d)
+TRANS(vshuf4i_b, LSX, gen_vv_i, 16, gen_helper_vshuf4i_b)
+TRANS(vshuf4i_h, LSX, gen_vv_i, 16, gen_helper_vshuf4i_h)
+TRANS(vshuf4i_w, LSX, gen_vv_i, 16, gen_helper_vshuf4i_w)
+TRANS(vshuf4i_d, LSX, gen_vv_i, 16, gen_helper_vshuf4i_d)
+
+TRANS(vpermi_w, LSX, gen_vv_i, 16, gen_helper_vpermi_w)
+
+TRANS(vextrins_b, LSX, gen_vv_i, 16, gen_helper_vextrins_b)
+TRANS(vextrins_h, LSX, gen_vv_i, 16, gen_helper_vextrins_h)
+TRANS(vextrins_w, LSX, gen_vv_i, 16, gen_helper_vextrins_w)
+TRANS(vextrins_d, LSX, gen_vv_i, 16, gen_helper_vextrins_d)
static bool trans_vld(DisasContext *ctx, arg_vr_i *a)
{
diff --git a/target/loongarch/meson.build b/target/loongarch/meson.build
index b7a27df5a9..7fbf045a5d 100644
--- a/target/loongarch/meson.build
+++ b/target/loongarch/meson.build
@@ -11,7 +11,7 @@ loongarch_tcg_ss.add(files(
'op_helper.c',
'translate.c',
'gdbstub.c',
- 'lsx_helper.c',
+ 'vec_helper.c',
))
loongarch_tcg_ss.add(zlib)
--
2.39.1
^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [PATCH v4 10/48] target/loongarch: rename lsx_helper.c to vec_helper.c
2023-08-30 8:48 ` [PATCH v4 10/48] target/loongarch: rename lsx_helper.c to vec_helper.c Song Gao
@ 2023-08-30 18:06 ` Richard Henderson
2023-08-31 7:17 ` gaosong
0 siblings, 1 reply; 86+ messages in thread
From: Richard Henderson @ 2023-08-30 18:06 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 8/30/23 01:48, Song Gao wrote:
> Use the gen_helper_gvec_* series of functions,
> and rename lsx_helper.c to vec_helper.c.
>
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
> target/loongarch/helper.h | 642 ++++----
> .../loongarch/{lsx_helper.c => vec_helper.c} | 1297 ++++++++---------
These changes are fine, but should be split.
The helper changes can be done with only minimal changes
> target/loongarch/insn_trans/trans_lsx.c.inc | 731 +++++-----
here, rather than to 700+ lines at once.
> -static bool gen_vvvv(DisasContext *ctx, arg_vvvv *a,
> - void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32,
> - TCGv_i32, TCGv_i32))
> +static bool gen_vvvv(DisasContext *ctx, arg_vvvv *a, int oprsz,
> + gen_helper_gvec_4 *fn)
If you omit the oprsz argument within this patch,
> + tcg_gen_gvec_4_ool(vec_full_offset(a->vd),
> + vec_full_offset(a->vj),
> + vec_full_offset(a->vk),
> + vec_full_offset(a->va),
> + oprsz, ctx->vl / 8, oprsz, fn);
hard-coding 16 here instead,
> -TRANS(vhaddw_h_b, LSX, gen_vvv, gen_helper_vhaddw_h_b)
> +TRANS(vhaddw_h_b, LSX, gen_vvv, 16, gen_helper_vhaddw_h_b)
then you do not need all of these changes.
At which point I'll refer you back to my comments vs patches 5 and 6, wherein separate
gen_vvv and gen_xxx helpers would avoid the need to replicate 16 across all of these lines.
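For reference, a minimal sketch of that shape (the names and the data
argument are illustrative only, not taken from the posted series):

/*
 * Sketch only: keep gen_vvv hard-coded to the 16-byte LSX operand
 * size and add a gen_xxx twin for the 32-byte LASX case, so the
 * TRANS() lines never need an explicit size argument.
 */
static bool gen_vvv_vl(DisasContext *ctx, arg_vvv *a, uint32_t oprsz,
                       gen_helper_gvec_3 *fn)
{
    tcg_gen_gvec_3_ool(vec_full_offset(a->vd),
                       vec_full_offset(a->vj),
                       vec_full_offset(a->vk),
                       oprsz, ctx->vl / 8, 0, fn);
    return true;
}

static bool gen_vvv(DisasContext *ctx, arg_vvv *a, gen_helper_gvec_3 *fn)
{
    return gen_vvv_vl(ctx, a, 16, fn);    /* LSX, 128-bit */
}

static bool gen_xxx(DisasContext *ctx, arg_vvv *a, gen_helper_gvec_3 *fn)
{
    return gen_vvv_vl(ctx, a, 32, fn);    /* LASX, 256-bit */
}

The LSX lines then stay as TRANS(vhaddw_h_b, LSX, gen_vvv,
gen_helper_vhaddw_h_b), and the LASX lines become
TRANS(xvhaddw_h_b, LASX, gen_xxx, gen_helper_vhaddw_h_b).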
r~
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v4 10/48] target/loongarch: rename lsx_helper.c to vec_helper.c
2023-08-30 18:06 ` Richard Henderson
@ 2023-08-31 7:17 ` gaosong
0 siblings, 0 replies; 86+ messages in thread
From: gaosong @ 2023-08-31 7:17 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
On 2023/8/31 at 2:06 AM, Richard Henderson wrote:
> On 8/30/23 01:48, Song Gao wrote:
>> Use the gen_helper_gvec_* series of functions,
>> and rename lsx_helper.c to vec_helper.c.
>>
>> Signed-off-by: Song Gao <gaosong@loongson.cn>
>> ---
>> target/loongarch/helper.h | 642 ++++----
>> .../loongarch/{lsx_helper.c => vec_helper.c} | 1297 ++++++++---------
>
> These changes are fine, but should be split.
>
> The helper changes can be done with only minimal changes
>
>> target/loongarch/insn_trans/trans_lsx.c.inc | 731 +++++-----
>
> here, rather than to 700+ lines at once.
>
OK. It seems this will need more patches.
>> -static bool gen_vvvv(DisasContext *ctx, arg_vvvv *a,
>> - void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32,
>> - TCGv_i32, TCGv_i32))
>> +static bool gen_vvvv(DisasContext *ctx, arg_vvvv *a, int oprsz,
>> + gen_helper_gvec_4 *fn)
>
> If you omit the oprsz argument within this patch,
>
>> + tcg_gen_gvec_4_ool(vec_full_offset(a->vd),
>> + vec_full_offset(a->vj),
>> + vec_full_offset(a->vk),
>> + vec_full_offset(a->va),
>> + oprsz, ctx->vl / 8, oprsz, fn);
>
> hard-coding 16 here instead,
>
>> -TRANS(vhaddw_h_b, LSX, gen_vvv, gen_helper_vhaddw_h_b)
>> +TRANS(vhaddw_h_b, LSX, gen_vvv, 16, gen_helper_vhaddw_h_b)
>
> then you do not need all of these changes.
>
> At which point I'll refer you back to my comments vs patches 5 and 6,
> wherein separate gen_vvv and gen_xxx helpers would avoid the need to
> replicate 16 across all of these lines.
>
Got it
Thanks.
Song Gao
^ permalink raw reply [flat|nested] 86+ messages in thread
* [PATCH v4 11/48] target/loongarch: Implement xvhaddw/xvhsubw
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (9 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 10/48] target/loongarch: rename lsx_helper.c to vec_helper.c Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 18:12 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 12/48] target/loongarch: Implement xvaddw/xvsubw Song Gao
` (36 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVHADDW.{H.B/W.H/D.W/Q.D/HU.BU/WU.HU/DU.WU/QU.DU};
- XVHSUBW.{H.B/W.H/D.W/Q.D/HU.BU/WU.HU/DU.WU/QU.DU}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/vec.h | 3 ++
target/loongarch/insns.decode | 18 ++++++++++
target/loongarch/disas.c | 17 +++++++++
target/loongarch/vec_helper.c | 36 ++++++++++++++------
target/loongarch/insn_trans/trans_lasx.c.inc | 17 +++++++++
5 files changed, 81 insertions(+), 10 deletions(-)
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index 512f2fd83f..5332dff83c 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -47,4 +47,7 @@
#define Q(x) Q[x]
#endif /* HOST_BIG_ENDIAN */
+#define DO_ADD(a, b) (a + b)
+#define DO_SUB(a, b) (a - b)
+
#endif /* LOONGARCH_VEC_H */
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 32f857ff7c..ba0b36f4a7 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1343,6 +1343,24 @@ xvssub_hu 0111 01000100 11001 ..... ..... ..... @vvv
xvssub_wu 0111 01000100 11010 ..... ..... ..... @vvv
xvssub_du 0111 01000100 11011 ..... ..... ..... @vvv
+xvhaddw_h_b 0111 01000101 01000 ..... ..... ..... @vvv
+xvhaddw_w_h 0111 01000101 01001 ..... ..... ..... @vvv
+xvhaddw_d_w 0111 01000101 01010 ..... ..... ..... @vvv
+xvhaddw_q_d 0111 01000101 01011 ..... ..... ..... @vvv
+xvhaddw_hu_bu 0111 01000101 10000 ..... ..... ..... @vvv
+xvhaddw_wu_hu 0111 01000101 10001 ..... ..... ..... @vvv
+xvhaddw_du_wu 0111 01000101 10010 ..... ..... ..... @vvv
+xvhaddw_qu_du 0111 01000101 10011 ..... ..... ..... @vvv
+
+xvhsubw_h_b 0111 01000101 01100 ..... ..... ..... @vvv
+xvhsubw_w_h 0111 01000101 01101 ..... ..... ..... @vvv
+xvhsubw_d_w 0111 01000101 01110 ..... ..... ..... @vvv
+xvhsubw_q_d 0111 01000101 01111 ..... ..... ..... @vvv
+xvhsubw_hu_bu 0111 01000101 10100 ..... ..... ..... @vvv
+xvhsubw_wu_hu 0111 01000101 10101 ..... ..... ..... @vvv
+xvhsubw_du_wu 0111 01000101 10110 ..... ..... ..... @vvv
+xvhsubw_qu_du 0111 01000101 10111 ..... ..... ..... @vvv
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 0fd88a56c1..e188220519 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1765,6 +1765,23 @@ INSN_LASX(xvssub_hu, vvv)
INSN_LASX(xvssub_wu, vvv)
INSN_LASX(xvssub_du, vvv)
+INSN_LASX(xvhaddw_h_b, vvv)
+INSN_LASX(xvhaddw_w_h, vvv)
+INSN_LASX(xvhaddw_d_w, vvv)
+INSN_LASX(xvhaddw_q_d, vvv)
+INSN_LASX(xvhaddw_hu_bu, vvv)
+INSN_LASX(xvhaddw_wu_hu, vvv)
+INSN_LASX(xvhaddw_du_wu, vvv)
+INSN_LASX(xvhaddw_qu_du, vvv)
+INSN_LASX(xvhsubw_h_b, vvv)
+INSN_LASX(xvhsubw_w_h, vvv)
+INSN_LASX(xvhsubw_d_w, vvv)
+INSN_LASX(xvhsubw_q_d, vvv)
+INSN_LASX(xvhsubw_hu_bu, vvv)
+INSN_LASX(xvhsubw_wu_hu, vvv)
+INSN_LASX(xvhsubw_du_wu, vvv)
+INSN_LASX(xvhsubw_qu_du, vvv)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index d01903018a..b6c0b3fda8 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -14,9 +14,6 @@
#include "tcg/tcg.h"
#include "vec.h"
-#define DO_ADD(a, b) (a + b)
-#define DO_SUB(a, b) (a - b)
-
#define DO_ODD_EVEN(NAME, BIT, E1, E2, DO_OP) \
void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
@@ -25,8 +22,9 @@ void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
VReg *Vj = (VReg *)vj; \
VReg *Vk = (VReg *)vk; \
typedef __typeof(Vd->E1(0)) TD; \
+ int oprsz = simd_oprsz(desc); \
\
- for (i = 0; i < LSX_LEN/BIT; i++) { \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
Vd->E1(i) = DO_OP((TD)Vj->E2(2 * i + 1), (TD)Vk->E2(2 * i)); \
} \
}
@@ -37,11 +35,16 @@ DO_ODD_EVEN(vhaddw_d_w, 64, D, W, DO_ADD)
void HELPER(vhaddw_q_d)(void *vd, void *vj, void *vk, uint32_t desc)
{
+ int i;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
VReg *Vk = (VReg *)vk;
+ int oprsz = simd_oprsz(desc);
- Vd->Q(0) = int128_add(int128_makes64(Vj->D(1)), int128_makes64(Vk->D(0)));
+ for (i = 0; i < oprsz / 16; i++) {
+ Vd->Q(i) = int128_add(int128_makes64(Vj->D(2 * i + 1)),
+ int128_makes64(Vk->D(2 * i)));
+ }
}
DO_ODD_EVEN(vhsubw_h_b, 16, H, B, DO_SUB)
@@ -50,11 +53,16 @@ DO_ODD_EVEN(vhsubw_d_w, 64, D, W, DO_SUB)
void HELPER(vhsubw_q_d)(void *vd, void *vj, void *vk, uint32_t desc)
{
+ int i;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
VReg *Vk = (VReg *)vk;
+ int oprsz = simd_oprsz(desc);
- Vd->Q(0) = int128_sub(int128_makes64(Vj->D(1)), int128_makes64(Vk->D(0)));
+ for (i = 0; i < oprsz / 16; i++) {
+ Vd->Q(i) = int128_sub(int128_makes64(Vj->D(2 * i + 1)),
+ int128_makes64(Vk->D(2 * i)));
+ }
}
DO_ODD_EVEN(vhaddw_hu_bu, 16, UH, UB, DO_ADD)
@@ -63,12 +71,16 @@ DO_ODD_EVEN(vhaddw_du_wu, 64, UD, UW, DO_ADD)
void HELPER(vhaddw_qu_du)(void *vd, void *vj, void *vk, uint32_t desc)
{
+ int i;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
VReg *Vk = (VReg *)vk;
+ int oprsz = simd_oprsz(desc);
- Vd->Q(0) = int128_add(int128_make64((uint64_t)Vj->D(1)),
- int128_make64((uint64_t)Vk->D(0)));
+ for (i = 0; i < oprsz / 16; i++) {
+ Vd->Q(i) = int128_add(int128_make64(Vj->UD(2 * i + 1)),
+ int128_make64(Vk->UD(2 * i)));
+ }
}
DO_ODD_EVEN(vhsubw_hu_bu, 16, UH, UB, DO_SUB)
@@ -77,12 +89,16 @@ DO_ODD_EVEN(vhsubw_du_wu, 64, UD, UW, DO_SUB)
void HELPER(vhsubw_qu_du)(void *vd, void *vj, void *vk, uint32_t desc)
{
+ int i;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
VReg *Vk = (VReg *)vk;
+ int oprsz = simd_oprsz(desc);
- Vd->Q(0) = int128_sub(int128_make64((uint64_t)Vj->D(1)),
- int128_make64((uint64_t)Vk->D(0)));
+ for (i = 0; i < oprsz / 16; i++) {
+ Vd->Q(i) = int128_sub(int128_make64(Vj->UD(2 * i + 1)),
+ int128_make64(Vk->UD(2 * i)));
+ }
}
#define DO_EVEN(NAME, BIT, E1, E2, DO_OP) \
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index c818a09312..90c9ccce4f 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -82,6 +82,23 @@ TRANS(xvssub_hu, LASX, gvec_vvv, 32, MO_16, tcg_gen_gvec_ussub)
TRANS(xvssub_wu, LASX, gvec_vvv, 32, MO_32, tcg_gen_gvec_ussub)
TRANS(xvssub_du, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_ussub)
+TRANS(xvhaddw_h_b, LASX, gen_vvv, 32, gen_helper_vhaddw_h_b)
+TRANS(xvhaddw_w_h, LASX, gen_vvv, 32, gen_helper_vhaddw_w_h)
+TRANS(xvhaddw_d_w, LASX, gen_vvv, 32, gen_helper_vhaddw_d_w)
+TRANS(xvhaddw_q_d, LASX, gen_vvv, 32, gen_helper_vhaddw_q_d)
+TRANS(xvhaddw_hu_bu, LASX, gen_vvv, 32, gen_helper_vhaddw_hu_bu)
+TRANS(xvhaddw_wu_hu, LASX, gen_vvv, 32, gen_helper_vhaddw_wu_hu)
+TRANS(xvhaddw_du_wu, LASX, gen_vvv, 32, gen_helper_vhaddw_du_wu)
+TRANS(xvhaddw_qu_du, LASX, gen_vvv, 32, gen_helper_vhaddw_qu_du)
+TRANS(xvhsubw_h_b, LASX, gen_vvv, 32, gen_helper_vhsubw_h_b)
+TRANS(xvhsubw_w_h, LASX, gen_vvv, 32, gen_helper_vhsubw_w_h)
+TRANS(xvhsubw_d_w, LASX, gen_vvv, 32, gen_helper_vhsubw_d_w)
+TRANS(xvhsubw_q_d, LASX, gen_vvv, 32, gen_helper_vhsubw_q_d)
+TRANS(xvhsubw_hu_bu, LASX, gen_vvv, 32, gen_helper_vhsubw_hu_bu)
+TRANS(xvhsubw_wu_hu, LASX, gen_vvv, 32, gen_helper_vhsubw_wu_hu)
+TRANS(xvhsubw_du_wu, LASX, gen_vvv, 32, gen_helper_vhsubw_du_wu)
+TRANS(xvhsubw_qu_du, LASX, gen_vvv, 32, gen_helper_vhsubw_qu_du)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
^ permalink raw reply related [flat|nested] 86+ messages in thread
* Re: [PATCH v4 11/48] target/loongarch: Implement xvhaddw/xvhsubw
2023-08-30 8:48 ` [PATCH v4 11/48] target/loongarch: Implement xvhaddw/xvhsubw Song Gao
@ 2023-08-30 18:12 ` Richard Henderson
2023-08-31 7:17 ` gaosong
0 siblings, 1 reply; 86+ messages in thread
From: Richard Henderson @ 2023-08-30 18:12 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 8/30/23 01:48, Song Gao wrote:
> --- a/target/loongarch/vec.h
> +++ b/target/loongarch/vec.h
> @@ -47,4 +47,7 @@
> #define Q(x) Q[x]
> #endif /* HOST_BIG_ENDIAN */
>
> +#define DO_ADD(a, b) (a + b)
> +#define DO_SUB(a, b) (a - b)
> +
Why are these moved?
r~
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v4 11/48] target/loongarch: Implement xvhaddw/xvhsubw
2023-08-30 18:12 ` Richard Henderson
@ 2023-08-31 7:17 ` gaosong
2023-08-31 15:06 ` Richard Henderson
0 siblings, 1 reply; 86+ messages in thread
From: gaosong @ 2023-08-31 7:17 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
On 2023/8/31 at 2:12 AM, Richard Henderson wrote:
> On 8/30/23 01:48, Song Gao wrote:
>> --- a/target/loongarch/vec.h
>> +++ b/target/loongarch/vec.h
>> @@ -47,4 +47,7 @@
>> #define Q(x) Q[x]
>> #endif /* HOST_BIG_ENDIAN */
>> +#define DO_ADD(a, b) (a + b)
>> +#define DO_SUB(a, b) (a - b)
>> +
>
> Why are these moved?
>
I wanted to group the simple macros together.
Thanks.
Song Gao
^ permalink raw reply [flat|nested] 86+ messages in thread
* Re: [PATCH v4 11/48] target/loongarch: Implement xvhaddw/xvhsubw
2023-08-31 7:17 ` gaosong
@ 2023-08-31 15:06 ` Richard Henderson
0 siblings, 0 replies; 86+ messages in thread
From: Richard Henderson @ 2023-08-31 15:06 UTC (permalink / raw)
To: gaosong, qemu-devel
On 8/31/23 00:17, gaosong wrote:
> On 2023/8/31 at 2:12 AM, Richard Henderson wrote:
>> On 8/30/23 01:48, Song Gao wrote:
>>> --- a/target/loongarch/vec.h
>>> +++ b/target/loongarch/vec.h
>>> @@ -47,4 +47,7 @@
>>> #define Q(x) Q[x]
>>> #endif /* HOST_BIG_ENDIAN */
>>> +#define DO_ADD(a, b) (a + b)
>>> +#define DO_SUB(a, b) (a - b)
>>> +
>>
>> Why are these moved?
>>
> I wanted to group the simple macros together.
Ok.
r~
^ permalink raw reply [flat|nested] 86+ messages in thread
* [PATCH v4 12/48] target/loongarch: Implement xvaddw/xvsubw
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (10 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 11/48] target/loongarch: Implement xvhaddw/xvhsubw Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 8:48 ` [PATCH v4 13/48] target/loongarch: Implement xvavg/xvavgr Song Gao
` (35 subsequent siblings)
47 siblings, 0 replies; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVADDW{EV/OD}.{H.B/W.H/D.W/Q.D}[U];
- XVSUBW{EV/OD}.{H.B/W.H/D.W/Q.D}[U];
- XVADDW{EV/OD}.{H.BU.B/W.HU.H/D.WU.W/Q.DU.D}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/insns.decode | 45 +++++++
target/loongarch/disas.c | 43 +++++++
target/loongarch/vec_helper.c | 121 +++++++++++++------
target/loongarch/insn_trans/trans_lasx.c.inc | 45 +++++++
4 files changed, 220 insertions(+), 34 deletions(-)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index ba0b36f4a7..e1d8b30179 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1361,6 +1361,51 @@ xvhsubw_wu_hu 0111 01000101 10101 ..... ..... ..... @vvv
xvhsubw_du_wu 0111 01000101 10110 ..... ..... ..... @vvv
xvhsubw_qu_du 0111 01000101 10111 ..... ..... ..... @vvv
+xvaddwev_h_b 0111 01000001 11100 ..... ..... ..... @vvv
+xvaddwev_w_h 0111 01000001 11101 ..... ..... ..... @vvv
+xvaddwev_d_w 0111 01000001 11110 ..... ..... ..... @vvv
+xvaddwev_q_d 0111 01000001 11111 ..... ..... ..... @vvv
+xvaddwod_h_b 0111 01000010 00100 ..... ..... ..... @vvv
+xvaddwod_w_h 0111 01000010 00101 ..... ..... ..... @vvv
+xvaddwod_d_w 0111 01000010 00110 ..... ..... ..... @vvv
+xvaddwod_q_d 0111 01000010 00111 ..... ..... ..... @vvv
+
+xvsubwev_h_b 0111 01000010 00000 ..... ..... ..... @vvv
+xvsubwev_w_h 0111 01000010 00001 ..... ..... ..... @vvv
+xvsubwev_d_w 0111 01000010 00010 ..... ..... ..... @vvv
+xvsubwev_q_d 0111 01000010 00011 ..... ..... ..... @vvv
+xvsubwod_h_b 0111 01000010 01000 ..... ..... ..... @vvv
+xvsubwod_w_h 0111 01000010 01001 ..... ..... ..... @vvv
+xvsubwod_d_w 0111 01000010 01010 ..... ..... ..... @vvv
+xvsubwod_q_d 0111 01000010 01011 ..... ..... ..... @vvv
+
+xvaddwev_h_bu 0111 01000010 11100 ..... ..... ..... @vvv
+xvaddwev_w_hu 0111 01000010 11101 ..... ..... ..... @vvv
+xvaddwev_d_wu 0111 01000010 11110 ..... ..... ..... @vvv
+xvaddwev_q_du 0111 01000010 11111 ..... ..... ..... @vvv
+xvaddwod_h_bu 0111 01000011 00100 ..... ..... ..... @vvv
+xvaddwod_w_hu 0111 01000011 00101 ..... ..... ..... @vvv
+xvaddwod_d_wu 0111 01000011 00110 ..... ..... ..... @vvv
+xvaddwod_q_du 0111 01000011 00111 ..... ..... ..... @vvv
+
+xvsubwev_h_bu 0111 01000011 00000 ..... ..... ..... @vvv
+xvsubwev_w_hu 0111 01000011 00001 ..... ..... ..... @vvv
+xvsubwev_d_wu 0111 01000011 00010 ..... ..... ..... @vvv
+xvsubwev_q_du 0111 01000011 00011 ..... ..... ..... @vvv
+xvsubwod_h_bu 0111 01000011 01000 ..... ..... ..... @vvv
+xvsubwod_w_hu 0111 01000011 01001 ..... ..... ..... @vvv
+xvsubwod_d_wu 0111 01000011 01010 ..... ..... ..... @vvv
+xvsubwod_q_du 0111 01000011 01011 ..... ..... ..... @vvv
+
+xvaddwev_h_bu_b 0111 01000011 11100 ..... ..... ..... @vvv
+xvaddwev_w_hu_h 0111 01000011 11101 ..... ..... ..... @vvv
+xvaddwev_d_wu_w 0111 01000011 11110 ..... ..... ..... @vvv
+xvaddwev_q_du_d 0111 01000011 11111 ..... ..... ..... @vvv
+xvaddwod_h_bu_b 0111 01000100 00000 ..... ..... ..... @vvv
+xvaddwod_w_hu_h 0111 01000100 00001 ..... ..... ..... @vvv
+xvaddwod_d_wu_w 0111 01000100 00010 ..... ..... ..... @vvv
+xvaddwod_q_du_d 0111 01000100 00011 ..... ..... ..... @vvv
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index e188220519..6972e33833 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1782,6 +1782,49 @@ INSN_LASX(xvhsubw_wu_hu, vvv)
INSN_LASX(xvhsubw_du_wu, vvv)
INSN_LASX(xvhsubw_qu_du, vvv)
+INSN_LASX(xvaddwev_h_b, vvv)
+INSN_LASX(xvaddwev_w_h, vvv)
+INSN_LASX(xvaddwev_d_w, vvv)
+INSN_LASX(xvaddwev_q_d, vvv)
+INSN_LASX(xvaddwod_h_b, vvv)
+INSN_LASX(xvaddwod_w_h, vvv)
+INSN_LASX(xvaddwod_d_w, vvv)
+INSN_LASX(xvaddwod_q_d, vvv)
+INSN_LASX(xvsubwev_h_b, vvv)
+INSN_LASX(xvsubwev_w_h, vvv)
+INSN_LASX(xvsubwev_d_w, vvv)
+INSN_LASX(xvsubwev_q_d, vvv)
+INSN_LASX(xvsubwod_h_b, vvv)
+INSN_LASX(xvsubwod_w_h, vvv)
+INSN_LASX(xvsubwod_d_w, vvv)
+INSN_LASX(xvsubwod_q_d, vvv)
+
+INSN_LASX(xvaddwev_h_bu, vvv)
+INSN_LASX(xvaddwev_w_hu, vvv)
+INSN_LASX(xvaddwev_d_wu, vvv)
+INSN_LASX(xvaddwev_q_du, vvv)
+INSN_LASX(xvaddwod_h_bu, vvv)
+INSN_LASX(xvaddwod_w_hu, vvv)
+INSN_LASX(xvaddwod_d_wu, vvv)
+INSN_LASX(xvaddwod_q_du, vvv)
+INSN_LASX(xvsubwev_h_bu, vvv)
+INSN_LASX(xvsubwev_w_hu, vvv)
+INSN_LASX(xvsubwev_d_wu, vvv)
+INSN_LASX(xvsubwev_q_du, vvv)
+INSN_LASX(xvsubwod_h_bu, vvv)
+INSN_LASX(xvsubwod_w_hu, vvv)
+INSN_LASX(xvsubwod_d_wu, vvv)
+INSN_LASX(xvsubwod_q_du, vvv)
+
+INSN_LASX(xvaddwev_h_bu_b, vvv)
+INSN_LASX(xvaddwev_w_hu_h, vvv)
+INSN_LASX(xvaddwev_d_wu_w, vvv)
+INSN_LASX(xvaddwev_q_du_d, vvv)
+INSN_LASX(xvaddwod_h_bu_b, vvv)
+INSN_LASX(xvaddwod_w_hu_h, vvv)
+INSN_LASX(xvaddwod_d_wu_w, vvv)
+INSN_LASX(xvaddwod_q_du_d, vvv)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index b6c0b3fda8..fffc67ce93 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -13,6 +13,7 @@
#include "internals.h"
#include "tcg/tcg.h"
#include "vec.h"
+#include "tcg/tcg-gvec-desc.h"
#define DO_ODD_EVEN(NAME, BIT, E1, E2, DO_OP) \
void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
@@ -102,133 +103,173 @@ void HELPER(vhsubw_qu_du)(void *vd, void *vj, void *vk, uint32_t desc)
}
#define DO_EVEN(NAME, BIT, E1, E2, DO_OP) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
int i; \
VReg *Vd = (VReg *)vd; \
VReg *Vj = (VReg *)vj; \
VReg *Vk = (VReg *)vk; \
typedef __typeof(Vd->E1(0)) TD; \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
+ int oprsz = simd_oprsz(desc); \
+ \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
Vd->E1(i) = DO_OP((TD)Vj->E2(2 * i) ,(TD)Vk->E2(2 * i)); \
} \
}
#define DO_ODD(NAME, BIT, E1, E2, DO_OP) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
int i; \
VReg *Vd = (VReg *)vd; \
VReg *Vj = (VReg *)vj; \
VReg *Vk = (VReg *)vk; \
typedef __typeof(Vd->E1(0)) TD; \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
+ int oprsz = simd_oprsz(desc); \
+ \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
Vd->E1(i) = DO_OP((TD)Vj->E2(2 * i + 1), (TD)Vk->E2(2 * i + 1)); \
} \
}
-void HELPER(vaddwev_q_d)(void *vd, void *vj, void *vk, uint32_t v)
+void HELPER(vaddwev_q_d)(void *vd, void *vj, void *vk, uint32_t desc)
{
+ int i;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
VReg *Vk = (VReg *)vk;
+ int oprsz = simd_oprsz(desc);
- Vd->Q(0) = int128_add(int128_makes64(Vj->D(0)), int128_makes64(Vk->D(0)));
+ for (i = 0; i < oprsz / 16; i++) {
+ Vd->Q(i) = int128_add(int128_makes64(Vj->D(2 * i)),
+ int128_makes64(Vk->D(2 * i)));
+ }
}
DO_EVEN(vaddwev_h_b, 16, H, B, DO_ADD)
DO_EVEN(vaddwev_w_h, 32, W, H, DO_ADD)
DO_EVEN(vaddwev_d_w, 64, D, W, DO_ADD)
-void HELPER(vaddwod_q_d)(void *vd, void *vj, void *vk, uint32_t v)
+void HELPER(vaddwod_q_d)(void *vd, void *vj, void *vk, uint32_t desc)
{
+ int i;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
VReg *Vk = (VReg *)vk;
+ int oprsz = simd_oprsz(desc);
- Vd->Q(0) = int128_add(int128_makes64(Vj->D(1)), int128_makes64(Vk->D(1)));
+ for (i = 0; i < oprsz / 16; i++) {
+ Vd->Q(i) = int128_add(int128_makes64(Vj->D(2 * i + 1)),
+ int128_makes64(Vk->D(2 * i + 1)));
+ }
}
DO_ODD(vaddwod_h_b, 16, H, B, DO_ADD)
DO_ODD(vaddwod_w_h, 32, W, H, DO_ADD)
DO_ODD(vaddwod_d_w, 64, D, W, DO_ADD)
-void HELPER(vsubwev_q_d)(void *vd, void *vj, void *vk, uint32_t v)
+void HELPER(vsubwev_q_d)(void *vd, void *vj, void *vk, uint32_t desc)
{
+ int i;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
VReg *Vk = (VReg *)vk;
+ int oprsz = simd_oprsz(desc);
- Vd->Q(0) = int128_sub(int128_makes64(Vj->D(0)), int128_makes64(Vk->D(0)));
+ for (i = 0; i < oprsz / 16; i++) {
+ Vd->Q(i) = int128_sub(int128_makes64(Vj->D(2 * i)),
+ int128_makes64(Vk->D(2 * i)));
+ }
}
DO_EVEN(vsubwev_h_b, 16, H, B, DO_SUB)
DO_EVEN(vsubwev_w_h, 32, W, H, DO_SUB)
DO_EVEN(vsubwev_d_w, 64, D, W, DO_SUB)
-void HELPER(vsubwod_q_d)(void *vd, void *vj, void *vk, uint32_t v)
+void HELPER(vsubwod_q_d)(void *vd, void *vj, void *vk, uint32_t desc)
{
+ int i;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
VReg *Vk = (VReg *)vk;
+ int oprsz = simd_oprsz(desc);
- Vd->Q(0) = int128_sub(int128_makes64(Vj->D(1)), int128_makes64(Vk->D(1)));
+ for (i = 0; i < oprsz / 16; i++) {
+ Vd->Q(i) = int128_sub(int128_makes64(Vj->D(2 * i + 1)),
+ int128_makes64(Vk->D(2 * i + 1)));
+ }
}
DO_ODD(vsubwod_h_b, 16, H, B, DO_SUB)
DO_ODD(vsubwod_w_h, 32, W, H, DO_SUB)
DO_ODD(vsubwod_d_w, 64, D, W, DO_SUB)
-void HELPER(vaddwev_q_du)(void *vd, void *vj, void *vk, uint32_t v)
+void HELPER(vaddwev_q_du)(void *vd, void *vj, void *vk, uint32_t desc)
{
+ int i;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
VReg *Vk = (VReg *)vk;
+ int oprsz = simd_oprsz(desc);
- Vd->Q(0) = int128_add(int128_make64((uint64_t)Vj->D(0)),
- int128_make64((uint64_t)Vk->D(0)));
+ for (i = 0; i < oprsz / 16; i++) {
+ Vd->Q(i) = int128_add(int128_make64(Vj->UD(2 * i)),
+ int128_make64(Vk->UD(2 * i)));
+ }
}
DO_EVEN(vaddwev_h_bu, 16, UH, UB, DO_ADD)
DO_EVEN(vaddwev_w_hu, 32, UW, UH, DO_ADD)
DO_EVEN(vaddwev_d_wu, 64, UD, UW, DO_ADD)
-void HELPER(vaddwod_q_du)(void *vd, void *vj, void *vk, uint32_t v)
+void HELPER(vaddwod_q_du)(void *vd, void *vj, void *vk, uint32_t desc)
{
+ int i;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
VReg *Vk = (VReg *)vk;
+ int oprsz = simd_oprsz(desc);
- Vd->Q(0) = int128_add(int128_make64((uint64_t)Vj->D(1)),
- int128_make64((uint64_t)Vk->D(1)));
+ for (i = 0; i < oprsz / 16; i++) {
+ Vd->Q(i) = int128_add(int128_make64(Vj->UD(2 * i + 1)),
+ int128_make64(Vk->UD(2 * i + 1)));
+ }
}
DO_ODD(vaddwod_h_bu, 16, UH, UB, DO_ADD)
DO_ODD(vaddwod_w_hu, 32, UW, UH, DO_ADD)
DO_ODD(vaddwod_d_wu, 64, UD, UW, DO_ADD)
-void HELPER(vsubwev_q_du)(void *vd, void *vj, void *vk, uint32_t v)
+void HELPER(vsubwev_q_du)(void *vd, void *vj, void *vk, uint32_t desc)
{
+ int i;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
VReg *Vk = (VReg *)vk;
+ int oprsz = simd_oprsz(desc);
- Vd->Q(0) = int128_sub(int128_make64((uint64_t)Vj->D(0)),
- int128_make64((uint64_t)Vk->D(0)));
+ for (i = 0; i < oprsz / 16; i++) {
+ Vd->Q(i) = int128_sub(int128_make64(Vj->UD(2 * i)),
+ int128_make64(Vk->UD(2 * i)));
+ }
}
DO_EVEN(vsubwev_h_bu, 16, UH, UB, DO_SUB)
DO_EVEN(vsubwev_w_hu, 32, UW, UH, DO_SUB)
DO_EVEN(vsubwev_d_wu, 64, UD, UW, DO_SUB)
-void HELPER(vsubwod_q_du)(void *vd, void *vj, void *vk, uint32_t v)
+void HELPER(vsubwod_q_du)(void *vd, void *vj, void *vk, uint32_t desc)
{
+ int i;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
VReg *Vk = (VReg *)vk;
+ int oprsz = simd_oprsz(desc);
- Vd->Q(0) = int128_sub(int128_make64((uint64_t)Vj->D(1)),
- int128_make64((uint64_t)Vk->D(1)));
+ for (i = 0; i < oprsz / 16; i++) {
+ Vd->Q(i) = int128_sub(int128_make64(Vj->UD(2 * i + 1)),
+ int128_make64(Vk->UD(2 * i + 1)));
+ }
}
DO_ODD(vsubwod_h_bu, 16, UH, UB, DO_SUB)
@@ -236,7 +277,7 @@ DO_ODD(vsubwod_w_hu, 32, UW, UH, DO_SUB)
DO_ODD(vsubwod_d_wu, 64, UD, UW, DO_SUB)
#define DO_EVEN_U_S(NAME, BIT, ES1, EU1, ES2, EU2, DO_OP) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
int i; \
VReg *Vd = (VReg *)vd; \
@@ -244,13 +285,15 @@ void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
VReg *Vk = (VReg *)vk; \
typedef __typeof(Vd->ES1(0)) TDS; \
typedef __typeof(Vd->EU1(0)) TDU; \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
+ int oprsz = simd_oprsz(desc); \
+ \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
Vd->ES1(i) = DO_OP((TDU)Vj->EU2(2 * i) ,(TDS)Vk->ES2(2 * i)); \
} \
}
#define DO_ODD_U_S(NAME, BIT, ES1, EU1, ES2, EU2, DO_OP) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
int i; \
VReg *Vd = (VReg *)vd; \
@@ -258,33 +301,43 @@ void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
VReg *Vk = (VReg *)vk; \
typedef __typeof(Vd->ES1(0)) TDS; \
typedef __typeof(Vd->EU1(0)) TDU; \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
+ int oprsz = simd_oprsz(desc); \
+ \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
Vd->ES1(i) = DO_OP((TDU)Vj->EU2(2 * i + 1), (TDS)Vk->ES2(2 * i + 1)); \
} \
}
-void HELPER(vaddwev_q_du_d)(void *vd, void *vj, void *vk, uint32_t v)
+void HELPER(vaddwev_q_du_d)(void *vd, void *vj, void *vk, uint32_t desc)
{
+ int i;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
VReg *Vk = (VReg *)vk;
+ int oprsz = simd_oprsz(desc);
- Vd->Q(0) = int128_add(int128_make64((uint64_t)Vj->D(0)),
- int128_makes64(Vk->D(0)));
+ for (i = 0; i < oprsz / 16; i++) {
+ Vd->Q(i) = int128_add(int128_make64(Vj->UD(2 * i)),
+ int128_makes64(Vk->D(2 * i)));
+ }
}
DO_EVEN_U_S(vaddwev_h_bu_b, 16, H, UH, B, UB, DO_ADD)
DO_EVEN_U_S(vaddwev_w_hu_h, 32, W, UW, H, UH, DO_ADD)
DO_EVEN_U_S(vaddwev_d_wu_w, 64, D, UD, W, UW, DO_ADD)
-void HELPER(vaddwod_q_du_d)(void *vd, void *vj, void *vk, uint32_t v)
+void HELPER(vaddwod_q_du_d)(void *vd, void *vj, void *vk, uint32_t desc)
{
+ int i;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
VReg *Vk = (VReg *)vk;
+ int oprsz = simd_oprsz(desc);
- Vd->Q(0) = int128_add(int128_make64((uint64_t)Vj->D(1)),
- int128_makes64(Vk->D(1)));
+ for (i = 0; i < oprsz / 16; i++) {
+ Vd->Q(i) = int128_add(int128_make64(Vj->UD(2 * i + 1)),
+ int128_makes64(Vk->D(2 * i + 1)));
+ }
}
DO_ODD_U_S(vaddwod_h_bu_b, 16, H, UH, B, UB, DO_ADD)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 90c9ccce4f..922222bd78 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -99,6 +99,51 @@ TRANS(xvhsubw_wu_hu, LASX, gen_vvv, 32, gen_helper_vhsubw_wu_hu)
TRANS(xvhsubw_du_wu, LASX, gen_vvv, 32, gen_helper_vhsubw_du_wu)
TRANS(xvhsubw_qu_du, LASX, gen_vvv, 32, gen_helper_vhsubw_qu_du)
+TRANS(xvaddwev_h_b, LASX, gvec_vvv, 32, MO_8, do_vaddwev_s)
+TRANS(xvaddwev_w_h, LASX, gvec_vvv, 32, MO_16, do_vaddwev_s)
+TRANS(xvaddwev_d_w, LASX, gvec_vvv, 32, MO_32, do_vaddwev_s)
+TRANS(xvaddwev_q_d, LASX, gvec_vvv, 32, MO_64, do_vaddwev_s)
+TRANS(xvaddwod_h_b, LASX, gvec_vvv, 32, MO_8, do_vaddwod_s)
+TRANS(xvaddwod_w_h, LASX, gvec_vvv, 32, MO_16, do_vaddwod_s)
+TRANS(xvaddwod_d_w, LASX, gvec_vvv, 32, MO_32, do_vaddwod_s)
+TRANS(xvaddwod_q_d, LASX, gvec_vvv, 32, MO_64, do_vaddwod_s)
+
+TRANS(xvsubwev_h_b, LASX, gvec_vvv, 32, MO_8, do_vsubwev_s)
+TRANS(xvsubwev_w_h, LASX, gvec_vvv, 32, MO_16, do_vsubwev_s)
+TRANS(xvsubwev_d_w, LASX, gvec_vvv, 32, MO_32, do_vsubwev_s)
+TRANS(xvsubwev_q_d, LASX, gvec_vvv, 32, MO_64, do_vsubwev_s)
+TRANS(xvsubwod_h_b, LASX, gvec_vvv, 32, MO_8, do_vsubwod_s)
+TRANS(xvsubwod_w_h, LASX, gvec_vvv, 32, MO_16, do_vsubwod_s)
+TRANS(xvsubwod_d_w, LASX, gvec_vvv, 32, MO_32, do_vsubwod_s)
+TRANS(xvsubwod_q_d, LASX, gvec_vvv, 32, MO_64, do_vsubwod_s)
+
+TRANS(xvaddwev_h_bu, LASX, gvec_vvv, 32, MO_8, do_vaddwev_u)
+TRANS(xvaddwev_w_hu, LASX, gvec_vvv, 32, MO_16, do_vaddwev_u)
+TRANS(xvaddwev_d_wu, LASX, gvec_vvv, 32, MO_32, do_vaddwev_u)
+TRANS(xvaddwev_q_du, LASX, gvec_vvv, 32, MO_64, do_vaddwev_u)
+TRANS(xvaddwod_h_bu, LASX, gvec_vvv, 32, MO_8, do_vaddwod_u)
+TRANS(xvaddwod_w_hu, LASX, gvec_vvv, 32, MO_16, do_vaddwod_u)
+TRANS(xvaddwod_d_wu, LASX, gvec_vvv, 32, MO_32, do_vaddwod_u)
+TRANS(xvaddwod_q_du, LASX, gvec_vvv, 32, MO_64, do_vaddwod_u)
+
+TRANS(xvsubwev_h_bu, LASX, gvec_vvv, 32, MO_8, do_vsubwev_u)
+TRANS(xvsubwev_w_hu, LASX, gvec_vvv, 32, MO_16, do_vsubwev_u)
+TRANS(xvsubwev_d_wu, LASX, gvec_vvv, 32, MO_32, do_vsubwev_u)
+TRANS(xvsubwev_q_du, LASX, gvec_vvv, 32, MO_64, do_vsubwev_u)
+TRANS(xvsubwod_h_bu, LASX, gvec_vvv, 32, MO_8, do_vsubwod_u)
+TRANS(xvsubwod_w_hu, LASX, gvec_vvv, 32, MO_16, do_vsubwod_u)
+TRANS(xvsubwod_d_wu, LASX, gvec_vvv, 32, MO_32, do_vsubwod_u)
+TRANS(xvsubwod_q_du, LASX, gvec_vvv, 32, MO_64, do_vsubwod_u)
+
+TRANS(xvaddwev_h_bu_b, LASX, gvec_vvv, 32, MO_8, do_vaddwev_u_s)
+TRANS(xvaddwev_w_hu_h, LASX, gvec_vvv, 32, MO_16, do_vaddwev_u_s)
+TRANS(xvaddwev_d_wu_w, LASX, gvec_vvv, 32, MO_32, do_vaddwev_u_s)
+TRANS(xvaddwev_q_du_d, LASX, gvec_vvv, 32, MO_64, do_vaddwev_u_s)
+TRANS(xvaddwod_h_bu_b, LASX, gvec_vvv, 32, MO_8, do_vaddwod_u_s)
+TRANS(xvaddwod_w_hu_h, LASX, gvec_vvv, 32, MO_16, do_vaddwod_u_s)
+TRANS(xvaddwod_d_wu_w, LASX, gvec_vvv, 32, MO_32, do_vaddwod_u_s)
+TRANS(xvaddwod_q_du_d, LASX, gvec_vvv, 32, MO_64, do_vaddwod_u_s)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
^ permalink raw reply related [flat|nested] 86+ messages in thread
* [PATCH v4 13/48] target/loongarch: Implement xvavg/xvavgr
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (11 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 12/48] target/loongarch: Implement xvaddw/xvsubw Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 18:14 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 14/48] target/loongarch: Implement xvabsd Song Gao
` (34 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVAVG.{B/H/W/D}[U];
- XVAVGR.{B/H/W/D}[U].
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/vec.h | 3 +++
target/loongarch/insns.decode | 17 +++++++++++++
target/loongarch/disas.c | 17 +++++++++++++
target/loongarch/vec_helper.c | 25 ++++++++++----------
target/loongarch/insn_trans/trans_lasx.c.inc | 17 +++++++++++++
5 files changed, 66 insertions(+), 13 deletions(-)
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index 5332dff83c..6ac6b22f20 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -50,4 +50,7 @@
#define DO_ADD(a, b) (a + b)
#define DO_SUB(a, b) (a - b)
+#define DO_VAVG(a, b) ((a >> 1) + (b >> 1) + (a & b & 1))
+#define DO_VAVGR(a, b) ((a >> 1) + (b >> 1) + ((a | b) & 1))
+
#endif /* LOONGARCH_VEC_H */
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index e1d8b30179..a2cb39750d 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1406,6 +1406,23 @@ xvaddwod_w_hu_h 0111 01000100 00001 ..... ..... ..... @vvv
xvaddwod_d_wu_w 0111 01000100 00010 ..... ..... ..... @vvv
xvaddwod_q_du_d 0111 01000100 00011 ..... ..... ..... @vvv
+xvavg_b 0111 01000110 01000 ..... ..... ..... @vvv
+xvavg_h 0111 01000110 01001 ..... ..... ..... @vvv
+xvavg_w 0111 01000110 01010 ..... ..... ..... @vvv
+xvavg_d 0111 01000110 01011 ..... ..... ..... @vvv
+xvavg_bu 0111 01000110 01100 ..... ..... ..... @vvv
+xvavg_hu 0111 01000110 01101 ..... ..... ..... @vvv
+xvavg_wu 0111 01000110 01110 ..... ..... ..... @vvv
+xvavg_du 0111 01000110 01111 ..... ..... ..... @vvv
+xvavgr_b 0111 01000110 10000 ..... ..... ..... @vvv
+xvavgr_h 0111 01000110 10001 ..... ..... ..... @vvv
+xvavgr_w 0111 01000110 10010 ..... ..... ..... @vvv
+xvavgr_d 0111 01000110 10011 ..... ..... ..... @vvv
+xvavgr_bu 0111 01000110 10100 ..... ..... ..... @vvv
+xvavgr_hu 0111 01000110 10101 ..... ..... ..... @vvv
+xvavgr_wu 0111 01000110 10110 ..... ..... ..... @vvv
+xvavgr_du 0111 01000110 10111 ..... ..... ..... @vvv
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 6972e33833..8296aafa98 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1825,6 +1825,23 @@ INSN_LASX(xvaddwod_w_hu_h, vvv)
INSN_LASX(xvaddwod_d_wu_w, vvv)
INSN_LASX(xvaddwod_q_du_d, vvv)
+INSN_LASX(xvavg_b, vvv)
+INSN_LASX(xvavg_h, vvv)
+INSN_LASX(xvavg_w, vvv)
+INSN_LASX(xvavg_d, vvv)
+INSN_LASX(xvavg_bu, vvv)
+INSN_LASX(xvavg_hu, vvv)
+INSN_LASX(xvavg_wu, vvv)
+INSN_LASX(xvavg_du, vvv)
+INSN_LASX(xvavgr_b, vvv)
+INSN_LASX(xvavgr_h, vvv)
+INSN_LASX(xvavgr_w, vvv)
+INSN_LASX(xvavgr_d, vvv)
+INSN_LASX(xvavgr_bu, vvv)
+INSN_LASX(xvavgr_hu, vvv)
+INSN_LASX(xvavgr_wu, vvv)
+INSN_LASX(xvavgr_du, vvv)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index fffc67ce93..a5d425e965 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -344,19 +344,18 @@ DO_ODD_U_S(vaddwod_h_bu_b, 16, H, UH, B, UB, DO_ADD)
DO_ODD_U_S(vaddwod_w_hu_h, 32, W, UW, H, UH, DO_ADD)
DO_ODD_U_S(vaddwod_d_wu_w, 64, D, UD, W, UW, DO_ADD)
-#define DO_VAVG(a, b) ((a >> 1) + (b >> 1) + (a & b & 1))
-#define DO_VAVGR(a, b) ((a >> 1) + (b >> 1) + ((a | b) & 1))
-
-#define DO_3OP(NAME, BIT, E, DO_OP) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- VReg *Vk = (VReg *)vk; \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E(i) = DO_OP(Vj->E(i), Vk->E(i)); \
- } \
+#define DO_3OP(NAME, BIT, E, DO_OP) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
+ Vd->E(i) = DO_OP(Vj->E(i), Vk->E(i)); \
+ } \
}
DO_3OP(vavg_b, 8, B, DO_VAVG)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 922222bd78..bcd4b03afc 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -144,6 +144,23 @@ TRANS(xvaddwod_w_hu_h, LASX, gvec_vvv, 32, MO_16, do_vaddwod_u_s)
TRANS(xvaddwod_d_wu_w, LASX, gvec_vvv, 32, MO_32, do_vaddwod_u_s)
TRANS(xvaddwod_q_du_d, LASX, gvec_vvv, 32, MO_64, do_vaddwod_u_s)
+TRANS(xvavg_b, LASX, gvec_vvv, 32, MO_8, do_vavg_s)
+TRANS(xvavg_h, LASX, gvec_vvv, 32, MO_16, do_vavg_s)
+TRANS(xvavg_w, LASX, gvec_vvv, 32, MO_32, do_vavg_s)
+TRANS(xvavg_d, LASX, gvec_vvv, 32, MO_64, do_vavg_s)
+TRANS(xvavg_bu, LASX, gvec_vvv, 32, MO_8, do_vavg_u)
+TRANS(xvavg_hu, LASX, gvec_vvv, 32, MO_16, do_vavg_u)
+TRANS(xvavg_wu, LASX, gvec_vvv, 32, MO_32, do_vavg_u)
+TRANS(xvavg_du, LASX, gvec_vvv, 32, MO_64, do_vavg_u)
+TRANS(xvavgr_b, LASX, gvec_vvv, 32, MO_8, do_vavgr_s)
+TRANS(xvavgr_h, LASX, gvec_vvv, 32, MO_16, do_vavgr_s)
+TRANS(xvavgr_w, LASX, gvec_vvv, 32, MO_32, do_vavgr_s)
+TRANS(xvavgr_d, LASX, gvec_vvv, 32, MO_64, do_vavgr_s)
+TRANS(xvavgr_bu, LASX, gvec_vvv, 32, MO_8, do_vavgr_u)
+TRANS(xvavgr_hu, LASX, gvec_vvv, 32, MO_16, do_vavgr_u)
+TRANS(xvavgr_wu, LASX, gvec_vvv, 32, MO_32, do_vavgr_u)
+TRANS(xvavgr_du, LASX, gvec_vvv, 32, MO_64, do_vavgr_u)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
^ permalink raw reply related [flat|nested] 86+ messages in thread
* [PATCH v4 14/48] target/loongarch: Implement xvabsd
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (12 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 13/48] target/loongarch: Implement xavg/xvagr Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 8:48 ` [PATCH v4 15/48] target/loongarch: Implement xvadda Song Gao
` (33 subsequent siblings)
47 siblings, 0 replies; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVABSD.{B/H/W/D}[U].
Signed-off-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/loongarch/vec.h | 2 ++
target/loongarch/insns.decode | 9 +++++++++
target/loongarch/disas.c | 9 +++++++++
target/loongarch/vec_helper.c | 2 --
target/loongarch/insn_trans/trans_lasx.c.inc | 9 +++++++++
5 files changed, 29 insertions(+), 2 deletions(-)
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index 6ac6b22f20..6767073635 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -53,4 +53,6 @@
#define DO_VAVG(a, b) ((a >> 1) + (b >> 1) + (a & b & 1))
#define DO_VAVGR(a, b) ((a >> 1) + (b >> 1) + ((a | b) & 1))
+#define DO_VABSD(a, b) ((a > b) ? (a - b) : (b - a))
+
#endif /* LOONGARCH_VEC_H */
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index a2cb39750d..c086ee9b22 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1423,6 +1423,15 @@ xvavgr_hu 0111 01000110 10101 ..... ..... ..... @vvv
xvavgr_wu 0111 01000110 10110 ..... ..... ..... @vvv
xvavgr_du 0111 01000110 10111 ..... ..... ..... @vvv
+xvabsd_b 0111 01000110 00000 ..... ..... ..... @vvv
+xvabsd_h 0111 01000110 00001 ..... ..... ..... @vvv
+xvabsd_w 0111 01000110 00010 ..... ..... ..... @vvv
+xvabsd_d 0111 01000110 00011 ..... ..... ..... @vvv
+xvabsd_bu 0111 01000110 00100 ..... ..... ..... @vvv
+xvabsd_hu 0111 01000110 00101 ..... ..... ..... @vvv
+xvabsd_wu 0111 01000110 00110 ..... ..... ..... @vvv
+xvabsd_du 0111 01000110 00111 ..... ..... ..... @vvv
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 8296aafa98..d0b1de39b8 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1842,6 +1842,15 @@ INSN_LASX(xvavgr_hu, vvv)
INSN_LASX(xvavgr_wu, vvv)
INSN_LASX(xvavgr_du, vvv)
+INSN_LASX(xvabsd_b, vvv)
+INSN_LASX(xvabsd_h, vvv)
+INSN_LASX(xvabsd_w, vvv)
+INSN_LASX(xvabsd_d, vvv)
+INSN_LASX(xvabsd_bu, vvv)
+INSN_LASX(xvabsd_hu, vvv)
+INSN_LASX(xvabsd_wu, vvv)
+INSN_LASX(xvabsd_du, vvv)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index a5d425e965..939ea11f19 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -375,8 +375,6 @@ DO_3OP(vavgr_hu, 16, UH, DO_VAVGR)
DO_3OP(vavgr_wu, 32, UW, DO_VAVGR)
DO_3OP(vavgr_du, 64, UD, DO_VAVGR)
-#define DO_VABSD(a, b) ((a > b) ? (a -b) : (b-a))
-
DO_3OP(vabsd_b, 8, B, DO_VABSD)
DO_3OP(vabsd_h, 16, H, DO_VABSD)
DO_3OP(vabsd_w, 32, W, DO_VABSD)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index bcd4b03afc..2be165a839 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -161,6 +161,15 @@ TRANS(xvavgr_hu, LASX, gvec_vvv, 32, MO_16, do_vavgr_u)
TRANS(xvavgr_wu, LASX, gvec_vvv, 32, MO_32, do_vavgr_u)
TRANS(xvavgr_du, LASX, gvec_vvv, 32, MO_64, do_vavgr_u)
+TRANS(xvabsd_b, LASX, gvec_vvv, 32, MO_8, do_vabsd_s)
+TRANS(xvabsd_h, LASX, gvec_vvv, 32, MO_16, do_vabsd_s)
+TRANS(xvabsd_w, LASX, gvec_vvv, 32, MO_32, do_vabsd_s)
+TRANS(xvabsd_d, LASX, gvec_vvv, 32, MO_64, do_vabsd_s)
+TRANS(xvabsd_bu, LASX, gvec_vvv, 32, MO_8, do_vabsd_u)
+TRANS(xvabsd_hu, LASX, gvec_vvv, 32, MO_16, do_vabsd_u)
+TRANS(xvabsd_wu, LASX, gvec_vvv, 32, MO_32, do_vabsd_u)
+TRANS(xvabsd_du, LASX, gvec_vvv, 32, MO_64, do_vabsd_u)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
* [PATCH v4 15/48] target/loongarch: Implement xvadda
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (13 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 14/48] target/loongarch: Implement xvabsd Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 20:45 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 16/48] target/loongarch: Implement xvmax/xvmin Song Gao
` (32 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVADDA.{B/H/W/D}.
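To illustrate the semantics, a scalar model of one byte element, written
against the DO_VABS macro that this patch moves into vec.h (the function
name is invented for illustration):

    #include <stdint.h>

    /* XVADDA element: |a| + |b|, with modular wrap-around on store,
     * as in the DO_VADDA helper. */
    static int8_t adda_b(int8_t a, int8_t b)
    {
        int8_t absa = (a < 0) ? -a : a;   /* DO_VABS(a) */
        int8_t absb = (b < 0) ? -b : b;   /* DO_VABS(b) */
        return absa + absb;
    }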
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/vec.h | 2 ++
target/loongarch/insns.decode | 5 ++++
target/loongarch/disas.c | 5 ++++
target/loongarch/vec_helper.c | 24 ++++++++++----------
target/loongarch/insn_trans/trans_lasx.c.inc | 5 ++++
5 files changed, 29 insertions(+), 12 deletions(-)
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index 6767073635..7ccc89c10f 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -55,4 +55,6 @@
#define DO_VABSD(a, b) ((a > b) ? (a - b) : (b - a))
+#define DO_VABS(a) ((a < 0) ? (-a) : (a))
+
#endif /* LOONGARCH_VEC_H */
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index c086ee9b22..f3722e3aa7 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1432,6 +1432,11 @@ xvabsd_hu 0111 01000110 00101 ..... ..... ..... @vvv
xvabsd_wu 0111 01000110 00110 ..... ..... ..... @vvv
xvabsd_du 0111 01000110 00111 ..... ..... ..... @vvv
+xvadda_b 0111 01000101 11000 ..... ..... ..... @vvv
+xvadda_h 0111 01000101 11001 ..... ..... ..... @vvv
+xvadda_w 0111 01000101 11010 ..... ..... ..... @vvv
+xvadda_d 0111 01000101 11011 ..... ..... ..... @vvv
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index d0b1de39b8..b48822e431 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1851,6 +1851,11 @@ INSN_LASX(xvabsd_hu, vvv)
INSN_LASX(xvabsd_wu, vvv)
INSN_LASX(xvabsd_du, vvv)
+INSN_LASX(xvadda_b, vvv)
+INSN_LASX(xvadda_h, vvv)
+INSN_LASX(xvadda_w, vvv)
+INSN_LASX(xvadda_d, vvv)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 939ea11f19..819fa5e033 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -384,18 +384,18 @@ DO_3OP(vabsd_hu, 16, UH, DO_VABSD)
DO_3OP(vabsd_wu, 32, UW, DO_VABSD)
DO_3OP(vabsd_du, 64, UD, DO_VABSD)
-#define DO_VABS(a) ((a < 0) ? (-a) : (a))
-
-#define DO_VADDA(NAME, BIT, E, DO_OP) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- VReg *Vk = (VReg *)vk; \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E(i) = DO_OP(Vj->E(i)) + DO_OP(Vk->E(i)); \
- } \
+#define DO_VADDA(NAME, BIT, E, DO_OP) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
+ Vd->E(i) = DO_OP(Vj->E(i)) + DO_OP(Vk->E(i)); \
+ } \
}
DO_VADDA(vadda_b, 8, B, DO_VABS)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 2be165a839..a3f2740f74 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -170,6 +170,11 @@ TRANS(xvabsd_hu, LASX, gvec_vvv, 32, MO_16, do_vabsd_u)
TRANS(xvabsd_wu, LASX, gvec_vvv, 32, MO_32, do_vabsd_u)
TRANS(xvabsd_du, LASX, gvec_vvv, 32, MO_64, do_vabsd_u)
+TRANS(xvadda_b, LASX, gvec_vvv, 32, MO_8, do_vadda)
+TRANS(xvadda_h, LASX, gvec_vvv, 32, MO_16, do_vadda)
+TRANS(xvadda_w, LASX, gvec_vvv, 32, MO_32, do_vadda)
+TRANS(xvadda_d, LASX, gvec_vvv, 32, MO_64, do_vadda)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
* Re: [PATCH v4 15/48] target/loongarch: Implement xvadda
2023-08-30 8:48 ` [PATCH v4 15/48] target/loongarch: Implement xvadda Song Gao
@ 2023-08-30 20:45 ` Richard Henderson
0 siblings, 0 replies; 86+ messages in thread
From: Richard Henderson @ 2023-08-30 20:45 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 8/30/23 01:48, Song Gao wrote:
> -#define DO_VABS(a) ((a < 0) ? (-a) : (a))
> -
> -#define DO_VADDA(NAME, BIT, E, DO_OP) \
> -void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
> -{ \
> - int i; \
> - VReg *Vd = (VReg *)vd; \
> - VReg *Vj = (VReg *)vj; \
> - VReg *Vk = (VReg *)vk; \
> - for (i = 0; i < LSX_LEN/BIT; i++) { \
> - Vd->E(i) = DO_OP(Vj->E(i)) + DO_OP(Vk->E(i)); \
> - } \
> +#define DO_VADDA(NAME, BIT, E, DO_OP) \
> +void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
> +{ \
> + int i; \
> + VReg *Vd = (VReg *)vd; \
> + VReg *Vj = (VReg *)vj; \
> + VReg *Vk = (VReg *)vk; \
> + int oprsz = simd_oprsz(desc); \
> + \
> + for (i = 0; i < oprsz / (BIT / 8); i++) { \
> + Vd->E(i) = DO_OP(Vj->E(i)) + DO_OP(Vk->E(i)); \
> + } \
> }
No need to move DO_VABS, and indeed no need to pass it in as DO_OP, because DO_VADDA is
only ever used with DO_VABS.
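Concretely, something like (sketch, untested):

    #define DO_VADDA(NAME, BIT, E)                                   \
    void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc)   \
    {                                                                 \
        int i;                                                        \
        VReg *Vd = (VReg *)vd;                                        \
        VReg *Vj = (VReg *)vj;                                        \
        VReg *Vk = (VReg *)vk;                                        \
        int oprsz = simd_oprsz(desc);                                 \
                                                                      \
        for (i = 0; i < oprsz / (BIT / 8); i++) {                     \
            Vd->E(i) = DO_VABS(Vj->E(i)) + DO_VABS(Vk->E(i));         \
        }                                                             \
    }

with the DO_VADDA(vadda_b, 8, B, DO_VABS) instantiations losing their
last argument.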
With that,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* [PATCH v4 16/48] target/loongarch: Implement xvmax/xvmin
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (14 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 15/48] target/loongarch: Implement xvadda Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 20:50 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 17/48] target/loongarch: Implement xvmul/xvmuh/xvmulw{ev/od} Song Gao
` (31 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVMAX[I].{B/H/W/D}[U];
- XVMIN[I].{B/H/W/D}[U].
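For reference, scalar models of one element for the register and
immediate forms, mirroring DO_MAX/DO_MIN (function names invented; the
5-bit immediate is compared as the element type, per the (TD)imm cast in
the VMINMAXI helper):

    #include <stdint.h>

    static int8_t max_b(int8_t a, int8_t b)          /* XVMAX.B */
    {
        return (a > b) ? a : b;                      /* DO_MAX */
    }

    static uint8_t mini_bu(uint8_t a, uint8_t ui5)   /* XVMINI.BU */
    {
        return (a < ui5) ? a : ui5;                  /* DO_MIN */
    }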
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/vec.h | 3 ++
target/loongarch/insns.decode | 36 ++++++++++++++++++++
target/loongarch/disas.c | 34 ++++++++++++++++++
target/loongarch/vec_helper.c | 26 +++++++-------
target/loongarch/insn_trans/trans_lasx.c.inc | 36 ++++++++++++++++++++
5 files changed, 121 insertions(+), 14 deletions(-)
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index 7ccc89c10f..cd6f6a72fd 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -57,4 +57,7 @@
#define DO_VABS(a) ((a < 0) ? (-a) : (a))
+#define DO_MIN(a, b) (a < b ? a : b)
+#define DO_MAX(a, b) (a > b ? a : b)
+
#endif /* LOONGARCH_VEC_H */
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index f3722e3aa7..99aefcb651 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1437,6 +1437,42 @@ xvadda_h 0111 01000101 11001 ..... ..... ..... @vvv
xvadda_w 0111 01000101 11010 ..... ..... ..... @vvv
xvadda_d 0111 01000101 11011 ..... ..... ..... @vvv
+xvmax_b 0111 01000111 00000 ..... ..... ..... @vvv
+xvmax_h 0111 01000111 00001 ..... ..... ..... @vvv
+xvmax_w 0111 01000111 00010 ..... ..... ..... @vvv
+xvmax_d 0111 01000111 00011 ..... ..... ..... @vvv
+xvmax_bu 0111 01000111 01000 ..... ..... ..... @vvv
+xvmax_hu 0111 01000111 01001 ..... ..... ..... @vvv
+xvmax_wu 0111 01000111 01010 ..... ..... ..... @vvv
+xvmax_du 0111 01000111 01011 ..... ..... ..... @vvv
+
+xvmaxi_b 0111 01101001 00000 ..... ..... ..... @vv_i5
+xvmaxi_h 0111 01101001 00001 ..... ..... ..... @vv_i5
+xvmaxi_w 0111 01101001 00010 ..... ..... ..... @vv_i5
+xvmaxi_d 0111 01101001 00011 ..... ..... ..... @vv_i5
+xvmaxi_bu 0111 01101001 01000 ..... ..... ..... @vv_ui5
+xvmaxi_hu 0111 01101001 01001 ..... ..... ..... @vv_ui5
+xvmaxi_wu 0111 01101001 01010 ..... ..... ..... @vv_ui5
+xvmaxi_du 0111 01101001 01011 ..... ..... ..... @vv_ui5
+
+xvmin_b 0111 01000111 00100 ..... ..... ..... @vvv
+xvmin_h 0111 01000111 00101 ..... ..... ..... @vvv
+xvmin_w 0111 01000111 00110 ..... ..... ..... @vvv
+xvmin_d 0111 01000111 00111 ..... ..... ..... @vvv
+xvmin_bu 0111 01000111 01100 ..... ..... ..... @vvv
+xvmin_hu 0111 01000111 01101 ..... ..... ..... @vvv
+xvmin_wu 0111 01000111 01110 ..... ..... ..... @vvv
+xvmin_du 0111 01000111 01111 ..... ..... ..... @vvv
+
+xvmini_b 0111 01101001 00100 ..... ..... ..... @vv_i5
+xvmini_h 0111 01101001 00101 ..... ..... ..... @vv_i5
+xvmini_w 0111 01101001 00110 ..... ..... ..... @vv_i5
+xvmini_d 0111 01101001 00111 ..... ..... ..... @vv_i5
+xvmini_bu 0111 01101001 01100 ..... ..... ..... @vv_ui5
+xvmini_hu 0111 01101001 01101 ..... ..... ..... @vv_ui5
+xvmini_wu 0111 01101001 01110 ..... ..... ..... @vv_ui5
+xvmini_du 0111 01101001 01111 ..... ..... ..... @vv_ui5
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index b48822e431..63c1dc757f 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1856,6 +1856,40 @@ INSN_LASX(xvadda_h, vvv)
INSN_LASX(xvadda_w, vvv)
INSN_LASX(xvadda_d, vvv)
+INSN_LASX(xvmax_b, vvv)
+INSN_LASX(xvmax_h, vvv)
+INSN_LASX(xvmax_w, vvv)
+INSN_LASX(xvmax_d, vvv)
+INSN_LASX(xvmin_b, vvv)
+INSN_LASX(xvmin_h, vvv)
+INSN_LASX(xvmin_w, vvv)
+INSN_LASX(xvmin_d, vvv)
+INSN_LASX(xvmax_bu, vvv)
+INSN_LASX(xvmax_hu, vvv)
+INSN_LASX(xvmax_wu, vvv)
+INSN_LASX(xvmax_du, vvv)
+INSN_LASX(xvmin_bu, vvv)
+INSN_LASX(xvmin_hu, vvv)
+INSN_LASX(xvmin_wu, vvv)
+INSN_LASX(xvmin_du, vvv)
+
+INSN_LASX(xvmaxi_b, vv_i)
+INSN_LASX(xvmaxi_h, vv_i)
+INSN_LASX(xvmaxi_w, vv_i)
+INSN_LASX(xvmaxi_d, vv_i)
+INSN_LASX(xvmini_b, vv_i)
+INSN_LASX(xvmini_h, vv_i)
+INSN_LASX(xvmini_w, vv_i)
+INSN_LASX(xvmini_d, vv_i)
+INSN_LASX(xvmaxi_bu, vv_i)
+INSN_LASX(xvmaxi_hu, vv_i)
+INSN_LASX(xvmaxi_wu, vv_i)
+INSN_LASX(xvmaxi_du, vv_i)
+INSN_LASX(xvmini_bu, vv_i)
+INSN_LASX(xvmini_hu, vv_i)
+INSN_LASX(xvmini_wu, vv_i)
+INSN_LASX(xvmini_du, vv_i)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 819fa5e033..0c641d80c7 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -403,20 +403,18 @@ DO_VADDA(vadda_h, 16, H, DO_VABS)
DO_VADDA(vadda_w, 32, W, DO_VABS)
DO_VADDA(vadda_d, 64, D, DO_VABS)
-#define DO_MIN(a, b) (a < b ? a : b)
-#define DO_MAX(a, b) (a > b ? a : b)
-
-#define VMINMAXI(NAME, BIT, E, DO_OP) \
-void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t v) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- typedef __typeof(Vd->E(0)) TD; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E(i) = DO_OP(Vj->E(i), (TD)imm); \
- } \
+#define VMINMAXI(NAME, BIT, E, DO_OP) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ typedef __typeof(Vd->E(0)) TD; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
+ Vd->E(i) = DO_OP(Vj->E(i), (TD)imm); \
+ } \
}
VMINMAXI(vmini_b, 8, B, DO_MIN)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index a3f2740f74..ba31da6578 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -175,6 +175,42 @@ TRANS(xvadda_h, LASX, gvec_vvv, 32, MO_16, do_vadda)
TRANS(xvadda_w, LASX, gvec_vvv, 32, MO_32, do_vadda)
TRANS(xvadda_d, LASX, gvec_vvv, 32, MO_64, do_vadda)
+TRANS(xvmax_b, LASX, gvec_vvv, 32, MO_8, tcg_gen_gvec_smax)
+TRANS(xvmax_h, LASX, gvec_vvv, 32, MO_16, tcg_gen_gvec_smax)
+TRANS(xvmax_w, LASX, gvec_vvv, 32, MO_32, tcg_gen_gvec_smax)
+TRANS(xvmax_d, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_smax)
+TRANS(xvmax_bu, LASX, gvec_vvv, 32, MO_8, tcg_gen_gvec_umax)
+TRANS(xvmax_hu, LASX, gvec_vvv, 32, MO_16, tcg_gen_gvec_umax)
+TRANS(xvmax_wu, LASX, gvec_vvv, 32, MO_32, tcg_gen_gvec_umax)
+TRANS(xvmax_du, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_umax)
+
+TRANS(xvmin_b, LASX, gvec_vvv, 32, MO_8, tcg_gen_gvec_smin)
+TRANS(xvmin_h, LASX, gvec_vvv, 32, MO_16, tcg_gen_gvec_smin)
+TRANS(xvmin_w, LASX, gvec_vvv, 32, MO_32, tcg_gen_gvec_smin)
+TRANS(xvmin_d, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_smin)
+TRANS(xvmin_bu, LASX, gvec_vvv, 32, MO_8, tcg_gen_gvec_umin)
+TRANS(xvmin_hu, LASX, gvec_vvv, 32, MO_16, tcg_gen_gvec_umin)
+TRANS(xvmin_wu, LASX, gvec_vvv, 32, MO_32, tcg_gen_gvec_umin)
+TRANS(xvmin_du, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_umin)
+
+TRANS(xvmini_b, LASX, gvec_vv_i, 32, MO_8, do_vmini_s)
+TRANS(xvmini_h, LASX, gvec_vv_i, 32, MO_16, do_vmini_s)
+TRANS(xvmini_w, LASX, gvec_vv_i, 32, MO_32, do_vmini_s)
+TRANS(xvmini_d, LASX, gvec_vv_i, 32, MO_64, do_vmini_s)
+TRANS(xvmini_bu, LASX, gvec_vv_i, 32, MO_8, do_vmini_u)
+TRANS(xvmini_hu, LASX, gvec_vv_i, 32, MO_16, do_vmini_u)
+TRANS(xvmini_wu, LASX, gvec_vv_i, 32, MO_32, do_vmini_u)
+TRANS(xvmini_du, LASX, gvec_vv_i, 32, MO_64, do_vmini_u)
+
+TRANS(xvmaxi_b, LASX, gvec_vv_i, 32, MO_8, do_vmaxi_s)
+TRANS(xvmaxi_h, LASX, gvec_vv_i, 32, MO_16, do_vmaxi_s)
+TRANS(xvmaxi_w, LASX, gvec_vv_i, 32, MO_32, do_vmaxi_s)
+TRANS(xvmaxi_d, LASX, gvec_vv_i, 32, MO_64, do_vmaxi_s)
+TRANS(xvmaxi_bu, LASX, gvec_vv_i, 32, MO_8, do_vmaxi_u)
+TRANS(xvmaxi_hu, LASX, gvec_vv_i, 32, MO_16, do_vmaxi_u)
+TRANS(xvmaxi_wu, LASX, gvec_vv_i, 32, MO_32, do_vmaxi_u)
+TRANS(xvmaxi_du, LASX, gvec_vv_i, 32, MO_64, do_vmaxi_u)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
* Re: [PATCH v4 16/48] target/loongarch: Implement xvmax/xvmin
2023-08-30 8:48 ` [PATCH v4 16/48] target/loongarch: Implement xvmax/xvmin Song Gao
@ 2023-08-30 20:50 ` Richard Henderson
0 siblings, 0 replies; 86+ messages in thread
From: Richard Henderson @ 2023-08-30 20:50 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 8/30/23 01:48, Song Gao wrote:
> This patch includes:
> - XVMAX[I].{B/H/W/D}[U];
> - XVMIN[I].{B/H/W/D}[U].
>
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
> target/loongarch/vec.h | 3 ++
> target/loongarch/insns.decode | 36 ++++++++++++++++++++
> target/loongarch/disas.c | 34 ++++++++++++++++++
> target/loongarch/vec_helper.c | 26 +++++++-------
> target/loongarch/insn_trans/trans_lasx.c.inc | 36 ++++++++++++++++++++
> 5 files changed, 121 insertions(+), 14 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* [PATCH v4 17/48] target/loongarch: Implement xvmul/xvmuh/xvmulw{ev/od}
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (15 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 16/48] target/loongarch: Implement xvmax/xvmin Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 18:23 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 18/48] target/loongarch: Implement xvmadd/xvmsub/xvmaddw{ev/od} Song Gao
` (30 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVMUL.{B/H/W/D};
- XVMUH.{B/H/W/D}[U];
- XVMULW{EV/OD}.{H.B/W.H/D.W/Q.D}[U];
- XVMULW{EV/OD}.{H.BU.B/W.HU.H/D.WU.W/Q.DU.D}.
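To illustrate the widening even/odd forms, a scalar model on
byte-to-halfword lanes (array-based sketch; names are invented, and n16
stands for the number of 16-bit destination lanes):

    #include <stdint.h>

    /* XVMULWEV.H.B: multiply the even byte pairs, widened to 16 bits. */
    static void mulwev_h_b(int16_t *d, const int8_t *j,
                           const int8_t *k, int n16)
    {
        for (int i = 0; i < n16; i++) {
            d[i] = (int16_t)j[2 * i] * (int16_t)k[2 * i];
        }
    }

    /* XVMULWOD.H.B: the same, on the odd byte pairs. */
    static void mulwod_h_b(int16_t *d, const int8_t *j,
                           const int8_t *k, int n16)
    {
        for (int i = 0; i < n16; i++) {
            d[i] = (int16_t)j[2 * i + 1] * (int16_t)k[2 * i + 1];
        }
    }

XVMUH, by contrast, keeps only the high BIT bits of the widened product,
as in the DO_VMUH helper below.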
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/vec.h | 2 +
target/loongarch/insns.decode | 38 +++++++++++++
target/loongarch/disas.c | 38 +++++++++++++
target/loongarch/vec_helper.c | 57 ++++++++++---------
target/loongarch/insn_trans/trans_lasx.c.inc | 42 ++++++++++++++
target/loongarch/insn_trans/trans_lsx.c.inc | 60 ++++++++++----------
6 files changed, 180 insertions(+), 57 deletions(-)
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index cd6f6a72fd..6fc84c8c5a 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -60,4 +60,6 @@
#define DO_MIN(a, b) (a < b ? a : b)
#define DO_MAX(a, b) (a > b ? a : b)
+#define DO_MUL(a, b) (a * b)
+
#endif /* LOONGARCH_VEC_H */
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 99aefcb651..0f9ebe641f 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1473,6 +1473,44 @@ xvmini_hu 0111 01101001 01101 ..... ..... ..... @vv_ui5
xvmini_wu 0111 01101001 01110 ..... ..... ..... @vv_ui5
xvmini_du 0111 01101001 01111 ..... ..... ..... @vv_ui5
+xvmul_b 0111 01001000 01000 ..... ..... ..... @vvv
+xvmul_h 0111 01001000 01001 ..... ..... ..... @vvv
+xvmul_w 0111 01001000 01010 ..... ..... ..... @vvv
+xvmul_d 0111 01001000 01011 ..... ..... ..... @vvv
+xvmuh_b 0111 01001000 01100 ..... ..... ..... @vvv
+xvmuh_h 0111 01001000 01101 ..... ..... ..... @vvv
+xvmuh_w 0111 01001000 01110 ..... ..... ..... @vvv
+xvmuh_d 0111 01001000 01111 ..... ..... ..... @vvv
+xvmuh_bu 0111 01001000 10000 ..... ..... ..... @vvv
+xvmuh_hu 0111 01001000 10001 ..... ..... ..... @vvv
+xvmuh_wu 0111 01001000 10010 ..... ..... ..... @vvv
+xvmuh_du 0111 01001000 10011 ..... ..... ..... @vvv
+
+xvmulwev_h_b 0111 01001001 00000 ..... ..... ..... @vvv
+xvmulwev_w_h 0111 01001001 00001 ..... ..... ..... @vvv
+xvmulwev_d_w 0111 01001001 00010 ..... ..... ..... @vvv
+xvmulwev_q_d 0111 01001001 00011 ..... ..... ..... @vvv
+xvmulwod_h_b 0111 01001001 00100 ..... ..... ..... @vvv
+xvmulwod_w_h 0111 01001001 00101 ..... ..... ..... @vvv
+xvmulwod_d_w 0111 01001001 00110 ..... ..... ..... @vvv
+xvmulwod_q_d 0111 01001001 00111 ..... ..... ..... @vvv
+xvmulwev_h_bu 0111 01001001 10000 ..... ..... ..... @vvv
+xvmulwev_w_hu 0111 01001001 10001 ..... ..... ..... @vvv
+xvmulwev_d_wu 0111 01001001 10010 ..... ..... ..... @vvv
+xvmulwev_q_du 0111 01001001 10011 ..... ..... ..... @vvv
+xvmulwod_h_bu 0111 01001001 10100 ..... ..... ..... @vvv
+xvmulwod_w_hu 0111 01001001 10101 ..... ..... ..... @vvv
+xvmulwod_d_wu 0111 01001001 10110 ..... ..... ..... @vvv
+xvmulwod_q_du 0111 01001001 10111 ..... ..... ..... @vvv
+xvmulwev_h_bu_b 0111 01001010 00000 ..... ..... ..... @vvv
+xvmulwev_w_hu_h 0111 01001010 00001 ..... ..... ..... @vvv
+xvmulwev_d_wu_w 0111 01001010 00010 ..... ..... ..... @vvv
+xvmulwev_q_du_d 0111 01001010 00011 ..... ..... ..... @vvv
+xvmulwod_h_bu_b 0111 01001010 00100 ..... ..... ..... @vvv
+xvmulwod_w_hu_h 0111 01001010 00101 ..... ..... ..... @vvv
+xvmulwod_d_wu_w 0111 01001010 00110 ..... ..... ..... @vvv
+xvmulwod_q_du_d 0111 01001010 00111 ..... ..... ..... @vvv
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 63c1dc757f..e5f9a6bcdf 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1890,6 +1890,44 @@ INSN_LASX(xvmini_hu, vv_i)
INSN_LASX(xvmini_wu, vv_i)
INSN_LASX(xvmini_du, vv_i)
+INSN_LASX(xvmul_b, vvv)
+INSN_LASX(xvmul_h, vvv)
+INSN_LASX(xvmul_w, vvv)
+INSN_LASX(xvmul_d, vvv)
+INSN_LASX(xvmuh_b, vvv)
+INSN_LASX(xvmuh_h, vvv)
+INSN_LASX(xvmuh_w, vvv)
+INSN_LASX(xvmuh_d, vvv)
+INSN_LASX(xvmuh_bu, vvv)
+INSN_LASX(xvmuh_hu, vvv)
+INSN_LASX(xvmuh_wu, vvv)
+INSN_LASX(xvmuh_du, vvv)
+
+INSN_LASX(xvmulwev_h_b, vvv)
+INSN_LASX(xvmulwev_w_h, vvv)
+INSN_LASX(xvmulwev_d_w, vvv)
+INSN_LASX(xvmulwev_q_d, vvv)
+INSN_LASX(xvmulwod_h_b, vvv)
+INSN_LASX(xvmulwod_w_h, vvv)
+INSN_LASX(xvmulwod_d_w, vvv)
+INSN_LASX(xvmulwod_q_d, vvv)
+INSN_LASX(xvmulwev_h_bu, vvv)
+INSN_LASX(xvmulwev_w_hu, vvv)
+INSN_LASX(xvmulwev_d_wu, vvv)
+INSN_LASX(xvmulwev_q_du, vvv)
+INSN_LASX(xvmulwod_h_bu, vvv)
+INSN_LASX(xvmulwod_w_hu, vvv)
+INSN_LASX(xvmulwod_d_wu, vvv)
+INSN_LASX(xvmulwod_q_du, vvv)
+INSN_LASX(xvmulwev_h_bu_b, vvv)
+INSN_LASX(xvmulwev_w_hu_h, vvv)
+INSN_LASX(xvmulwev_d_wu_w, vvv)
+INSN_LASX(xvmulwev_q_du_d, vvv)
+INSN_LASX(xvmulwod_h_bu_b, vvv)
+INSN_LASX(xvmulwod_w_hu_h, vvv)
+INSN_LASX(xvmulwod_d_wu_w, vvv)
+INSN_LASX(xvmulwod_q_du_d, vvv)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 0c641d80c7..f641950cbe 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -434,58 +434,59 @@ VMINMAXI(vmaxi_hu, 16, UH, DO_MAX)
VMINMAXI(vmaxi_wu, 32, UW, DO_MAX)
VMINMAXI(vmaxi_du, 64, UD, DO_MAX)
-#define DO_VMUH(NAME, BIT, E1, E2, DO_OP) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- VReg *Vk = (VReg *)vk; \
- typedef __typeof(Vd->E1(0)) T; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E2(i) = ((T)Vj->E2(i)) * ((T)Vk->E2(i)) >> BIT; \
- } \
+#define DO_VMUH(NAME, BIT, E1, E2, DO_OP) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ typedef __typeof(Vd->E1(0)) T; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
+ Vd->E2(i) = ((T)Vj->E2(i)) * ((T)Vk->E2(i)) >> BIT; \
+ } \
}
-void HELPER(vmuh_d)(void *vd, void *vj, void *vk, uint32_t v)
+void HELPER(vmuh_d)(void *vd, void *vj, void *vk, uint32_t desc)
{
- uint64_t l, h1, h2;
+ int i;
+ uint64_t l, h;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
VReg *Vk = (VReg *)vk;
+ int oprsz = simd_oprsz(desc);
- muls64(&l, &h1, Vj->D(0), Vk->D(0));
- muls64(&l, &h2, Vj->D(1), Vk->D(1));
-
- Vd->D(0) = h1;
- Vd->D(1) = h2;
+ for (i = 0; i < oprsz / 8; i++) {
+ muls64(&l, &h, Vj->D(i), Vk->D(i));
+ Vd->D(i) = h;
+ }
}
DO_VMUH(vmuh_b, 8, H, B, DO_MUH)
DO_VMUH(vmuh_h, 16, W, H, DO_MUH)
DO_VMUH(vmuh_w, 32, D, W, DO_MUH)
-void HELPER(vmuh_du)(void *vd, void *vj, void *vk, uint32_t v)
+void HELPER(vmuh_du)(void *vd, void *vj, void *vk, uint32_t desc)
{
- uint64_t l, h1, h2;
+ int i;
+ uint64_t l, h;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
VReg *Vk = (VReg *)vk;
+ int oprsz = simd_oprsz(desc);
- mulu64(&l, &h1, Vj->D(0), Vk->D(0));
- mulu64(&l, &h2, Vj->D(1), Vk->D(1));
-
- Vd->D(0) = h1;
- Vd->D(1) = h2;
+ for (i = 0; i < oprsz / 8; i++) {
+ mulu64(&l, &h, Vj->D(i), Vk->D(i));
+ Vd->D(i) = h;
+ }
}
DO_VMUH(vmuh_bu, 8, UH, UB, DO_MUH)
DO_VMUH(vmuh_hu, 16, UW, UH, DO_MUH)
DO_VMUH(vmuh_wu, 32, UD, UW, DO_MUH)
-#define DO_MUL(a, b) (a * b)
-
DO_EVEN(vmulwev_h_b, 16, H, B, DO_MUL)
DO_EVEN(vmulwev_w_h, 32, W, H, DO_MUL)
DO_EVEN(vmulwev_d_w, 64, D, W, DO_MUL)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index ba31da6578..ca9361782e 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -211,6 +211,48 @@ TRANS(xvmaxi_hu, LASX, gvec_vv_i, 32, MO_16, do_vmaxi_u)
TRANS(xvmaxi_wu, LASX, gvec_vv_i, 32, MO_32, do_vmaxi_u)
TRANS(xvmaxi_du, LASX, gvec_vv_i, 32, MO_64, do_vmaxi_u)
+TRANS(xvmul_b, LASX, gvec_vvv, 32, MO_8, tcg_gen_gvec_mul)
+TRANS(xvmul_h, LASX, gvec_vvv, 32, MO_16, tcg_gen_gvec_mul)
+TRANS(xvmul_w, LASX, gvec_vvv, 32, MO_32, tcg_gen_gvec_mul)
+TRANS(xvmul_d, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_mul)
+TRANS(xvmuh_b, LASX, gvec_vvv, 32, MO_8, do_vmuh_s)
+TRANS(xvmuh_h, LASX, gvec_vvv, 32, MO_16, do_vmuh_s)
+TRANS(xvmuh_w, LASX, gvec_vvv, 32, MO_32, do_vmuh_s)
+TRANS(xvmuh_d, LASX, gvec_vvv, 32, MO_64, do_vmuh_s)
+TRANS(xvmuh_bu, LASX, gvec_vvv, 32, MO_8, do_vmuh_u)
+TRANS(xvmuh_hu, LASX, gvec_vvv, 32, MO_16, do_vmuh_u)
+TRANS(xvmuh_wu, LASX, gvec_vvv, 32, MO_32, do_vmuh_u)
+TRANS(xvmuh_du, LASX, gvec_vvv, 32, MO_64, do_vmuh_u)
+
+TRANS(xvmulwev_h_b, LASX, gvec_vvv, 32, MO_8, do_vmulwev_s)
+TRANS(xvmulwev_w_h, LASX, gvec_vvv, 32, MO_16, do_vmulwev_s)
+TRANS(xvmulwev_d_w, LASX, gvec_vvv, 32, MO_32, do_vmulwev_s)
+
+TRANS(xvmulwev_q_d, LASX, gen_vmul_q, 32, 0, 0, tcg_gen_muls2_i64)
+TRANS(xvmulwod_q_d, LASX, gen_vmul_q, 32, 1, 1, tcg_gen_muls2_i64)
+TRANS(xvmulwev_q_du, LASX, gen_vmul_q, 32, 0, 0, tcg_gen_mulu2_i64)
+TRANS(xvmulwod_q_du, LASX, gen_vmul_q, 32, 1, 1, tcg_gen_mulu2_i64)
+TRANS(xvmulwev_q_du_d, LASX, gen_vmul_q, 32, 0, 0, tcg_gen_mulus2_i64)
+TRANS(xvmulwod_q_du_d, LASX, gen_vmul_q, 32, 1, 1, tcg_gen_mulus2_i64)
+
+TRANS(xvmulwod_h_b, LASX, gvec_vvv, 32, MO_8, do_vmulwod_s)
+TRANS(xvmulwod_w_h, LASX, gvec_vvv, 32, MO_16, do_vmulwod_s)
+TRANS(xvmulwod_d_w, LASX, gvec_vvv, 32, MO_32, do_vmulwod_s)
+
+TRANS(xvmulwev_h_bu, LASX, gvec_vvv, 32, MO_8, do_vmulwev_u)
+TRANS(xvmulwev_w_hu, LASX, gvec_vvv, 32, MO_16, do_vmulwev_u)
+TRANS(xvmulwev_d_wu, LASX, gvec_vvv, 32, MO_32, do_vmulwev_u)
+TRANS(xvmulwod_h_bu, LASX, gvec_vvv, 32, MO_8, do_vmulwod_u)
+TRANS(xvmulwod_w_hu, LASX, gvec_vvv, 32, MO_16, do_vmulwod_u)
+TRANS(xvmulwod_d_wu, LASX, gvec_vvv, 32, MO_32, do_vmulwod_u)
+
+TRANS(xvmulwev_h_bu_b, LASX, gvec_vvv, 32, MO_8, do_vmulwev_u_s)
+TRANS(xvmulwev_w_hu_h, LASX, gvec_vvv, 32, MO_16, do_vmulwev_u_s)
+TRANS(xvmulwev_d_wu_w, LASX, gvec_vvv, 32, MO_32, do_vmulwev_u_s)
+TRANS(xvmulwod_h_bu_b, LASX, gvec_vvv, 32, MO_8, do_vmulwod_u_s)
+TRANS(xvmulwod_w_hu_h, LASX, gvec_vvv, 32, MO_16, do_vmulwod_u_s)
+TRANS(xvmulwod_d_wu_w, LASX, gvec_vvv, 32, MO_32, do_vmulwod_u_s)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 5653a556bf..d25f89a6a4 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -1764,37 +1764,39 @@ static void tcg_gen_mulus2_i64(TCGv_i64 rl, TCGv_i64 rh,
tcg_gen_mulsu2_i64(rl, rh, arg2, arg1);
}
-#define VMUL_Q(NAME, FN, idx1, idx2) \
-static bool trans_## NAME (DisasContext *ctx, arg_vvv *a) \
-{ \
- TCGv_i64 rh, rl, arg1, arg2; \
- \
- if (!avail_LSX(ctx)) { \
- return false; \
- } \
- \
- rh = tcg_temp_new_i64(); \
- rl = tcg_temp_new_i64(); \
- arg1 = tcg_temp_new_i64(); \
- arg2 = tcg_temp_new_i64(); \
- \
- get_vreg64(arg1, a->vj, idx1); \
- get_vreg64(arg2, a->vk, idx2); \
- \
- tcg_gen_## FN ##_i64(rl, rh, arg1, arg2); \
- \
- set_vreg64(rh, a->vd, 1); \
- set_vreg64(rl, a->vd, 0); \
- \
- return true; \
+static bool gen_vmul_q(DisasContext *ctx,
+ arg_vvv *a, int oprsz, int idx1, int idx2,
+ void (*func)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64))
+{
+ TCGv_i64 rh, rl, arg1, arg2;
+ int i;
+
+ CHECK_VEC;
+
+ rh = tcg_temp_new_i64();
+ rl = tcg_temp_new_i64();
+ arg1 = tcg_temp_new_i64();
+ arg2 = tcg_temp_new_i64();
+
+ for (i = 0; i < oprsz / 16; i++) {
+ get_vreg64(arg1, a->vj, 2 * i + idx1);
+ get_vreg64(arg2, a->vk, 2 * i + idx2);
+
+ func(rl, rh, arg1, arg2);
+
+ set_vreg64(rh, a->vd, 2 * i + 1);
+ set_vreg64(rl, a->vd, 2 * i);
+ }
+
+ return true;
}
-VMUL_Q(vmulwev_q_d, muls2, 0, 0)
-VMUL_Q(vmulwod_q_d, muls2, 1, 1)
-VMUL_Q(vmulwev_q_du, mulu2, 0, 0)
-VMUL_Q(vmulwod_q_du, mulu2, 1, 1)
-VMUL_Q(vmulwev_q_du_d, mulus2, 0, 0)
-VMUL_Q(vmulwod_q_du_d, mulus2, 1, 1)
+TRANS(vmulwev_q_d, LSX, gen_vmul_q, 16, 0, 0, tcg_gen_muls2_i64)
+TRANS(vmulwod_q_d, LSX, gen_vmul_q, 16, 1, 1, tcg_gen_muls2_i64)
+TRANS(vmulwev_q_du, LSX, gen_vmul_q, 16, 0, 0, tcg_gen_mulu2_i64)
+TRANS(vmulwod_q_du, LSX, gen_vmul_q, 16, 1, 1, tcg_gen_mulu2_i64)
+TRANS(vmulwev_q_du_d, LSX, gen_vmul_q, 16, 0, 0, tcg_gen_mulus2_i64)
+TRANS(vmulwod_q_du_d, LSX, gen_vmul_q, 16, 1, 1, tcg_gen_mulus2_i64)
static void gen_vmulwod_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
--
2.39.1
* Re: [PATCH v4 17/48] target/loongarch: Implement xvmul/xvmuh/xvmulw{ev/od}
2023-08-30 8:48 ` [PATCH v4 17/48] target/loongarch: Implement xvmul/xvmuh/xvmulw{ev/od} Song Gao
@ 2023-08-30 18:23 ` Richard Henderson
0 siblings, 0 replies; 86+ messages in thread
From: Richard Henderson @ 2023-08-30 18:23 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 8/30/23 01:48, Song Gao wrote:
> This patch includes:
> - XVMUL.{B/H/W/D};
> - XVMUH.{B/H/W/D}[U];
> - XVMULW{EV/OD}.{H.B/W.H/D.W/Q.D}[U];
> - XVMULW{EV/OD}.{H.BU.B/W.HU.H/D.WU.W/Q.DU.D}.
>
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
> target/loongarch/vec.h | 2 +
> target/loongarch/insns.decode | 38 +++++++++++++
> target/loongarch/disas.c | 38 +++++++++++++
> target/loongarch/vec_helper.c | 57 ++++++++++---------
> target/loongarch/insn_trans/trans_lasx.c.inc | 42 ++++++++++++++
> target/loongarch/insn_trans/trans_lsx.c.inc | 60 ++++++++++----------
> 6 files changed, 180 insertions(+), 57 deletions(-)
>
> diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
> index cd6f6a72fd..6fc84c8c5a 100644
> --- a/target/loongarch/vec.h
> +++ b/target/loongarch/vec.h
> @@ -60,4 +60,6 @@
> #define DO_MIN(a, b) (a < b ? a : b)
> #define DO_MAX(a, b) (a > b ? a : b)
>
> +#define DO_MUL(a, b) (a * b)
> +
No need to move this.
Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* [PATCH v4 18/48] target/loongarch: Implement xvmadd/xvmsub/xvmaddw{ev/od}
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (16 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 17/48] target/loongarch: Implement xvmul/xvmuh/xvmulw{ev/od} Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 21:05 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 19/48] target/loongarch: Implement xvdiv/xvmod Song Gao
` (29 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVMADD.{B/H/W/D};
- XVMSUB.{B/H/W/D};
- XVMADDW{EV/OD}.{H.B/W.H/D.W/Q.D}[U];
- XVMADDW{EV/OD}.{H.BU.B/W.HU.H/D.WU.W/Q.DU.D}.
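For reference, a scalar model of one widening multiply-accumulate lane,
matching the VMADDWEV helper below (names invented; n16 is the number of
16-bit destination lanes):

    #include <stdint.h>

    /* XVMADDWEV.H.B: d[i] += widen(j_even) * widen(k_even). */
    static void maddwev_h_b(int16_t *d, const int8_t *j,
                            const int8_t *k, int n16)
    {
        for (int i = 0; i < n16; i++) {
            d[i] += (int16_t)j[2 * i] * (int16_t)k[2 * i];
        }
    }

The Q.D forms accumulate a full 128-bit product, which is why they go
through gen_vmadd_q and tcg_gen_add2_i64 rather than a gvec helper.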
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/vec.h | 3 +
target/loongarch/insns.decode | 34 ++++++
target/loongarch/disas.c | 34 ++++++
target/loongarch/vec_helper.c | 113 ++++++++++---------
target/loongarch/insn_trans/trans_lasx.c.inc | 38 +++++++
target/loongarch/insn_trans/trans_lsx.c.inc | 72 ++++++------
6 files changed, 203 insertions(+), 91 deletions(-)
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index 6fc84c8c5a..06c8d7e314 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -62,4 +62,7 @@
#define DO_MUL(a, b) (a * b)
+#define DO_MADD(a, b, c) (a + b * c)
+#define DO_MSUB(a, b, c) (a - b * c)
+
#endif /* LOONGARCH_VEC_H */
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 0f9ebe641f..d6fb51ae64 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1511,6 +1511,40 @@ xvmulwod_w_hu_h 0111 01001010 00101 ..... ..... ..... @vvv
xvmulwod_d_wu_w 0111 01001010 00110 ..... ..... ..... @vvv
xvmulwod_q_du_d 0111 01001010 00111 ..... ..... ..... @vvv
+xvmadd_b 0111 01001010 10000 ..... ..... ..... @vvv
+xvmadd_h 0111 01001010 10001 ..... ..... ..... @vvv
+xvmadd_w 0111 01001010 10010 ..... ..... ..... @vvv
+xvmadd_d 0111 01001010 10011 ..... ..... ..... @vvv
+xvmsub_b 0111 01001010 10100 ..... ..... ..... @vvv
+xvmsub_h 0111 01001010 10101 ..... ..... ..... @vvv
+xvmsub_w 0111 01001010 10110 ..... ..... ..... @vvv
+xvmsub_d 0111 01001010 10111 ..... ..... ..... @vvv
+
+xvmaddwev_h_b 0111 01001010 11000 ..... ..... ..... @vvv
+xvmaddwev_w_h 0111 01001010 11001 ..... ..... ..... @vvv
+xvmaddwev_d_w 0111 01001010 11010 ..... ..... ..... @vvv
+xvmaddwev_q_d 0111 01001010 11011 ..... ..... ..... @vvv
+xvmaddwod_h_b 0111 01001010 11100 ..... ..... ..... @vvv
+xvmaddwod_w_h 0111 01001010 11101 ..... ..... ..... @vvv
+xvmaddwod_d_w 0111 01001010 11110 ..... ..... ..... @vvv
+xvmaddwod_q_d 0111 01001010 11111 ..... ..... ..... @vvv
+xvmaddwev_h_bu 0111 01001011 01000 ..... ..... ..... @vvv
+xvmaddwev_w_hu 0111 01001011 01001 ..... ..... ..... @vvv
+xvmaddwev_d_wu 0111 01001011 01010 ..... ..... ..... @vvv
+xvmaddwev_q_du 0111 01001011 01011 ..... ..... ..... @vvv
+xvmaddwod_h_bu 0111 01001011 01100 ..... ..... ..... @vvv
+xvmaddwod_w_hu 0111 01001011 01101 ..... ..... ..... @vvv
+xvmaddwod_d_wu 0111 01001011 01110 ..... ..... ..... @vvv
+xvmaddwod_q_du 0111 01001011 01111 ..... ..... ..... @vvv
+xvmaddwev_h_bu_b 0111 01001011 11000 ..... ..... ..... @vvv
+xvmaddwev_w_hu_h 0111 01001011 11001 ..... ..... ..... @vvv
+xvmaddwev_d_wu_w 0111 01001011 11010 ..... ..... ..... @vvv
+xvmaddwev_q_du_d 0111 01001011 11011 ..... ..... ..... @vvv
+xvmaddwod_h_bu_b 0111 01001011 11100 ..... ..... ..... @vvv
+xvmaddwod_w_hu_h 0111 01001011 11101 ..... ..... ..... @vvv
+xvmaddwod_d_wu_w 0111 01001011 11110 ..... ..... ..... @vvv
+xvmaddwod_q_du_d 0111 01001011 11111 ..... ..... ..... @vvv
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index e5f9a6bcdf..b115fe8315 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1928,6 +1928,40 @@ INSN_LASX(xvmulwod_w_hu_h, vvv)
INSN_LASX(xvmulwod_d_wu_w, vvv)
INSN_LASX(xvmulwod_q_du_d, vvv)
+INSN_LASX(xvmadd_b, vvv)
+INSN_LASX(xvmadd_h, vvv)
+INSN_LASX(xvmadd_w, vvv)
+INSN_LASX(xvmadd_d, vvv)
+INSN_LASX(xvmsub_b, vvv)
+INSN_LASX(xvmsub_h, vvv)
+INSN_LASX(xvmsub_w, vvv)
+INSN_LASX(xvmsub_d, vvv)
+
+INSN_LASX(xvmaddwev_h_b, vvv)
+INSN_LASX(xvmaddwev_w_h, vvv)
+INSN_LASX(xvmaddwev_d_w, vvv)
+INSN_LASX(xvmaddwev_q_d, vvv)
+INSN_LASX(xvmaddwod_h_b, vvv)
+INSN_LASX(xvmaddwod_w_h, vvv)
+INSN_LASX(xvmaddwod_d_w, vvv)
+INSN_LASX(xvmaddwod_q_d, vvv)
+INSN_LASX(xvmaddwev_h_bu, vvv)
+INSN_LASX(xvmaddwev_w_hu, vvv)
+INSN_LASX(xvmaddwev_d_wu, vvv)
+INSN_LASX(xvmaddwev_q_du, vvv)
+INSN_LASX(xvmaddwod_h_bu, vvv)
+INSN_LASX(xvmaddwod_w_hu, vvv)
+INSN_LASX(xvmaddwod_d_wu, vvv)
+INSN_LASX(xvmaddwod_q_du, vvv)
+INSN_LASX(xvmaddwev_h_bu_b, vvv)
+INSN_LASX(xvmaddwev_w_hu_h, vvv)
+INSN_LASX(xvmaddwev_d_wu_w, vvv)
+INSN_LASX(xvmaddwev_q_du_d, vvv)
+INSN_LASX(xvmaddwod_h_bu_b, vvv)
+INSN_LASX(xvmaddwod_w_hu_h, vvv)
+INSN_LASX(xvmaddwod_d_wu_w, vvv)
+INSN_LASX(xvmaddwod_q_du_d, vvv)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index f641950cbe..5a1bff8b04 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -511,19 +511,18 @@ DO_ODD_U_S(vmulwod_h_bu_b, 16, H, UH, B, UB, DO_MUL)
DO_ODD_U_S(vmulwod_w_hu_h, 32, W, UW, H, UH, DO_MUL)
DO_ODD_U_S(vmulwod_d_wu_w, 64, D, UD, W, UW, DO_MUL)
-#define DO_MADD(a, b, c) (a + b * c)
-#define DO_MSUB(a, b, c) (a - b * c)
-
-#define VMADDSUB(NAME, BIT, E, DO_OP) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- VReg *Vk = (VReg *)vk; \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E(i) = DO_OP(Vd->E(i), Vj->E(i) ,Vk->E(i)); \
- } \
+#define VMADDSUB(NAME, BIT, E, DO_OP) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
+ Vd->E(i) = DO_OP(Vd->E(i), Vj->E(i), Vk->E(i)); \
+ } \
}
VMADDSUB(vmadd_b, 8, B, DO_MADD)
@@ -536,15 +535,16 @@ VMADDSUB(vmsub_w, 32, W, DO_MSUB)
VMADDSUB(vmsub_d, 64, D, DO_MSUB)
#define VMADDWEV(NAME, BIT, E1, E2, DO_OP) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
int i; \
VReg *Vd = (VReg *)vd; \
VReg *Vj = (VReg *)vj; \
VReg *Vk = (VReg *)vk; \
typedef __typeof(Vd->E1(0)) TD; \
+ int oprsz = simd_oprsz(desc); \
\
- for (i = 0; i < LSX_LEN/BIT; i++) { \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
Vd->E1(i) += DO_OP((TD)Vj->E2(2 * i), (TD)Vk->E2(2 * i)); \
} \
}
@@ -556,19 +556,20 @@ VMADDWEV(vmaddwev_h_bu, 16, UH, UB, DO_MUL)
VMADDWEV(vmaddwev_w_hu, 32, UW, UH, DO_MUL)
VMADDWEV(vmaddwev_d_wu, 64, UD, UW, DO_MUL)
-#define VMADDWOD(NAME, BIT, E1, E2, DO_OP) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- VReg *Vk = (VReg *)vk; \
- typedef __typeof(Vd->E1(0)) TD; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E1(i) += DO_OP((TD)Vj->E2(2 * i + 1), \
- (TD)Vk->E2(2 * i + 1)); \
- } \
+#define VMADDWOD(NAME, BIT, E1, E2, DO_OP) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ typedef __typeof(Vd->E1(0)) TD; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
+ Vd->E1(i) += DO_OP((TD)Vj->E2(2 * i + 1), \
+ (TD)Vk->E2(2 * i + 1)); \
+ } \
}
VMADDWOD(vmaddwod_h_b, 16, H, B, DO_MUL)
@@ -578,40 +579,42 @@ VMADDWOD(vmaddwod_h_bu, 16, UH, UB, DO_MUL)
VMADDWOD(vmaddwod_w_hu, 32, UW, UH, DO_MUL)
VMADDWOD(vmaddwod_d_wu, 64, UD, UW, DO_MUL)
-#define VMADDWEV_U_S(NAME, BIT, ES1, EU1, ES2, EU2, DO_OP) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- VReg *Vk = (VReg *)vk; \
- typedef __typeof(Vd->ES1(0)) TS1; \
- typedef __typeof(Vd->EU1(0)) TU1; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->ES1(i) += DO_OP((TU1)Vj->EU2(2 * i), \
- (TS1)Vk->ES2(2 * i)); \
- } \
+#define VMADDWEV_U_S(NAME, BIT, ES1, EU1, ES2, EU2, DO_OP) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ typedef __typeof(Vd->ES1(0)) TS1; \
+ typedef __typeof(Vd->EU1(0)) TU1; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
+ Vd->ES1(i) += DO_OP((TU1)Vj->EU2(2 * i), \
+ (TS1)Vk->ES2(2 * i)); \
+ } \
}
VMADDWEV_U_S(vmaddwev_h_bu_b, 16, H, UH, B, UB, DO_MUL)
VMADDWEV_U_S(vmaddwev_w_hu_h, 32, W, UW, H, UH, DO_MUL)
VMADDWEV_U_S(vmaddwev_d_wu_w, 64, D, UD, W, UW, DO_MUL)
-#define VMADDWOD_U_S(NAME, BIT, ES1, EU1, ES2, EU2, DO_OP) \
+#define VMADDWOD_U_S(NAME, BIT, ES1, EU1, ES2, EU2, DO_OP) \
void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- VReg *Vk = (VReg *)vk; \
- typedef __typeof(Vd->ES1(0)) TS1; \
- typedef __typeof(Vd->EU1(0)) TU1; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->ES1(i) += DO_OP((TU1)Vj->EU2(2 * i + 1), \
- (TS1)Vk->ES2(2 * i + 1)); \
- } \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
+ typedef __typeof(Vd->ES1(0)) TS1; \
+ typedef __typeof(Vd->EU1(0)) TU1; \
+ \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
+ Vd->ES1(i) += DO_OP((TU1)Vj->EU2(2 * i + 1), \
+ (TS1)Vk->ES2(2 * i + 1)); \
+ } \
}
VMADDWOD_U_S(vmaddwod_h_bu_b, 16, H, UH, B, UB, DO_MUL)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index ca9361782e..1073118417 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -253,6 +253,44 @@ TRANS(xvmulwod_h_bu_b, LASX, gvec_vvv, 32, MO_8, do_vmulwod_u_s)
TRANS(xvmulwod_w_hu_h, LASX, gvec_vvv, 32, MO_16, do_vmulwod_u_s)
TRANS(xvmulwod_d_wu_w, LASX, gvec_vvv, 32, MO_32, do_vmulwod_u_s)
+TRANS(xvmadd_b, LASX, gvec_vvv, 32, MO_8, do_vmadd)
+TRANS(xvmadd_h, LASX, gvec_vvv, 32, MO_16, do_vmadd)
+TRANS(xvmadd_w, LASX, gvec_vvv, 32, MO_32, do_vmadd)
+TRANS(xvmadd_d, LASX, gvec_vvv, 32, MO_64, do_vmadd)
+TRANS(xvmsub_b, LASX, gvec_vvv, 32, MO_8, do_vmsub)
+TRANS(xvmsub_h, LASX, gvec_vvv, 32, MO_16, do_vmsub)
+TRANS(xvmsub_w, LASX, gvec_vvv, 32, MO_32, do_vmsub)
+TRANS(xvmsub_d, LASX, gvec_vvv, 32, MO_64, do_vmsub)
+
+TRANS(xvmaddwev_h_b, LASX, gvec_vvv, 32, MO_8, do_vmaddwev_s)
+TRANS(xvmaddwev_w_h, LASX, gvec_vvv, 32, MO_16, do_vmaddwev_s)
+TRANS(xvmaddwev_d_w, LASX, gvec_vvv, 32, MO_32, do_vmaddwev_s)
+
+TRANS(xvmaddwev_q_d, LASX, gen_vmadd_q, 32, 0, 0, tcg_gen_muls2_i64)
+TRANS(xvmaddwod_q_d, LASX, gen_vmadd_q, 32, 1, 1, tcg_gen_muls2_i64)
+TRANS(xvmaddwev_q_du, LASX, gen_vmadd_q, 32, 0, 0, tcg_gen_mulu2_i64)
+TRANS(xvmaddwod_q_du, LASX, gen_vmadd_q, 32, 1, 1, tcg_gen_mulu2_i64)
+TRANS(xvmaddwev_q_du_d, LASX, gen_vmadd_q, 32, 0, 0, tcg_gen_mulus2_i64)
+TRANS(xvmaddwod_q_du_d, LASX, gen_vmadd_q, 32, 1, 1, tcg_gen_mulus2_i64)
+
+TRANS(xvmaddwod_h_b, LASX, gvec_vvv, 32, MO_8, do_vmaddwod_s)
+TRANS(xvmaddwod_w_h, LASX, gvec_vvv, 32, MO_16, do_vmaddwod_s)
+TRANS(xvmaddwod_d_w, LASX, gvec_vvv, 32, MO_32, do_vmaddwod_s)
+
+TRANS(xvmaddwev_h_bu, LASX, gvec_vvv, 32, MO_8, do_vmaddwev_u)
+TRANS(xvmaddwev_w_hu, LASX, gvec_vvv, 32, MO_16, do_vmaddwev_u)
+TRANS(xvmaddwev_d_wu, LASX, gvec_vvv, 32, MO_32, do_vmaddwev_u)
+TRANS(xvmaddwod_h_bu, LASX, gvec_vvv, 32, MO_8, do_vmaddwod_u)
+TRANS(xvmaddwod_w_hu, LASX, gvec_vvv, 32, MO_16, do_vmaddwod_u)
+TRANS(xvmaddwod_d_wu, LASX, gvec_vvv, 32, MO_32, do_vmaddwod_u)
+
+TRANS(xvmaddwev_h_bu_b, LASX, gvec_vvv, 32, MO_8, do_vmaddwev_u_s)
+TRANS(xvmaddwev_w_hu_h, LASX, gvec_vvv, 32, MO_16, do_vmaddwev_u_s)
+TRANS(xvmaddwev_d_wu_w, LASX, gvec_vvv, 32, MO_32, do_vmaddwev_u_s)
+TRANS(xvmaddwod_h_bu_b, LASX, gvec_vvv, 32, MO_8, do_vmaddwod_u_s)
+TRANS(xvmaddwod_w_hu_h, LASX, gvec_vvv, 32, MO_16, do_vmaddwod_u_s)
+TRANS(xvmaddwod_d_wu_w, LASX, gvec_vvv, 32, MO_32, do_vmaddwod_u_s)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index d25f89a6a4..7e77686bfc 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -2371,42 +2371,42 @@ TRANS(vmaddwev_h_b, LSX, gvec_vvv, 16, MO_8, do_vmaddwev_s)
TRANS(vmaddwev_w_h, LSX, gvec_vvv, 16, MO_16, do_vmaddwev_s)
TRANS(vmaddwev_d_w, LSX, gvec_vvv, 16, MO_32, do_vmaddwev_s)
-#define VMADD_Q(NAME, FN, idx1, idx2) \
-static bool trans_## NAME (DisasContext *ctx, arg_vvv *a) \
-{ \
- TCGv_i64 rh, rl, arg1, arg2, th, tl; \
- \
- if (!avail_LSX(ctx)) { \
- return false; \
- } \
- \
- rh = tcg_temp_new_i64(); \
- rl = tcg_temp_new_i64(); \
- arg1 = tcg_temp_new_i64(); \
- arg2 = tcg_temp_new_i64(); \
- th = tcg_temp_new_i64(); \
- tl = tcg_temp_new_i64(); \
- \
- get_vreg64(arg1, a->vj, idx1); \
- get_vreg64(arg2, a->vk, idx2); \
- get_vreg64(rh, a->vd, 1); \
- get_vreg64(rl, a->vd, 0); \
- \
- tcg_gen_## FN ##_i64(tl, th, arg1, arg2); \
- tcg_gen_add2_i64(rl, rh, rl, rh, tl, th); \
- \
- set_vreg64(rh, a->vd, 1); \
- set_vreg64(rl, a->vd, 0); \
- \
- return true; \
-}
-
-VMADD_Q(vmaddwev_q_d, muls2, 0, 0)
-VMADD_Q(vmaddwod_q_d, muls2, 1, 1)
-VMADD_Q(vmaddwev_q_du, mulu2, 0, 0)
-VMADD_Q(vmaddwod_q_du, mulu2, 1, 1)
-VMADD_Q(vmaddwev_q_du_d, mulus2, 0, 0)
-VMADD_Q(vmaddwod_q_du_d, mulus2, 1, 1)
+static bool gen_vmadd_q(DisasContext *ctx,
+ arg_vvv *a, int oprsz, int idx1, int idx2,
+ void (*func)(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_i64))
+{
+ TCGv_i64 rh, rl, arg1, arg2, th, tl;
+ int i;
+
+ rh = tcg_temp_new_i64();
+ rl = tcg_temp_new_i64();
+ arg1 = tcg_temp_new_i64();
+ arg2 = tcg_temp_new_i64();
+ th = tcg_temp_new_i64();
+ tl = tcg_temp_new_i64();
+
+ for (i = 0; i < oprsz / 16; i++) {
+ get_vreg64(arg1, a->vj, 2 * i + idx1);
+ get_vreg64(arg2, a->vk, 2 * i + idx2);
+ get_vreg64(rh, a->vd, 2 * i + 1);
+ get_vreg64(rl, a->vd, 2 * i);
+
+ func(tl, th, arg1, arg2);
+ tcg_gen_add2_i64(rl, rh, rl, rh, tl, th);
+
+ set_vreg64(rh, a->vd, 2 * i + 1);
+ set_vreg64(rl, a->vd, 2 * i);
+ }
+
+ return true;
+}
+
+TRANS(vmaddwev_q_d, LSX, gen_vmadd_q, 16, 0, 0, tcg_gen_muls2_i64)
+TRANS(vmaddwod_q_d, LSX, gen_vmadd_q, 16, 1, 1, tcg_gen_muls2_i64)
+TRANS(vmaddwev_q_du, LSX, gen_vmadd_q, 16, 0, 0, tcg_gen_mulu2_i64)
+TRANS(vmaddwod_q_du, LSX, gen_vmadd_q, 16, 1, 1, tcg_gen_mulu2_i64)
+TRANS(vmaddwev_q_du_d, LSX, gen_vmadd_q, 16, 0, 0, tcg_gen_mulus2_i64)
+TRANS(vmaddwod_q_du_d, LSX, gen_vmadd_q, 16, 1, 1, tcg_gen_mulus2_i64)
static void gen_vmaddwod_s(unsigned vece, TCGv_vec t, TCGv_vec a, TCGv_vec b)
{
--
2.39.1
* Re: [PATCH v4 18/48] target/loongarch: Implement xvmadd/xvmsub/xvmaddw{ev/od}
2023-08-30 8:48 ` [PATCH v4 18/48] target/loongarch: Implement xvmadd/xvmsub/xvmaddw{ev/od} Song Gao
@ 2023-08-30 21:05 ` Richard Henderson
0 siblings, 0 replies; 86+ messages in thread
From: Richard Henderson @ 2023-08-30 21:05 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 8/30/23 01:48, Song Gao wrote:
> This patch includes:
> - XVMADD.{B/H/W/D};
> - XVMSUB.{B/H/W/D};
> - XVMADDW{EV/OD}.{H.B/W.H/D.W/Q.D}[U];
> - XVMADDW{EV/OD}.{H.BU.B/W.HU.H/D.WU.W/Q.DU.D}.
>
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
> target/loongarch/vec.h | 3 +
> target/loongarch/insns.decode | 34 ++++++
> target/loongarch/disas.c | 34 ++++++
> target/loongarch/vec_helper.c | 113 ++++++++++---------
> target/loongarch/insn_trans/trans_lasx.c.inc | 38 +++++++
> target/loongarch/insn_trans/trans_lsx.c.inc | 72 ++++++------
> 6 files changed, 203 insertions(+), 91 deletions(-)
>
> diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
> index 6fc84c8c5a..06c8d7e314 100644
> --- a/target/loongarch/vec.h
> +++ b/target/loongarch/vec.h
> @@ -62,4 +62,7 @@
>
> #define DO_MUL(a, b) (a * b)
>
> +#define DO_MADD(a, b, c) (a + b * c)
> +#define DO_MSUB(a, b, c) (a - b * c)
> +
Aside from this movement,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* [PATCH v4 19/48] target/loongarch: Implement xvdiv/xvmod
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (17 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 18/48] target/loongarch: Implement xvmadd/xvmsub/xvmaddw{ev/od} Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 22:14 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 20/48] target/loongarch: Implement xvsat Song Gao
` (28 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVDIV.{B/H/W/D}[U];
- XVMOD.{B/H/W/D}[U].
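The edge cases follow the DO_DIV/DO_REM macros added to vec.h: division
by zero yields 0, and the overflowing INT_MIN / -1 case yields INT_MIN
for the quotient and 0 for the remainder. A scalar sketch of the signed
word case (the function name is invented):

    #include <stdint.h>

    static int32_t div_w(int32_t n, int32_t m)   /* XVDIV.W */
    {
        if (m == 0) {
            return 0;
        }
        if (n == INT32_MIN && m == -1) {   /* the (N == -N) test */
            return n;                      /* avoids UB on overflow */
        }
        return n / m;
    }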
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/vec.h | 7 +++++++
target/loongarch/insns.decode | 17 +++++++++++++++++
target/loongarch/disas.c | 17 +++++++++++++++++
target/loongarch/vec_helper.c | 10 ++--------
target/loongarch/insn_trans/trans_lasx.c.inc | 17 +++++++++++++++++
5 files changed, 60 insertions(+), 8 deletions(-)
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index 06c8d7e314..ee50d53f4e 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -65,4 +65,11 @@
#define DO_MADD(a, b, c) (a + b * c)
#define DO_MSUB(a, b, c) (a - b * c)
+#define DO_DIVU(N, M) (unlikely(M == 0) ? 0 : N / M)
+#define DO_REMU(N, M) (unlikely(M == 0) ? 0 : N % M)
+#define DO_DIV(N, M) (unlikely(M == 0) ? 0 :\
+ unlikely((N == -N) && (M == (__typeof(N))(-1))) ? N : N / M)
+#define DO_REM(N, M) (unlikely(M == 0) ? 0 :\
+ unlikely((N == -N) && (M == (__typeof(N))(-1))) ? 0 : N % M)
+
#endif /* LOONGARCH_VEC_H */
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index d6fb51ae64..fa25c876b4 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1545,6 +1545,23 @@ xvmaddwod_w_hu_h 0111 01001011 11101 ..... ..... ..... @vvv
xvmaddwod_d_wu_w 0111 01001011 11110 ..... ..... ..... @vvv
xvmaddwod_q_du_d 0111 01001011 11111 ..... ..... ..... @vvv
+xvdiv_b 0111 01001110 00000 ..... ..... ..... @vvv
+xvdiv_h 0111 01001110 00001 ..... ..... ..... @vvv
+xvdiv_w 0111 01001110 00010 ..... ..... ..... @vvv
+xvdiv_d 0111 01001110 00011 ..... ..... ..... @vvv
+xvmod_b 0111 01001110 00100 ..... ..... ..... @vvv
+xvmod_h 0111 01001110 00101 ..... ..... ..... @vvv
+xvmod_w 0111 01001110 00110 ..... ..... ..... @vvv
+xvmod_d 0111 01001110 00111 ..... ..... ..... @vvv
+xvdiv_bu 0111 01001110 01000 ..... ..... ..... @vvv
+xvdiv_hu 0111 01001110 01001 ..... ..... ..... @vvv
+xvdiv_wu 0111 01001110 01010 ..... ..... ..... @vvv
+xvdiv_du 0111 01001110 01011 ..... ..... ..... @vvv
+xvmod_bu 0111 01001110 01100 ..... ..... ..... @vvv
+xvmod_hu 0111 01001110 01101 ..... ..... ..... @vvv
+xvmod_wu 0111 01001110 01110 ..... ..... ..... @vvv
+xvmod_du 0111 01001110 01111 ..... ..... ..... @vvv
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index b115fe8315..72df9f0b08 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1962,6 +1962,23 @@ INSN_LASX(xvmaddwod_w_hu_h, vvv)
INSN_LASX(xvmaddwod_d_wu_w, vvv)
INSN_LASX(xvmaddwod_q_du_d, vvv)
+INSN_LASX(xvdiv_b, vvv)
+INSN_LASX(xvdiv_h, vvv)
+INSN_LASX(xvdiv_w, vvv)
+INSN_LASX(xvdiv_d, vvv)
+INSN_LASX(xvdiv_bu, vvv)
+INSN_LASX(xvdiv_hu, vvv)
+INSN_LASX(xvdiv_wu, vvv)
+INSN_LASX(xvdiv_du, vvv)
+INSN_LASX(xvmod_b, vvv)
+INSN_LASX(xvmod_h, vvv)
+INSN_LASX(xvmod_w, vvv)
+INSN_LASX(xvmod_d, vvv)
+INSN_LASX(xvmod_bu, vvv)
+INSN_LASX(xvmod_hu, vvv)
+INSN_LASX(xvmod_wu, vvv)
+INSN_LASX(xvmod_du, vvv)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 5a1bff8b04..d217d76ea7 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -621,13 +621,6 @@ VMADDWOD_U_S(vmaddwod_h_bu_b, 16, H, UH, B, UB, DO_MUL)
VMADDWOD_U_S(vmaddwod_w_hu_h, 32, W, UW, H, UH, DO_MUL)
VMADDWOD_U_S(vmaddwod_d_wu_w, 64, D, UD, W, UW, DO_MUL)
-#define DO_DIVU(N, M) (unlikely(M == 0) ? 0 : N / M)
-#define DO_REMU(N, M) (unlikely(M == 0) ? 0 : N % M)
-#define DO_DIV(N, M) (unlikely(M == 0) ? 0 :\
- unlikely((N == -N) && (M == (__typeof(N))(-1))) ? N : N / M)
-#define DO_REM(N, M) (unlikely(M == 0) ? 0 :\
- unlikely((N == -N) && (M == (__typeof(N))(-1))) ? 0 : N % M)
-
#define VDIV(NAME, BIT, E, DO_OP) \
void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
@@ -635,8 +628,9 @@ void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
VReg *Vd = (VReg *)vd; \
VReg *Vj = (VReg *)vj; \
VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
\
- for (i = 0; i < LSX_LEN/BIT; i++) { \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
Vd->E(i) = DO_OP(Vj->E(i), Vk->E(i)); \
} \
}
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 1073118417..fff6ddd3e0 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -291,6 +291,23 @@ TRANS(xvmaddwod_h_bu_b, LASX, gvec_vvv, 32, MO_8, do_vmaddwod_u_s)
TRANS(xvmaddwod_w_hu_h, LASX, gvec_vvv, 32, MO_16, do_vmaddwod_u_s)
TRANS(xvmaddwod_d_wu_w, LASX, gvec_vvv, 32, MO_32, do_vmaddwod_u_s)
+TRANS(xvdiv_b, LASX, gen_vvv, 32, gen_helper_vdiv_b)
+TRANS(xvdiv_h, LASX, gen_vvv, 32, gen_helper_vdiv_h)
+TRANS(xvdiv_w, LASX, gen_vvv, 32, gen_helper_vdiv_w)
+TRANS(xvdiv_d, LASX, gen_vvv, 32, gen_helper_vdiv_d)
+TRANS(xvdiv_bu, LASX, gen_vvv, 32, gen_helper_vdiv_bu)
+TRANS(xvdiv_hu, LASX, gen_vvv, 32, gen_helper_vdiv_hu)
+TRANS(xvdiv_wu, LASX, gen_vvv, 32, gen_helper_vdiv_wu)
+TRANS(xvdiv_du, LASX, gen_vvv, 32, gen_helper_vdiv_du)
+TRANS(xvmod_b, LASX, gen_vvv, 32, gen_helper_vmod_b)
+TRANS(xvmod_h, LASX, gen_vvv, 32, gen_helper_vmod_h)
+TRANS(xvmod_w, LASX, gen_vvv, 32, gen_helper_vmod_w)
+TRANS(xvmod_d, LASX, gen_vvv, 32, gen_helper_vmod_d)
+TRANS(xvmod_bu, LASX, gen_vvv, 32, gen_helper_vmod_bu)
+TRANS(xvmod_hu, LASX, gen_vvv, 32, gen_helper_vmod_hu)
+TRANS(xvmod_wu, LASX, gen_vvv, 32, gen_helper_vmod_wu)
+TRANS(xvmod_du, LASX, gen_vvv, 32, gen_helper_vmod_du)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
* Re: [PATCH v4 19/48] target/loongarch: Implement xvdiv/xvmod
2023-08-30 8:48 ` [PATCH v4 19/48] target/loongarch: Implement xvdiv/xvmod Song Gao
@ 2023-08-30 22:14 ` Richard Henderson
0 siblings, 0 replies; 86+ messages in thread
From: Richard Henderson @ 2023-08-30 22:14 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 8/30/23 01:48, Song Gao wrote:
> This patch includes:
> - XVDIV.{B/H/W/D}[U];
> - XVMOD.{B/H/W/D}[U].
>
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
> target/loongarch/vec.h | 7 +++++++
> target/loongarch/insns.decode | 17 +++++++++++++++++
> target/loongarch/disas.c | 17 +++++++++++++++++
> target/loongarch/vec_helper.c | 10 ++--------
> target/loongarch/insn_trans/trans_lasx.c.inc | 17 +++++++++++++++++
> 5 files changed, 60 insertions(+), 8 deletions(-)
>
> diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
> index 06c8d7e314..ee50d53f4e 100644
> --- a/target/loongarch/vec.h
> +++ b/target/loongarch/vec.h
> @@ -65,4 +65,11 @@
> #define DO_MADD(a, b, c) (a + b * c)
> #define DO_MSUB(a, b, c) (a - b * c)
>
> +#define DO_DIVU(N, M) (unlikely(M == 0) ? 0 : N / M)
> +#define DO_REMU(N, M) (unlikely(M == 0) ? 0 : N % M)
> +#define DO_DIV(N, M) (unlikely(M == 0) ? 0 :\
> + unlikely((N == -N) && (M == (__typeof(N))(-1))) ? N : N / M)
> +#define DO_REM(N, M) (unlikely(M == 0) ? 0 :\
> + unlikely((N == -N) && (M == (__typeof(N))(-1))) ? 0 : N % M)
> +
> #endif /* LOONGARCH_VEC_H */
Aside from this movement,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* [PATCH v4 20/48] target/loongarch: Implement xvsat
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (18 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 19/48] target/loongarch: Implement xvdiv/xvmod Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 22:19 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 21/48] target/loongarch: Implement xvexth Song Gao
` (27 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVSAT.{B/H/W/D}[U].
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/insns.decode | 9 ++++
target/loongarch/disas.c | 9 ++++
target/loongarch/vec_helper.c | 48 ++++++++++----------
target/loongarch/insn_trans/trans_lasx.c.inc | 9 ++++
4 files changed, 52 insertions(+), 23 deletions(-)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index fa25c876b4..e366cf7615 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1562,6 +1562,15 @@ xvmod_hu 0111 01001110 01101 ..... ..... ..... @vvv
xvmod_wu 0111 01001110 01110 ..... ..... ..... @vvv
xvmod_du 0111 01001110 01111 ..... ..... ..... @vvv
+xvsat_b 0111 01110010 01000 01 ... ..... ..... @vv_ui3
+xvsat_h 0111 01110010 01000 1 .... ..... ..... @vv_ui4
+xvsat_w 0111 01110010 01001 ..... ..... ..... @vv_ui5
+xvsat_d 0111 01110010 0101 ...... ..... ..... @vv_ui6
+xvsat_bu 0111 01110010 10000 01 ... ..... ..... @vv_ui3
+xvsat_hu 0111 01110010 10000 1 .... ..... ..... @vv_ui4
+xvsat_wu 0111 01110010 10001 ..... ..... ..... @vv_ui5
+xvsat_du 0111 01110010 1001 ...... ..... ..... @vv_ui6
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 72df9f0b08..09e5981fc3 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1979,6 +1979,15 @@ INSN_LASX(xvmod_hu, vvv)
INSN_LASX(xvmod_wu, vvv)
INSN_LASX(xvmod_du, vvv)
+INSN_LASX(xvsat_b, vv_i)
+INSN_LASX(xvsat_h, vv_i)
+INSN_LASX(xvsat_w, vv_i)
+INSN_LASX(xvsat_d, vv_i)
+INSN_LASX(xvsat_bu, vv_i)
+INSN_LASX(xvsat_hu, vv_i)
+INSN_LASX(xvsat_wu, vv_i)
+INSN_LASX(xvsat_du, vv_i)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index d217d76ea7..44daf5ee9a 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -652,18 +652,19 @@ VDIV(vmod_hu, 16, UH, DO_REMU)
VDIV(vmod_wu, 32, UW, DO_REMU)
VDIV(vmod_du, 64, UD, DO_REMU)
-#define VSAT_S(NAME, BIT, E) \
-void HELPER(NAME)(void *vd, void *vj, uint64_t max, uint32_t v) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- typedef __typeof(Vd->E(0)) TD; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E(i) = Vj->E(i) > (TD)max ? (TD)max : \
- Vj->E(i) < (TD)~max ? (TD)~max: Vj->E(i); \
- } \
+#define VSAT_S(NAME, BIT, E) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t max, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ typedef __typeof(Vd->E(0)) TD; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
+ Vd->E(i) = Vj->E(i) > (TD)max ? (TD)max : \
+ Vj->E(i) < (TD)~max ? (TD)~max: Vj->E(i); \
+ } \
}
VSAT_S(vsat_b, 8, B)
@@ -671,17 +672,18 @@ VSAT_S(vsat_h, 16, H)
VSAT_S(vsat_w, 32, W)
VSAT_S(vsat_d, 64, D)
-#define VSAT_U(NAME, BIT, E) \
-void HELPER(NAME)(void *vd, void *vj, uint64_t max, uint32_t v) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- typedef __typeof(Vd->E(0)) TD; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E(i) = Vj->E(i) > (TD)max ? (TD)max : Vj->E(i); \
- } \
+#define VSAT_U(NAME, BIT, E) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t max, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ typedef __typeof(Vd->E(0)) TD; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
+ Vd->E(i) = Vj->E(i) > (TD)max ? (TD)max : Vj->E(i); \
+ } \
}
VSAT_U(vsat_bu, 8, UB)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index fff6ddd3e0..093cf2a1fa 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -308,6 +308,15 @@ TRANS(xvmod_hu, LASX, gen_vvv, 32, gen_helper_vmod_hu)
TRANS(xvmod_wu, LASX, gen_vvv, 32, gen_helper_vmod_wu)
TRANS(xvmod_du, LASX, gen_vvv, 32, gen_helper_vmod_du)
+TRANS(xvsat_b, LASX, gvec_vv_i, 32, MO_8, do_vsat_s)
+TRANS(xvsat_h, LASX, gvec_vv_i, 32, MO_16, do_vsat_s)
+TRANS(xvsat_w, LASX, gvec_vv_i, 32, MO_32, do_vsat_s)
+TRANS(xvsat_d, LASX, gvec_vv_i, 32, MO_64, do_vsat_s)
+TRANS(xvsat_bu, LASX, gvec_vv_i, 32, MO_8, do_vsat_u)
+TRANS(xvsat_hu, LASX, gvec_vv_i, 32, MO_16, do_vsat_u)
+TRANS(xvsat_wu, LASX, gvec_vv_i, 32, MO_32, do_vsat_u)
+TRANS(xvsat_du, LASX, gvec_vv_i, 32, MO_64, do_vsat_u)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
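As a standalone sketch of what the reworked VSAT_S loop does per element
(function and variable names are illustrative, not the helper's): the
uint64_t max argument carries the positive bound, and its one's complement
is the matching negative bound.

#include <stdio.h>
#include <stdint.h>

/* Signed saturation of one 16-bit element to [~max, max], mirroring the
 * body of the VSAT_S loop. */
static int16_t sat_s16(int16_t v, uint64_t max)
{
    int16_t hi = (int16_t)max;    /* e.g. 0x007f for a 7-bit bound */
    int16_t lo = (int16_t)~max;   /* 0xff80 == -128 */
    return v > hi ? hi : (v < lo ? lo : v);
}

int main(void)
{
    printf("%d\n", sat_s16(300, 0x7f));    /* clamps to 127 */
    printf("%d\n", sat_s16(-300, 0x7f));   /* clamps to -128 */
    printf("%d\n", sat_s16(5, 0x7f));      /* passes through: 5 */
    return 0;
}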
* [PATCH v4 21/48] target/loongarch: Implement xvexth
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (19 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 20/48] target/loongarch: Implement xvsat Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 22:34 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 22/48] target/loongarch: Implement vext2xv Song Gao
` (26 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVEXTH.{H.B/W.H/D.W/Q.D};
- XVEXTH.{HU.BU/WU.HU/DU.WU/QU.DU}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/insns.decode | 9 +++++
target/loongarch/disas.c | 9 +++++
target/loongarch/vec_helper.c | 36 +++++++++++++-------
target/loongarch/insn_trans/trans_lasx.c.inc | 9 +++++
4 files changed, 51 insertions(+), 12 deletions(-)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index e366cf7615..7491f295a5 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1571,6 +1571,15 @@ xvsat_hu 0111 01110010 10000 1 .... ..... ..... @vv_ui4
xvsat_wu 0111 01110010 10001 ..... ..... ..... @vv_ui5
xvsat_du 0111 01110010 1001 ...... ..... ..... @vv_ui6
+xvexth_h_b 0111 01101001 11101 11000 ..... ..... @vv
+xvexth_w_h 0111 01101001 11101 11001 ..... ..... @vv
+xvexth_d_w 0111 01101001 11101 11010 ..... ..... @vv
+xvexth_q_d 0111 01101001 11101 11011 ..... ..... @vv
+xvexth_hu_bu 0111 01101001 11101 11100 ..... ..... @vv
+xvexth_wu_hu 0111 01101001 11101 11101 ..... ..... @vv
+xvexth_du_wu 0111 01101001 11101 11110 ..... ..... @vv
+xvexth_qu_du 0111 01101001 11101 11111 ..... ..... @vv
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 09e5981fc3..6ca545956d 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1988,6 +1988,15 @@ INSN_LASX(xvsat_hu, vv_i)
INSN_LASX(xvsat_wu, vv_i)
INSN_LASX(xvsat_du, vv_i)
+INSN_LASX(xvexth_h_b, vv)
+INSN_LASX(xvexth_w_h, vv)
+INSN_LASX(xvexth_d_w, vv)
+INSN_LASX(xvexth_q_d, vv)
+INSN_LASX(xvexth_hu_bu, vv)
+INSN_LASX(xvexth_wu_hu, vv)
+INSN_LASX(xvexth_du_wu, vv)
+INSN_LASX(xvexth_qu_du, vv)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 44daf5ee9a..51cc8c4526 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -691,32 +691,44 @@ VSAT_U(vsat_hu, 16, UH)
VSAT_U(vsat_wu, 32, UW)
VSAT_U(vsat_du, 64, UD)
-#define VEXTH(NAME, BIT, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, uint32_t desc) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E1(i) = Vj->E2(i + LSX_LEN/BIT); \
- } \
+#define VEXTH(NAME, BIT, E1, E2) \
+void HELPER(NAME)(void *vd, void *vj, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ Vd->E1(j + i * ofs) = Vj->E2(j + ofs + ofs * 2 * i); \
+ } \
+ } \
}
void HELPER(vexth_q_d)(void *vd, void *vj, uint32_t desc)
{
+ int i;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
+ int oprsz = simd_oprsz(desc);
- Vd->Q(0) = int128_makes64(Vj->D(1));
+ for (i = 0; i < oprsz / 16; i++) {
+ Vd->Q(i) = int128_makes64(Vj->D(2 * i + 1));
+ }
}
void HELPER(vexth_qu_du)(void *vd, void *vj, uint32_t desc)
{
+ int i;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
+ int oprsz = simd_oprsz(desc);
- Vd->Q(0) = int128_make64((uint64_t)Vj->D(1));
+ for (i = 0; i < oprsz / 16; i++) {
+ Vd->Q(i) = int128_make64(Vj->UD(2 * i + 1));
+ }
}
VEXTH(vexth_h_b, 16, H, B)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 093cf2a1fa..3fb86d9a92 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -317,6 +317,15 @@ TRANS(xvsat_hu, LASX, gvec_vv_i, 32, MO_16, do_vsat_u)
TRANS(xvsat_wu, LASX, gvec_vv_i, 32, MO_32, do_vsat_u)
TRANS(xvsat_du, LASX, gvec_vv_i, 32, MO_64, do_vsat_u)
+TRANS(xvexth_h_b, LASX, gen_vv, 32, gen_helper_vexth_h_b)
+TRANS(xvexth_w_h, LASX, gen_vv, 32, gen_helper_vexth_w_h)
+TRANS(xvexth_d_w, LASX, gen_vv, 32, gen_helper_vexth_d_w)
+TRANS(xvexth_q_d, LASX, gen_vv, 32, gen_helper_vexth_q_d)
+TRANS(xvexth_hu_bu, LASX, gen_vv, 32, gen_helper_vexth_hu_bu)
+TRANS(xvexth_wu_hu, LASX, gen_vv, 32, gen_helper_vexth_wu_hu)
+TRANS(xvexth_du_wu, LASX, gen_vv, 32, gen_helper_vexth_du_wu)
+TRANS(xvexth_qu_du, LASX, gen_vv, 32, gen_helper_vexth_qu_du)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
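The index arithmetic in the new VEXTH loop is the subtle part: each 128-bit
lane widens its own high half, so destination element j of lane i reads
source element j + ofs + ofs * 2 * i. A standalone sketch of xvexth_h_b
with oprsz = 32 (array names are illustrative, not the helper's):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int8_t  src[32];           /* one 256-bit LASX register of bytes */
    int16_t dst[16];
    int ofs = 128 / 16;        /* LSX_LEN / BIT = 8 */

    for (int i = 0; i < 32; i++) {
        src[i] = (int8_t)(i - 16);
    }
    for (int i = 0; i < 2; i++) {              /* oprsz / 16 lanes */
        for (int j = 0; j < ofs; j++) {
            dst[j + i * ofs] = src[j + ofs + ofs * 2 * i];
        }
    }
    for (int i = 0; i < 16; i++) {
        printf("%d ", dst[i]);  /* -8..-1, then 8..15 */
    }
    printf("\n");
    return 0;
}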
* Re: [PATCH v4 21/48] target/loongarch: Implement xvexth
2023-08-30 8:48 ` [PATCH v4 21/48] target/loongarch: Implement xvexth Song Gao
@ 2023-08-30 22:34 ` Richard Henderson
0 siblings, 0 replies; 86+ messages in thread
From: Richard Henderson @ 2023-08-30 22:34 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 8/30/23 01:48, Song Gao wrote:
> This patch includes:
> - XVEXTH.{H.B/W.H/D.W/Q.D};
> - XVEXTH.{HU.BU/WU.HU/DU.WU/QU.DU}.
>
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
> target/loongarch/insns.decode | 9 +++++
> target/loongarch/disas.c | 9 +++++
> target/loongarch/vec_helper.c | 36 +++++++++++++-------
> target/loongarch/insn_trans/trans_lasx.c.inc | 9 +++++
> 4 files changed, 51 insertions(+), 12 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* [PATCH v4 22/48] target/loongarch: Implement vext2xv
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (20 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 21/48] target/loongarch: Implement xvexth Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 22:36 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 23/48] target/loongarch: Implement xvsigncov Song Gao
` (25 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- VEXT2XV.{H/W/D}.B, VEXT2XV.{HU/WU/DU}.BU;
- VEXT2XV.{W/D}.H, VEXT2XV.{WU/DU}.HU;
- VEXT2XV.D.W, VEXT2XV.DU.WU.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/helper.h | 13 +++++++++
target/loongarch/insns.decode | 13 +++++++++
target/loongarch/disas.c | 13 +++++++++
target/loongarch/vec_helper.c | 28 ++++++++++++++++++++
target/loongarch/insn_trans/trans_lasx.c.inc | 13 +++++++++
5 files changed, 80 insertions(+)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 1abd9e1410..e9c5412267 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -340,6 +340,19 @@ DEF_HELPER_FLAGS_3(vexth_wu_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
DEF_HELPER_FLAGS_3(vexth_du_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
DEF_HELPER_FLAGS_3(vexth_qu_du, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vext2xv_h_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vext2xv_w_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vext2xv_d_b, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vext2xv_w_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vext2xv_d_h, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vext2xv_d_w, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vext2xv_hu_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vext2xv_wu_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vext2xv_du_bu, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vext2xv_wu_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vext2xv_du_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(vext2xv_du_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+
DEF_HELPER_FLAGS_4(vsigncov_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(vsigncov_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(vsigncov_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 7491f295a5..db1a6689f0 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1580,6 +1580,19 @@ xvexth_wu_hu 0111 01101001 11101 11101 ..... ..... @vv
xvexth_du_wu 0111 01101001 11101 11110 ..... ..... @vv
xvexth_qu_du 0111 01101001 11101 11111 ..... ..... @vv
+vext2xv_h_b 0111 01101001 11110 00100 ..... ..... @vv
+vext2xv_w_b 0111 01101001 11110 00101 ..... ..... @vv
+vext2xv_d_b 0111 01101001 11110 00110 ..... ..... @vv
+vext2xv_w_h 0111 01101001 11110 00111 ..... ..... @vv
+vext2xv_d_h 0111 01101001 11110 01000 ..... ..... @vv
+vext2xv_d_w 0111 01101001 11110 01001 ..... ..... @vv
+vext2xv_hu_bu 0111 01101001 11110 01010 ..... ..... @vv
+vext2xv_wu_bu 0111 01101001 11110 01011 ..... ..... @vv
+vext2xv_du_bu 0111 01101001 11110 01100 ..... ..... @vv
+vext2xv_wu_hu 0111 01101001 11110 01101 ..... ..... @vv
+vext2xv_du_hu 0111 01101001 11110 01110 ..... ..... @vv
+vext2xv_du_wu 0111 01101001 11110 01111 ..... ..... @vv
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 6ca545956d..975ea018da 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1997,6 +1997,19 @@ INSN_LASX(xvexth_wu_hu, vv)
INSN_LASX(xvexth_du_wu, vv)
INSN_LASX(xvexth_qu_du, vv)
+INSN_LASX(vext2xv_h_b, vv)
+INSN_LASX(vext2xv_w_b, vv)
+INSN_LASX(vext2xv_d_b, vv)
+INSN_LASX(vext2xv_w_h, vv)
+INSN_LASX(vext2xv_d_h, vv)
+INSN_LASX(vext2xv_d_w, vv)
+INSN_LASX(vext2xv_hu_bu, vv)
+INSN_LASX(vext2xv_wu_bu, vv)
+INSN_LASX(vext2xv_du_bu, vv)
+INSN_LASX(vext2xv_wu_hu, vv)
+INSN_LASX(vext2xv_du_hu, vv)
+INSN_LASX(vext2xv_du_wu, vv)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 51cc8c4526..5f78bd076b 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -738,6 +738,34 @@ VEXTH(vexth_hu_bu, 16, UH, UB)
VEXTH(vexth_wu_hu, 32, UW, UH)
VEXTH(vexth_du_wu, 64, UD, UW)
+#define VEXT2XV(NAME, BIT, E1, E2) \
+void HELPER(NAME)(void *vd, void *vj, uint32_t desc) \
+{ \
+ int i; \
+ VReg temp = {}; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
+ temp.E1(i) = Vj->E2(i); \
+ } \
+ *Vd = temp; \
+}
+
+VEXT2XV(vext2xv_h_b, 16, H, B)
+VEXT2XV(vext2xv_w_b, 32, W, B)
+VEXT2XV(vext2xv_d_b, 64, D, B)
+VEXT2XV(vext2xv_w_h, 32, W, H)
+VEXT2XV(vext2xv_d_h, 64, D, H)
+VEXT2XV(vext2xv_d_w, 64, D, W)
+VEXT2XV(vext2xv_hu_bu, 16, UH, UB)
+VEXT2XV(vext2xv_wu_bu, 32, UW, UB)
+VEXT2XV(vext2xv_du_bu, 64, UD, UB)
+VEXT2XV(vext2xv_wu_hu, 32, UW, UH)
+VEXT2XV(vext2xv_du_hu, 64, UD, UH)
+VEXT2XV(vext2xv_du_wu, 64, UD, UW)
+
#define DO_SIGNCOV(a, b) (a == 0 ? 0 : a < 0 ? -b : b)
DO_3OP(vsigncov_b, 8, B, DO_SIGNCOV)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 3fb86d9a92..1e75815995 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -326,6 +326,19 @@ TRANS(xvexth_wu_hu, LASX, gen_vv, 32, gen_helper_vexth_wu_hu)
TRANS(xvexth_du_wu, LASX, gen_vv, 32, gen_helper_vexth_du_wu)
TRANS(xvexth_qu_du, LASX, gen_vv, 32, gen_helper_vexth_qu_du)
+TRANS(vext2xv_h_b, LASX, gen_vv, 32, gen_helper_vext2xv_h_b)
+TRANS(vext2xv_w_b, LASX, gen_vv, 32, gen_helper_vext2xv_w_b)
+TRANS(vext2xv_d_b, LASX, gen_vv, 32, gen_helper_vext2xv_d_b)
+TRANS(vext2xv_w_h, LASX, gen_vv, 32, gen_helper_vext2xv_w_h)
+TRANS(vext2xv_d_h, LASX, gen_vv, 32, gen_helper_vext2xv_d_h)
+TRANS(vext2xv_d_w, LASX, gen_vv, 32, gen_helper_vext2xv_d_w)
+TRANS(vext2xv_hu_bu, LASX, gen_vv, 32, gen_helper_vext2xv_hu_bu)
+TRANS(vext2xv_wu_bu, LASX, gen_vv, 32, gen_helper_vext2xv_wu_bu)
+TRANS(vext2xv_du_bu, LASX, gen_vv, 32, gen_helper_vext2xv_du_bu)
+TRANS(vext2xv_wu_hu, LASX, gen_vv, 32, gen_helper_vext2xv_wu_hu)
+TRANS(vext2xv_du_hu, LASX, gen_vv, 32, gen_helper_vext2xv_du_hu)
+TRANS(vext2xv_du_wu, LASX, gen_vv, 32, gen_helper_vext2xv_du_wu)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
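Unlike XVEXTH, VEXT2XV crosses 128-bit lane boundaries: the destination
takes the low oprsz / (BIT / 8) elements of the whole source register.
That is also why the helper widens into a VReg temp first -- with vd == vj,
an in-place widening would overwrite source elements before reading them.
A standalone sketch of vext2xv_h_b (illustrative arrays, not the helper's
types):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int8_t  src[32];
    int16_t dst[16];

    for (int i = 0; i < 32; i++) {
        src[i] = (int8_t)-i;
    }
    /* oprsz / (BIT / 8) = 32 / 2 = 16 elements, indexed straight through
     * the whole register rather than per 128-bit lane. */
    for (int i = 0; i < 16; i++) {
        dst[i] = src[i];        /* sign-extend byte i into halfword i */
    }
    printf("%d %d\n", dst[8], dst[15]);   /* -8 and -15 */
    return 0;
}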
* Re: [PATCH v4 22/48] target/loongarch: Implement vext2xv
2023-08-30 8:48 ` [PATCH v4 22/48] target/loongarch: Implement vext2xv Song Gao
@ 2023-08-30 22:36 ` Richard Henderson
0 siblings, 0 replies; 86+ messages in thread
From: Richard Henderson @ 2023-08-30 22:36 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 8/30/23 01:48, Song Gao wrote:
> This patch includes:
> - VEXT2XV.{H/W/D}.B, VEXT2XV.{HU/WU/DU}.BU;
> - VEXT2XV.{W/D}.H, VEXT2XV.{WU/DU}.HU;
> - VEXT2XV.D.W, VEXT2XV.DU.WU.
>
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
> target/loongarch/helper.h | 13 +++++++++
> target/loongarch/insns.decode | 13 +++++++++
> target/loongarch/disas.c | 13 +++++++++
> target/loongarch/vec_helper.c | 28 ++++++++++++++++++++
> target/loongarch/insn_trans/trans_lasx.c.inc | 13 +++++++++
> 5 files changed, 80 insertions(+)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* [PATCH v4 23/48] target/loongarch: Implement xvsigncov
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (21 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 22/48] target/loongarch: Implement vext2xv Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 8:48 ` [PATCH v4 24/48] target/loongarch: Implement xvmskltz/xvmskgez/xvmsknz Song Gao
` (24 subsequent siblings)
47 siblings, 0 replies; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVSIGNCOV.{B/H/W/D}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/loongarch/vec.h | 2 ++
target/loongarch/insns.decode | 5 +++++
target/loongarch/disas.c | 5 +++++
target/loongarch/vec_helper.c | 2 --
target/loongarch/insn_trans/trans_lasx.c.inc | 5 +++++
5 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index ee50d53f4e..681afd842f 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -72,4 +72,6 @@
#define DO_REM(N, M) (unlikely(M == 0) ? 0 :\
unlikely((N == -N) && (M == (__typeof(N))(-1))) ? 0 : N % M)
+#define DO_SIGNCOV(a, b) (a == 0 ? 0 : a < 0 ? -b : b)
+
#endif /* LOONGARCH_VEC_H */
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index db1a6689f0..7bbda1a142 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1593,6 +1593,11 @@ vext2xv_wu_hu 0111 01101001 11110 01101 ..... ..... @vv
vext2xv_du_hu 0111 01101001 11110 01110 ..... ..... @vv
vext2xv_du_wu 0111 01101001 11110 01111 ..... ..... @vv
+xvsigncov_b 0111 01010010 11100 ..... ..... ..... @vvv
+xvsigncov_h 0111 01010010 11101 ..... ..... ..... @vvv
+xvsigncov_w 0111 01010010 11110 ..... ..... ..... @vvv
+xvsigncov_d 0111 01010010 11111 ..... ..... ..... @vvv
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 975ea018da..85e0cb7c8d 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2010,6 +2010,11 @@ INSN_LASX(vext2xv_wu_hu, vv)
INSN_LASX(vext2xv_du_hu, vv)
INSN_LASX(vext2xv_du_wu, vv)
+INSN_LASX(xvsigncov_b, vvv)
+INSN_LASX(xvsigncov_h, vvv)
+INSN_LASX(xvsigncov_w, vvv)
+INSN_LASX(xvsigncov_d, vvv)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 5f78bd076b..0a322b3287 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -766,8 +766,6 @@ VEXT2XV(vext2xv_wu_hu, 32, UW, UH)
VEXT2XV(vext2xv_du_hu, 64, UD, UH)
VEXT2XV(vext2xv_du_wu, 64, UD, UW)
-#define DO_SIGNCOV(a, b) (a == 0 ? 0 : a < 0 ? -b : b)
-
DO_3OP(vsigncov_b, 8, B, DO_SIGNCOV)
DO_3OP(vsigncov_h, 16, H, DO_SIGNCOV)
DO_3OP(vsigncov_w, 32, W, DO_SIGNCOV)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 1e75815995..93dff7d20a 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -339,6 +339,11 @@ TRANS(vext2xv_wu_hu, LASX, gen_vv, 32, gen_helper_vext2xv_wu_hu)
TRANS(vext2xv_du_hu, LASX, gen_vv, 32, gen_helper_vext2xv_du_hu)
TRANS(vext2xv_du_wu, LASX, gen_vv, 32, gen_helper_vext2xv_du_wu)
+TRANS(xvsigncov_b, LASX, gvec_vvv, 32, MO_8, do_vsigncov)
+TRANS(xvsigncov_h, LASX, gvec_vvv, 32, MO_16, do_vsigncov)
+TRANS(xvsigncov_w, LASX, gvec_vvv, 32, MO_32, do_vsigncov)
+TRANS(xvsigncov_d, LASX, gvec_vvv, 32, MO_64, do_vsigncov)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
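DO_SIGNCOV, now shared via vec.h, picks 0, -b or b according to the sign
of a. A minimal standalone sketch of the per-element rule:

#include <stdio.h>

#define DO_SIGNCOV(a, b) ((a) == 0 ? 0 : (a) < 0 ? -(b) : (b))

int main(void)
{
    printf("%d\n", DO_SIGNCOV(0, 7));    /* 0 */
    printf("%d\n", DO_SIGNCOV(-3, 7));   /* -7 */
    printf("%d\n", DO_SIGNCOV(3, 7));    /* 7 */
    return 0;
}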
* [PATCH v4 24/48] target/loongarch: Implement xvmskltz/xvmskgez/xvmsknz
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (22 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 23/48] target/loongarch: Implement xvsigncov Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 22:44 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 25/48] target/loongarch: Implement xvldi Song Gao
` (23 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVMSKLTZ.{B/H/W/D};
- XVMSKGEZ.B;
- XVMSKNZ.B.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/insns.decode | 7 ++
target/loongarch/disas.c | 7 ++
target/loongarch/vec_helper.c | 80 ++++++++++++++------
target/loongarch/insn_trans/trans_lasx.c.inc | 7 ++
4 files changed, 76 insertions(+), 25 deletions(-)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 7bbda1a142..6a161d6d20 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1598,6 +1598,13 @@ xvsigncov_h 0111 01010010 11101 ..... ..... ..... @vvv
xvsigncov_w 0111 01010010 11110 ..... ..... ..... @vvv
xvsigncov_d 0111 01010010 11111 ..... ..... ..... @vvv
+xvmskltz_b 0111 01101001 11000 10000 ..... ..... @vv
+xvmskltz_h 0111 01101001 11000 10001 ..... ..... @vv
+xvmskltz_w 0111 01101001 11000 10010 ..... ..... @vv
+xvmskltz_d 0111 01101001 11000 10011 ..... ..... @vv
+xvmskgez_b 0111 01101001 11000 10100 ..... ..... @vv
+xvmsknz_b 0111 01101001 11000 11000 ..... ..... @vv
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 85e0cb7c8d..1a11153343 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2010,6 +2010,13 @@ INSN_LASX(vext2xv_wu_hu, vv)
INSN_LASX(vext2xv_du_hu, vv)
INSN_LASX(vext2xv_du_wu, vv)
+INSN_LASX(xvmskltz_b, vv)
+INSN_LASX(xvmskltz_h, vv)
+INSN_LASX(xvmskltz_w, vv)
+INSN_LASX(xvmskltz_d, vv)
+INSN_LASX(xvmskgez_b, vv)
+INSN_LASX(xvmsknz_b, vv)
+
INSN_LASX(xvsigncov_b, vvv)
INSN_LASX(xvsigncov_h, vvv)
INSN_LASX(xvsigncov_w, vvv)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 0a322b3287..47837875a8 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -783,14 +783,19 @@ static uint64_t do_vmskltz_b(int64_t val)
void HELPER(vmskltz_b)(void *vd, void *vj, uint32_t desc)
{
+ int i;
uint16_t temp = 0;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
+ int oprsz = simd_oprsz(desc);
- temp = do_vmskltz_b(Vj->D(0));
- temp |= (do_vmskltz_b(Vj->D(1)) << 8);
- Vd->D(0) = temp;
- Vd->D(1) = 0;
+ for (i = 0; i < oprsz / 16; i++) {
+ temp = 0;
+ temp = do_vmskltz_b(Vj->D(2 * i));
+ temp |= (do_vmskltz_b(Vj->D(2 * i + 1)) << 8);
+ Vd->D(2 * i) = temp;
+ Vd->D(2 * i + 1) = 0;
+ }
}
static uint64_t do_vmskltz_h(int64_t val)
@@ -804,14 +809,19 @@ static uint64_t do_vmskltz_h(int64_t val)
void HELPER(vmskltz_h)(void *vd, void *vj, uint32_t desc)
{
+ int i;
uint16_t temp = 0;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
+ int oprsz = simd_oprsz(desc);
- temp = do_vmskltz_h(Vj->D(0));
- temp |= (do_vmskltz_h(Vj->D(1)) << 4);
- Vd->D(0) = temp;
- Vd->D(1) = 0;
+ for (i = 0; i < oprsz / 16; i++) {
+ temp = 0;
+ temp = do_vmskltz_h(Vj->D(2 * i));
+ temp |= (do_vmskltz_h(Vj->D(2 * i + 1)) << 4);
+ Vd->D(2 * i) = temp;
+ Vd->D(2 * i + 1) = 0;
+ }
}
static uint64_t do_vmskltz_w(int64_t val)
@@ -824,14 +834,19 @@ static uint64_t do_vmskltz_w(int64_t val)
void HELPER(vmskltz_w)(void *vd, void *vj, uint32_t desc)
{
+ int i;
uint16_t temp = 0;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
+ int oprsz = simd_oprsz(desc);
- temp = do_vmskltz_w(Vj->D(0));
- temp |= (do_vmskltz_w(Vj->D(1)) << 2);
- Vd->D(0) = temp;
- Vd->D(1) = 0;
+ for (i = 0; i < oprsz / 16; i++) {
+ temp = 0;
+ temp = do_vmskltz_w(Vj->D(2 * i));
+ temp |= (do_vmskltz_w(Vj->D(2 * i + 1)) << 2);
+ Vd->D(2 * i) = temp;
+ Vd->D(2 * i + 1) = 0;
+ }
}
static uint64_t do_vmskltz_d(int64_t val)
@@ -840,26 +855,36 @@ static uint64_t do_vmskltz_d(int64_t val)
}
void HELPER(vmskltz_d)(void *vd, void *vj, uint32_t desc)
{
+ int i;
uint16_t temp = 0;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
+ int oprsz = simd_oprsz(desc);
- temp = do_vmskltz_d(Vj->D(0));
- temp |= (do_vmskltz_d(Vj->D(1)) << 1);
- Vd->D(0) = temp;
- Vd->D(1) = 0;
+ for (i = 0; i < oprsz / 16; i++) {
+ temp = 0;
+ temp = do_vmskltz_d(Vj->D(2 * i));
+ temp |= (do_vmskltz_d(Vj->D(2 * i + 1)) << 1);
+ Vd->D(2 * i) = temp;
+ Vd->D(2 * i + 1) = 0;
+ }
}
void HELPER(vmskgez_b)(void *vd, void *vj, uint32_t desc)
{
+ int i;
uint16_t temp = 0;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
+ int oprsz = simd_oprsz(desc);
- temp = do_vmskltz_b(Vj->D(0));
- temp |= (do_vmskltz_b(Vj->D(1)) << 8);
- Vd->D(0) = (uint16_t)(~temp);
- Vd->D(1) = 0;
+ for (i = 0; i < oprsz / 16; i++) {
+ temp = 0;
+ temp = do_vmskltz_b(Vj->D(2 * i));
+ temp |= (do_vmskltz_b(Vj->D(2 * i + 1)) << 8);
+ Vd->D(2 * i) = (uint16_t)(~temp);
+ Vd->D(2 * i + 1) = 0;
+ }
}
static uint64_t do_vmskez_b(uint64_t a)
@@ -872,16 +897,21 @@ static uint64_t do_vmskez_b(uint64_t a)
return c >> 56;
}
-void HELPER(vmsknz_b)(void vd, void vj, uint32_t desc)
+void HELPER(vmsknz_b)(void *vd, void *vj, uint32_t desc)
{
+ int i;
uint16_t temp = 0;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
+ int oprsz = simd_oprsz(desc);
- temp = do_vmskez_b(Vj->D(0));
- temp |= (do_vmskez_b(Vj->D(1)) << 8);
- Vd->D(0) = (uint16_t)(~temp);
- Vd->D(1) = 0;
+ for (i = 0; i < oprsz / 16; i++) {
+ temp = 0;
+ temp = do_vmskez_b(Vj->D(2 * i));
+ temp |= (do_vmskez_b(Vj->D(2 * i + 1)) << 8);
+ Vd->D(2 * i) = (uint16_t)(~temp);
+ Vd->D(2 * i + 1) = 0;
+ }
}
void HELPER(vnori_b)(void *vd, void *vj, uint64_t imm, uint32_t v)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 93dff7d20a..92fae91900 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -344,6 +344,13 @@ TRANS(xvsigncov_h, LASX, gvec_vvv, 32, MO_16, do_vsigncov)
TRANS(xvsigncov_w, LASX, gvec_vvv, 32, MO_32, do_vsigncov)
TRANS(xvsigncov_d, LASX, gvec_vvv, 32, MO_64, do_vsigncov)
+TRANS(xvmskltz_b, LASX, gen_vv, 32, gen_helper_vmskltz_b)
+TRANS(xvmskltz_h, LASX, gen_vv, 32, gen_helper_vmskltz_h)
+TRANS(xvmskltz_w, LASX, gen_vv, 32, gen_helper_vmskltz_w)
+TRANS(xvmskltz_d, LASX, gen_vv, 32, gen_helper_vmskltz_d)
+TRANS(xvmskgez_b, LASX, gen_vv, 32, gen_helper_vmskgez_b)
+TRANS(xvmsknz_b, LASX, gen_vv, 32, gen_helper_vmsknz_b)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
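do_vmskltz_b itself is unchanged by this patch and its body is not shown in
the hunk; as a hedged sketch (a plain loop, not the helper's bit-twiddling),
it computes a mask whose bit i is the sign bit of byte i of the 64-bit
input, which the new per-lane loop stores into the low D element of each
128-bit lane:

#include <stdio.h>
#include <stdint.h>

static uint64_t mskltz_b(int64_t val)
{
    uint64_t mask = 0;
    for (int i = 0; i < 8; i++) {
        /* bit i of the mask = sign bit of byte i */
        mask |= ((uint64_t)val >> (8 * i + 7) & 1) << i;
    }
    return mask;
}

int main(void)
{
    /* bytes 0 and 7 negative -> bits 0 and 7 set -> 0x81 */
    printf("0x%02x\n", (unsigned)mskltz_b((int64_t)0x80000000000000f0ull));
    return 0;
}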
* [PATCH v4 25/48] target/loongarch: Implement xvldi
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (23 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 24/48] target/loongarch: Implement xvmskltz/xvmskgez/xvmsknz Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 8:48 ` [PATCH v4 26/48] target/loongarch: Implement LASX logic instructions Song Gao
` (22 subsequent siblings)
47 siblings, 0 replies; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVLDI.
Signed-off-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/loongarch/insns.decode | 2 ++
target/loongarch/disas.c | 7 +++++++
target/loongarch/insn_trans/trans_lasx.c.inc | 2 ++
target/loongarch/insn_trans/trans_lsx.c.inc | 6 ++++--
4 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 6a161d6d20..edaa756395 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1605,6 +1605,8 @@ xvmskltz_d 0111 01101001 11000 10011 ..... ..... @vv
xvmskgez_b 0111 01101001 11000 10100 ..... ..... @vv
xvmsknz_b 0111 01101001 11000 11000 ..... ..... @vv
+xvldi 0111 01111110 00 ............. ..... @v_i13
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 1a11153343..8fa2edf007 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1703,6 +1703,11 @@ static bool trans_##insn(DisasContext *ctx, arg_##type * a) \
return true; \
}
+static void output_v_i_x(DisasContext *ctx, arg_v_i *a, const char *mnemonic)
+{
+ output(ctx, mnemonic, "x%d, 0x%x", a->vd, a->imm);
+}
+
static void output_vvv_x(DisasContext *ctx, arg_vvv * a, const char *mnemonic)
{
output(ctx, mnemonic, "x%d, x%d, x%d", a->vd, a->vj, a->vk);
@@ -2022,6 +2027,8 @@ INSN_LASX(xvsigncov_h, vvv)
INSN_LASX(xvsigncov_w, vvv)
INSN_LASX(xvsigncov_d, vvv)
+INSN_LASX(xvldi, v_i)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 92fae91900..f0e71f5f98 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -351,6 +351,8 @@ TRANS(xvmskltz_d, LASX, gen_vv, 32, gen_helper_vmskltz_d)
TRANS(xvmskgez_b, LASX, gen_vv, 32, gen_helper_vmskgez_b)
TRANS(xvmsknz_b, LASX, gen_vv, 32, gen_helper_vmsknz_b)
+TRANS(xvldi, LASX, do_vldi, 32)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 7e77686bfc..f76da508c3 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -3068,7 +3068,7 @@ static uint64_t vldi_get_value(DisasContext *ctx, uint32_t imm)
return data;
}
-static bool trans_vldi(DisasContext *ctx, arg_vldi *a)
+static bool do_vldi(DisasContext *ctx, arg_vldi *a, uint32_t oprsz)
{
int sel, vece;
uint64_t value;
@@ -3089,11 +3089,13 @@ static bool trans_vldi(DisasContext *ctx, arg_vldi *a)
vece = (a->imm >> 10) & 0x3;
}
- tcg_gen_gvec_dup_i64(vece, vec_full_offset(a->vd), 16, ctx->vl/8,
+ tcg_gen_gvec_dup_i64(vece, vec_full_offset(a->vd), oprsz, ctx->vl / 8,
tcg_constant_i64(value));
return true;
}
+TRANS(vldi, LSX, do_vldi, 16)
+
TRANS(vand_v, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_and)
TRANS(vor_v, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_or)
TRANS(vxor_v, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_xor)
--
2.39.1
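Only the non-sel decode path of the immediate is visible in this hunk
(vece = (a->imm >> 10) & 0x3). As a hedged sketch of that path -- the
sign-extension of the low 10 bits, like the bit-12 dispatch into
vldi_get_value(), is an assumption about the surrounding code, not shown
here:

#include <stdio.h>

int main(void)
{
    unsigned imm = 0x4a5;   /* hypothetical 13-bit immediate, bit 12 clear */
    unsigned vece = (imm >> 10) & 0x3;     /* 1: 16-bit elements */
    int value = (int)(imm << 22) >> 22;    /* assumed: sign-extend imm[9:0] */
    printf("vece=%u value=%d\n", vece, value);  /* vece=1 value=165 */
    return 0;
}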
* [PATCH v4 26/48] target/loongarch: Implement LASX logic instructions
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (24 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 25/48] target/loongarch: Implement xvldi Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 22:46 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 27/48] target/loongarch: Implement xvsll xvsrl xvsra xvrotr Song Gao
` (21 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XV{AND/OR/XOR/NOR/ANDN/ORN}.V;
- XV{AND/OR/XOR/NOR}I.B.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/insns.decode | 12 ++++++++++++
target/loongarch/disas.c | 12 ++++++++++++
target/loongarch/vec_helper.c | 4 ++--
target/loongarch/insn_trans/trans_lasx.c.inc | 11 +++++++++++
target/loongarch/insn_trans/trans_lsx.c.inc | 5 +++--
5 files changed, 40 insertions(+), 4 deletions(-)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index edaa756395..fb28666577 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1607,6 +1607,18 @@ xvmsknz_b 0111 01101001 11000 11000 ..... ..... @vv
xvldi 0111 01111110 00 ............. ..... @v_i13
+xvand_v 0111 01010010 01100 ..... ..... ..... @vvv
+xvor_v 0111 01010010 01101 ..... ..... ..... @vvv
+xvxor_v 0111 01010010 01110 ..... ..... ..... @vvv
+xvnor_v 0111 01010010 01111 ..... ..... ..... @vvv
+xvandn_v 0111 01010010 10000 ..... ..... ..... @vvv
+xvorn_v 0111 01010010 10001 ..... ..... ..... @vvv
+
+xvandi_b 0111 01111101 00 ........ ..... ..... @vv_ui8
+xvori_b 0111 01111101 01 ........ ..... ..... @vv_ui8
+xvxori_b 0111 01111101 10 ........ ..... ..... @vv_ui8
+xvnori_b 0111 01111101 11 ........ ..... ..... @vv_ui8
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 8fa2edf007..59fa249bae 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2029,6 +2029,18 @@ INSN_LASX(xvsigncov_d, vvv)
INSN_LASX(xvldi, v_i)
+INSN_LASX(xvand_v, vvv)
+INSN_LASX(xvor_v, vvv)
+INSN_LASX(xvxor_v, vvv)
+INSN_LASX(xvnor_v, vvv)
+INSN_LASX(xvandn_v, vvv)
+INSN_LASX(xvorn_v, vvv)
+
+INSN_LASX(xvandi_b, vv_i)
+INSN_LASX(xvori_b, vv_i)
+INSN_LASX(xvxori_b, vv_i)
+INSN_LASX(xvnori_b, vv_i)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 47837875a8..e33969339f 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -914,13 +914,13 @@ void HELPER(vmsknz_b)(void *vd, void *vj, uint32_t desc)
}
}
-void HELPER(vnori_b)(void *vd, void *vj, uint64_t imm, uint32_t v)
+void HELPER(vnori_b)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
int i;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
- for (i = 0; i < LSX_LEN/8; i++) {
+ for (i = 0; i < simd_oprsz(desc); i++) {
Vd->B(i) = ~(Vj->B(i) | (uint8_t)imm);
}
}
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index f0e71f5f98..9a3a504538 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -353,6 +353,17 @@ TRANS(xvmsknz_b, LASX, gen_vv, 32, gen_helper_vmsknz_b)
TRANS(xvldi, LASX, do_vldi, 32)
+TRANS(xvand_v, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_and)
+TRANS(xvor_v, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_or)
+TRANS(xvxor_v, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_xor)
+TRANS(xvnor_v, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_nor)
+TRANS(xvandn_v, LASX, do_vandn_v, 32)
+TRANS(xvorn_v, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_orc)
+TRANS(xvandi_b, LASX, gvec_vv_i, 32, MO_8, tcg_gen_gvec_andi)
+TRANS(xvori_b, LASX, gvec_vv_i, 32, MO_8, tcg_gen_gvec_ori)
+TRANS(xvxori_b, LASX, gvec_vv_i, 32, MO_8, tcg_gen_gvec_xori)
+TRANS(xvnori_b, LASX, gvec_vv_i, 32, MO_8, do_vnori_b)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index f76da508c3..64de014a58 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -3101,7 +3101,7 @@ TRANS(vor_v, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_or)
TRANS(vxor_v, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_xor)
TRANS(vnor_v, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_nor)
-static bool trans_vandn_v(DisasContext *ctx, arg_vvv *a)
+static bool do_vandn_v(DisasContext *ctx, arg_vvv *a, uint32_t oprsz)
{
uint32_t vd_ofs, vj_ofs, vk_ofs;
@@ -3115,9 +3115,10 @@ static bool trans_vandn_v(DisasContext *ctx, arg_vvv *a)
vj_ofs = vec_full_offset(a->vj);
vk_ofs = vec_full_offset(a->vk);
- tcg_gen_gvec_andc(MO_64, vd_ofs, vk_ofs, vj_ofs, 16, ctx->vl/8);
+ tcg_gen_gvec_andc(MO_64, vd_ofs, vk_ofs, vj_ofs, oprsz, ctx->vl / 8);
return true;
}
+TRANS(vandn_v, LSX, do_vandn_v, 16)
TRANS(vorn_v, LSX, gvec_vvv, 16, MO_64, tcg_gen_gvec_orc)
TRANS(vandi_b, LSX, gvec_vv_i, 16, MO_8, tcg_gen_gvec_andi)
TRANS(vori_b, LSX, gvec_vv_i, 16, MO_8, tcg_gen_gvec_ori)
--
2.39.1
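Of these, only vnori_b needs a helper (there is no gvec nori expander);
its per-byte rule in isolation, as a minimal sketch:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t src = 0x0f, imm = 0x30;
    uint8_t dst = (uint8_t)~(src | imm);   /* what the vnori_b loop does per byte */
    printf("0x%02x\n", dst);               /* ~0x3f = 0xc0 */
    return 0;
}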
* Re: [PATCH v4 26/48] target/loongarch: Implement LASX logic instructions
2023-08-30 8:48 ` [PATCH v4 26/48] target/loongarch: Implement LASX logic instructions Song Gao
@ 2023-08-30 22:46 ` Richard Henderson
0 siblings, 0 replies; 86+ messages in thread
From: Richard Henderson @ 2023-08-30 22:46 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 8/30/23 01:48, Song Gao wrote:
> This patch includes:
> - XV{AND/OR/XOR/NOR/ANDN/ORN}.V;
> - XV{AND/OR/XOR/NOR}I.B.
>
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
> target/loongarch/insns.decode | 12 ++++++++++++
> target/loongarch/disas.c | 12 ++++++++++++
> target/loongarch/vec_helper.c | 4 ++--
> target/loongarch/insn_trans/trans_lasx.c.inc | 11 +++++++++++
> target/loongarch/insn_trans/trans_lsx.c.inc | 5 +++--
> 5 files changed, 40 insertions(+), 4 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* [PATCH v4 27/48] target/loongarch: Implement xvsll xvsrl xvsra xvrotr
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (25 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 26/48] target/loongarch: Implement LASX logic instructions Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 8:48 ` [PATCH v4 28/48] target/loongarch: Implement xvsllwil xvextl Song Gao
` (20 subsequent siblings)
47 siblings, 0 replies; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVSLL[I].{B/H/W/D};
- XVSRL[I].{B/H/W/D};
- XVSRA[I].{B/H/W/D};
- XVROTR[I].{B/H/W/D}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
target/loongarch/insns.decode | 33 ++++++++++++++++++
target/loongarch/disas.c | 36 ++++++++++++++++++++
target/loongarch/insn_trans/trans_lasx.c.inc | 36 ++++++++++++++++++++
3 files changed, 105 insertions(+)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index fb28666577..fb7bd9fb34 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1619,6 +1619,39 @@ xvori_b 0111 01111101 01 ........ ..... ..... @vv_ui8
xvxori_b 0111 01111101 10 ........ ..... ..... @vv_ui8
xvnori_b 0111 01111101 11 ........ ..... ..... @vv_ui8
+xvsll_b 0111 01001110 10000 ..... ..... ..... @vvv
+xvsll_h 0111 01001110 10001 ..... ..... ..... @vvv
+xvsll_w 0111 01001110 10010 ..... ..... ..... @vvv
+xvsll_d 0111 01001110 10011 ..... ..... ..... @vvv
+xvslli_b 0111 01110010 11000 01 ... ..... ..... @vv_ui3
+xvslli_h 0111 01110010 11000 1 .... ..... ..... @vv_ui4
+xvslli_w 0111 01110010 11001 ..... ..... ..... @vv_ui5
+xvslli_d 0111 01110010 1101 ...... ..... ..... @vv_ui6
+xvsrl_b 0111 01001110 10100 ..... ..... ..... @vvv
+xvsrl_h 0111 01001110 10101 ..... ..... ..... @vvv
+xvsrl_w 0111 01001110 10110 ..... ..... ..... @vvv
+xvsrl_d 0111 01001110 10111 ..... ..... ..... @vvv
+xvsrli_b 0111 01110011 00000 01 ... ..... ..... @vv_ui3
+xvsrli_h 0111 01110011 00000 1 .... ..... ..... @vv_ui4
+xvsrli_w 0111 01110011 00001 ..... ..... ..... @vv_ui5
+xvsrli_d 0111 01110011 0001 ...... ..... ..... @vv_ui6
+xvsra_b 0111 01001110 11000 ..... ..... ..... @vvv
+xvsra_h 0111 01001110 11001 ..... ..... ..... @vvv
+xvsra_w 0111 01001110 11010 ..... ..... ..... @vvv
+xvsra_d 0111 01001110 11011 ..... ..... ..... @vvv
+xvsrai_b 0111 01110011 01000 01 ... ..... ..... @vv_ui3
+xvsrai_h 0111 01110011 01000 1 .... ..... ..... @vv_ui4
+xvsrai_w 0111 01110011 01001 ..... ..... ..... @vv_ui5
+xvsrai_d 0111 01110011 0101 ...... ..... ..... @vv_ui6
+xvrotr_b 0111 01001110 11100 ..... ..... ..... @vvv
+xvrotr_h 0111 01001110 11101 ..... ..... ..... @vvv
+xvrotr_w 0111 01001110 11110 ..... ..... ..... @vvv
+xvrotr_d 0111 01001110 11111 ..... ..... ..... @vvv
+xvrotri_b 0111 01101010 00000 01 ... ..... ..... @vv_ui3
+xvrotri_h 0111 01101010 00000 1 .... ..... ..... @vv_ui4
+xvrotri_w 0111 01101010 00001 ..... ..... ..... @vv_ui5
+xvrotri_d 0111 01101010 0001 ...... ..... ..... @vv_ui6
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 59fa249bae..e081a11aba 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2041,6 +2041,42 @@ INSN_LASX(xvori_b, vv_i)
INSN_LASX(xvxori_b, vv_i)
INSN_LASX(xvnori_b, vv_i)
+INSN_LASX(xvsll_b, vvv)
+INSN_LASX(xvsll_h, vvv)
+INSN_LASX(xvsll_w, vvv)
+INSN_LASX(xvsll_d, vvv)
+INSN_LASX(xvslli_b, vv_i)
+INSN_LASX(xvslli_h, vv_i)
+INSN_LASX(xvslli_w, vv_i)
+INSN_LASX(xvslli_d, vv_i)
+
+INSN_LASX(xvsrl_b, vvv)
+INSN_LASX(xvsrl_h, vvv)
+INSN_LASX(xvsrl_w, vvv)
+INSN_LASX(xvsrl_d, vvv)
+INSN_LASX(xvsrli_b, vv_i)
+INSN_LASX(xvsrli_h, vv_i)
+INSN_LASX(xvsrli_w, vv_i)
+INSN_LASX(xvsrli_d, vv_i)
+
+INSN_LASX(xvsra_b, vvv)
+INSN_LASX(xvsra_h, vvv)
+INSN_LASX(xvsra_w, vvv)
+INSN_LASX(xvsra_d, vvv)
+INSN_LASX(xvsrai_b, vv_i)
+INSN_LASX(xvsrai_h, vv_i)
+INSN_LASX(xvsrai_w, vv_i)
+INSN_LASX(xvsrai_d, vv_i)
+
+INSN_LASX(xvrotr_b, vvv)
+INSN_LASX(xvrotr_h, vvv)
+INSN_LASX(xvrotr_w, vvv)
+INSN_LASX(xvrotr_d, vvv)
+INSN_LASX(xvrotri_b, vv_i)
+INSN_LASX(xvrotri_h, vv_i)
+INSN_LASX(xvrotri_w, vv_i)
+INSN_LASX(xvrotri_d, vv_i)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 9a3a504538..d13dfacebf 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -364,6 +364,42 @@ TRANS(xvori_b, LASX, gvec_vv_i, 32, MO_8, tcg_gen_gvec_ori)
TRANS(xvxori_b, LASX, gvec_vv_i, 32, MO_8, tcg_gen_gvec_xori)
TRANS(xvnori_b, LASX, gvec_vv_i, 32, MO_8, do_vnori_b)
+TRANS(xvsll_b, LASX, gvec_vvv, 32, MO_8, tcg_gen_gvec_shlv)
+TRANS(xvsll_h, LASX, gvec_vvv, 32, MO_16, tcg_gen_gvec_shlv)
+TRANS(xvsll_w, LASX, gvec_vvv, 32, MO_32, tcg_gen_gvec_shlv)
+TRANS(xvsll_d, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_shlv)
+TRANS(xvslli_b, LASX, gvec_vv_i, 32, MO_8, tcg_gen_gvec_shli)
+TRANS(xvslli_h, LASX, gvec_vv_i, 32, MO_16, tcg_gen_gvec_shli)
+TRANS(xvslli_w, LASX, gvec_vv_i, 32, MO_32, tcg_gen_gvec_shli)
+TRANS(xvslli_d, LASX, gvec_vv_i, 32, MO_64, tcg_gen_gvec_shli)
+
+TRANS(xvsrl_b, LASX, gvec_vvv, 32, MO_8, tcg_gen_gvec_shrv)
+TRANS(xvsrl_h, LASX, gvec_vvv, 32, MO_16, tcg_gen_gvec_shrv)
+TRANS(xvsrl_w, LASX, gvec_vvv, 32, MO_32, tcg_gen_gvec_shrv)
+TRANS(xvsrl_d, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_shrv)
+TRANS(xvsrli_b, LASX, gvec_vv_i, 32, MO_8, tcg_gen_gvec_shri)
+TRANS(xvsrli_h, LASX, gvec_vv_i, 32, MO_16, tcg_gen_gvec_shri)
+TRANS(xvsrli_w, LASX, gvec_vv_i, 32, MO_32, tcg_gen_gvec_shri)
+TRANS(xvsrli_d, LASX, gvec_vv_i, 32, MO_64, tcg_gen_gvec_shri)
+
+TRANS(xvsra_b, LASX, gvec_vvv, 32, MO_8, tcg_gen_gvec_sarv)
+TRANS(xvsra_h, LASX, gvec_vvv, 32, MO_16, tcg_gen_gvec_sarv)
+TRANS(xvsra_w, LASX, gvec_vvv, 32, MO_32, tcg_gen_gvec_sarv)
+TRANS(xvsra_d, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_sarv)
+TRANS(xvsrai_b, LASX, gvec_vv_i, 32, MO_8, tcg_gen_gvec_sari)
+TRANS(xvsrai_h, LASX, gvec_vv_i, 32, MO_16, tcg_gen_gvec_sari)
+TRANS(xvsrai_w, LASX, gvec_vv_i, 32, MO_32, tcg_gen_gvec_sari)
+TRANS(xvsrai_d, LASX, gvec_vv_i, 32, MO_64, tcg_gen_gvec_sari)
+
+TRANS(xvrotr_b, LASX, gvec_vvv, 32, MO_8, tcg_gen_gvec_rotrv)
+TRANS(xvrotr_h, LASX, gvec_vvv, 32, MO_16, tcg_gen_gvec_rotrv)
+TRANS(xvrotr_w, LASX, gvec_vvv, 32, MO_32, tcg_gen_gvec_rotrv)
+TRANS(xvrotr_d, LASX, gvec_vvv, 32, MO_64, tcg_gen_gvec_rotrv)
+TRANS(xvrotri_b, LASX, gvec_vv_i, 32, MO_8, tcg_gen_gvec_rotri)
+TRANS(xvrotri_h, LASX, gvec_vv_i, 32, MO_16, tcg_gen_gvec_rotri)
+TRANS(xvrotri_w, LASX, gvec_vv_i, 32, MO_32, tcg_gen_gvec_rotri)
+TRANS(xvrotri_d, LASX, gvec_vv_i, 32, MO_64, tcg_gen_gvec_rotri)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
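All of these map directly onto existing gvec expanders, so no new helpers
are needed. As a standalone sketch of the per-element semantics of the
rotate case (tcg_gen_gvec_rotrv takes the count modulo the element width):

#include <stdio.h>
#include <stdint.h>

static uint8_t rotr8(uint8_t x, unsigned n)
{
    n &= 7;                    /* count taken modulo the element width */
    return n ? (uint8_t)((x >> n) | (x << (8 - n))) : x;
}

int main(void)
{
    printf("0x%02x\n", rotr8(0x81, 1));   /* 0xc0 */
    printf("0x%02x\n", rotr8(0x81, 9));   /* 9 mod 8 = 1 -> 0xc0 again */
    return 0;
}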
* [PATCH v4 28/48] target/loongarch: Implement xvsllwil xvextl
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (26 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 27/48] target/loongarch: Implement xvsll xvsrl xvsra xvrotr Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 22:52 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 29/48] target/loongarch: Implement xvsrlr xvsrar Song Gao
` (19 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVSLLWIL.{H.B/W.H/D.W};
- XVSLLWIL.{HU.BU/WU.HU/DU.WU};
- XVEXTL.Q.D, XVEXTL.QU.DU.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/insns.decode | 9 ++++
target/loongarch/disas.c | 9 ++++
target/loongarch/vec_helper.c | 44 ++++++++++++--------
target/loongarch/insn_trans/trans_lasx.c.inc | 9 ++++
4 files changed, 54 insertions(+), 17 deletions(-)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index fb7bd9fb34..8a7933eccc 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1652,6 +1652,15 @@ xvrotri_h 0111 01101010 00000 1 .... ..... ..... @vv_ui4
xvrotri_w 0111 01101010 00001 ..... ..... ..... @vv_ui5
xvrotri_d 0111 01101010 0001 ...... ..... ..... @vv_ui6
+xvsllwil_h_b 0111 01110000 10000 01 ... ..... ..... @vv_ui3
+xvsllwil_w_h 0111 01110000 10000 1 .... ..... ..... @vv_ui4
+xvsllwil_d_w 0111 01110000 10001 ..... ..... ..... @vv_ui5
+xvextl_q_d 0111 01110000 10010 00000 ..... ..... @vv
+xvsllwil_hu_bu 0111 01110000 11000 01 ... ..... ..... @vv_ui3
+xvsllwil_wu_hu 0111 01110000 11000 1 .... ..... ..... @vv_ui4
+xvsllwil_du_wu 0111 01110000 11001 ..... ..... ..... @vv_ui5
+xvextl_qu_du 0111 01110000 11010 00000 ..... ..... @vv
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index e081a11aba..93c205fa32 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2077,6 +2077,15 @@ INSN_LASX(xvrotri_h, vv_i)
INSN_LASX(xvrotri_w, vv_i)
INSN_LASX(xvrotri_d, vv_i)
+INSN_LASX(xvsllwil_h_b, vv_i)
+INSN_LASX(xvsllwil_w_h, vv_i)
+INSN_LASX(xvsllwil_d_w, vv_i)
+INSN_LASX(xvextl_q_d, vv)
+INSN_LASX(xvsllwil_hu_bu, vv_i)
+INSN_LASX(xvsllwil_wu_hu, vv_i)
+INSN_LASX(xvsllwil_du_wu, vv_i)
+INSN_LASX(xvextl_qu_du, vv)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index e33969339f..7fe9f9f34e 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -925,37 +925,47 @@ void HELPER(vnori_b)(void *vd, void *vj, uint64_t imm, uint32_t desc)
}
}
-#define VSLLWIL(NAME, BIT, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- typedef __typeof(temp.E1(0)) TD; \
- \
- temp.D(0) = 0; \
- temp.D(1) = 0; \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- temp.E1(i) = (TD)Vj->E2(i) << (imm % BIT); \
- } \
- *Vd = temp; \
+#define VSLLWIL(NAME, BIT, E1, E2) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg temp = {}; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
+ typedef __typeof(temp.E1(0)) TD; \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ temp.E1(j + ofs * i) = (TD)Vj->E2(j + ofs * 2 * i) << (imm % BIT); \
+ } \
+ } \
+ *Vd = temp; \
}
void HELPER(vextl_q_d)(void *vd, void *vj, uint32_t desc)
{
+ int i;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
+ int oprsz = simd_oprsz(desc);
- Vd->Q(0) = int128_makes64(Vj->D(0));
+ for (i = 0; i < oprsz / 16; i++) {
+ Vd->Q(i) = int128_makes64(Vj->D(2 * i));
+ }
}
void HELPER(vextl_qu_du)(void *vd, void *vj, uint32_t desc)
{
+ int i;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
+ int oprsz = simd_oprsz(desc);
- Vd->Q(0) = int128_make64(Vj->D(0));
+ for (i = 0; i < oprsz / 16; i++) {
+ Vd->Q(i) = int128_make64(Vj->UD(2 * i));
+ }
}
VSLLWIL(vsllwil_h_b, 16, H, B)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index d13dfacebf..eef6f28338 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -400,6 +400,15 @@ TRANS(xvrotri_h, LASX, gvec_vv_i, 32, MO_16, tcg_gen_gvec_rotri)
TRANS(xvrotri_w, LASX, gvec_vv_i, 32, MO_32, tcg_gen_gvec_rotri)
TRANS(xvrotri_d, LASX, gvec_vv_i, 32, MO_64, tcg_gen_gvec_rotri)
+TRANS(xvsllwil_h_b, LASX, gen_vv_i, 32, gen_helper_vsllwil_h_b)
+TRANS(xvsllwil_w_h, LASX, gen_vv_i, 32, gen_helper_vsllwil_w_h)
+TRANS(xvsllwil_d_w, LASX, gen_vv_i, 32, gen_helper_vsllwil_d_w)
+TRANS(xvextl_q_d, LASX, gen_vv, 32, gen_helper_vextl_q_d)
+TRANS(xvsllwil_hu_bu, LASX, gen_vv_i, 32, gen_helper_vsllwil_hu_bu)
+TRANS(xvsllwil_wu_hu, LASX, gen_vv_i, 32, gen_helper_vsllwil_wu_hu)
+TRANS(xvsllwil_du_wu, LASX, gen_vv_i, 32, gen_helper_vsllwil_du_wu)
+TRANS(xvextl_qu_du, LASX, gen_vv, 32, gen_helper_vextl_qu_du)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
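The VSLLWIL indexing mirrors VEXTH but takes the low half of each 128-bit
lane, then shifts the widened value left by imm % BIT. A standalone sketch
of xvsllwil_h_b with oprsz = 32 (illustrative arrays, not the helper's):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int8_t  src[32];
    int16_t dst[16];
    int ofs = 128 / 16;   /* LSX_LEN / BIT = 8 */
    int imm = 3;          /* shift amount after imm % BIT */

    for (int i = 0; i < 32; i++) {
        src[i] = (int8_t)(i + 1);
    }
    for (int i = 0; i < 2; i++) {              /* oprsz / 16 lanes */
        for (int j = 0; j < ofs; j++) {
            dst[j + ofs * i] = (int16_t)(src[j + ofs * 2 * i] << imm);
        }
    }
    printf("%d %d\n", dst[0], dst[8]);  /* 8 (src[0] << 3), 136 (src[16] << 3) */
    return 0;
}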
* Re: [PATCH v4 28/48] target/loongarch: Implement xvsllwil xvextl
2023-08-30 8:48 ` [PATCH v4 28/48] target/loongarch: Implement xvsllwil xvextl Song Gao
@ 2023-08-30 22:52 ` Richard Henderson
0 siblings, 0 replies; 86+ messages in thread
From: Richard Henderson @ 2023-08-30 22:52 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 8/30/23 01:48, Song Gao wrote:
> This patch includes:
> - XVSLLWIL.{H.B/W.H/D.W};
> - XVSLLWIL.{HU.BU/WU.HU/DU.WU};
> - XVEXTL.Q.D, XVEXTL.QU.DU.
>
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
> target/loongarch/insns.decode | 9 ++++
> target/loongarch/disas.c | 9 ++++
> target/loongarch/vec_helper.c | 44 ++++++++++++--------
> target/loongarch/insn_trans/trans_lasx.c.inc | 9 ++++
> 4 files changed, 54 insertions(+), 17 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 86+ messages in thread
* [PATCH v4 29/48] target/loongarch: Implement xvsrlr xvsrar
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (27 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 28/48] target/loongarch: Implement xvsllwil xvextl Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 22:54 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 30/48] target/loongarch: Implement xvsrln xvsran Song Gao
` (18 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVSRLR[I].{B/H/W/D};
- XVSRAR[I].{B/H/W/D}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/insns.decode | 17 +++++++++++++++++
target/loongarch/disas.c | 18 ++++++++++++++++++
target/loongarch/vec_helper.c | 12 ++++++++----
target/loongarch/insn_trans/trans_lasx.c.inc | 18 ++++++++++++++++++
4 files changed, 61 insertions(+), 4 deletions(-)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 8a7933eccc..ca0951e1cc 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1661,6 +1661,23 @@ xvsllwil_wu_hu 0111 01110000 11000 1 .... ..... ..... @vv_ui4
xvsllwil_du_wu 0111 01110000 11001 ..... ..... ..... @vv_ui5
xvextl_qu_du 0111 01110000 11010 00000 ..... ..... @vv
+xvsrlr_b 0111 01001111 00000 ..... ..... ..... @vvv
+xvsrlr_h 0111 01001111 00001 ..... ..... ..... @vvv
+xvsrlr_w 0111 01001111 00010 ..... ..... ..... @vvv
+xvsrlr_d 0111 01001111 00011 ..... ..... ..... @vvv
+xvsrlri_b 0111 01101010 01000 01 ... ..... ..... @vv_ui3
+xvsrlri_h 0111 01101010 01000 1 .... ..... ..... @vv_ui4
+xvsrlri_w 0111 01101010 01001 ..... ..... ..... @vv_ui5
+xvsrlri_d 0111 01101010 0101 ...... ..... ..... @vv_ui6
+xvsrar_b 0111 01001111 00100 ..... ..... ..... @vvv
+xvsrar_h 0111 01001111 00101 ..... ..... ..... @vvv
+xvsrar_w 0111 01001111 00110 ..... ..... ..... @vvv
+xvsrar_d 0111 01001111 00111 ..... ..... ..... @vvv
+xvsrari_b 0111 01101010 10000 01 ... ..... ..... @vv_ui3
+xvsrari_h 0111 01101010 10000 1 .... ..... ..... @vv_ui4
+xvsrari_w 0111 01101010 10001 ..... ..... ..... @vv_ui5
+xvsrari_d 0111 01101010 1001 ...... ..... ..... @vv_ui6
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 93c205fa32..9109203a05 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2086,6 +2086,24 @@ INSN_LASX(xvsllwil_wu_hu, vv_i)
INSN_LASX(xvsllwil_du_wu, vv_i)
INSN_LASX(xvextl_qu_du, vv)
+INSN_LASX(xvsrlr_b, vvv)
+INSN_LASX(xvsrlr_h, vvv)
+INSN_LASX(xvsrlr_w, vvv)
+INSN_LASX(xvsrlr_d, vvv)
+INSN_LASX(xvsrlri_b, vv_i)
+INSN_LASX(xvsrlri_h, vv_i)
+INSN_LASX(xvsrlri_w, vv_i)
+INSN_LASX(xvsrlri_d, vv_i)
+
+INSN_LASX(xvsrar_b, vvv)
+INSN_LASX(xvsrar_h, vvv)
+INSN_LASX(xvsrar_w, vvv)
+INSN_LASX(xvsrar_d, vvv)
+INSN_LASX(xvsrari_b, vv_i)
+INSN_LASX(xvsrari_h, vv_i)
+INSN_LASX(xvsrari_w, vv_i)
+INSN_LASX(xvsrari_d, vv_i)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 7fe9f9f34e..12a2b2a9e6 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -997,8 +997,9 @@ void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
VReg *Vd = (VReg *)vd; \
VReg *Vj = (VReg *)vj; \
VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
\
- for (i = 0; i < LSX_LEN/BIT; i++) { \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
Vd->E(i) = do_vsrlr_ ## E(Vj->E(i), ((T)Vk->E(i))%BIT); \
} \
}
@@ -1014,8 +1015,9 @@ void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
int i; \
VReg *Vd = (VReg *)vd; \
VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
\
- for (i = 0; i < LSX_LEN/BIT; i++) { \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
Vd->E(i) = do_vsrlr_ ## E(Vj->E(i), imm); \
} \
}
@@ -1047,8 +1049,9 @@ void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
VReg *Vd = (VReg *)vd; \
VReg *Vj = (VReg *)vj; \
VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
\
- for (i = 0; i < LSX_LEN/BIT; i++) { \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
Vd->E(i) = do_vsrar_ ## E(Vj->E(i), ((T)Vk->E(i))%BIT); \
} \
}
@@ -1064,8 +1067,9 @@ void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
int i; \
VReg *Vd = (VReg *)vd; \
VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
\
- for (i = 0; i < LSX_LEN/BIT; i++) { \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
Vd->E(i) = do_vsrar_ ## E(Vj->E(i), imm); \
} \
}
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index eef6f28338..4a92df2cd9 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -409,6 +409,24 @@ TRANS(xvsllwil_wu_hu, LASX, gen_vv_i, 32, gen_helper_vsllwil_wu_hu)
TRANS(xvsllwil_du_wu, LASX, gen_vv_i, 32, gen_helper_vsllwil_du_wu)
TRANS(xvextl_qu_du, LASX, gen_vv, 32, gen_helper_vextl_qu_du)
+TRANS(xvsrlr_b, LASX, gen_vvv, 32, gen_helper_vsrlr_b)
+TRANS(xvsrlr_h, LASX, gen_vvv, 32, gen_helper_vsrlr_h)
+TRANS(xvsrlr_w, LASX, gen_vvv, 32, gen_helper_vsrlr_w)
+TRANS(xvsrlr_d, LASX, gen_vvv, 32, gen_helper_vsrlr_d)
+TRANS(xvsrlri_b, LASX, gen_vv_i, 32, gen_helper_vsrlri_b)
+TRANS(xvsrlri_h, LASX, gen_vv_i, 32, gen_helper_vsrlri_h)
+TRANS(xvsrlri_w, LASX, gen_vv_i, 32, gen_helper_vsrlri_w)
+TRANS(xvsrlri_d, LASX, gen_vv_i, 32, gen_helper_vsrlri_d)
+
+TRANS(xvsrar_b, LASX, gen_vvv, 32, gen_helper_vsrar_b)
+TRANS(xvsrar_h, LASX, gen_vvv, 32, gen_helper_vsrar_h)
+TRANS(xvsrar_w, LASX, gen_vvv, 32, gen_helper_vsrar_w)
+TRANS(xvsrar_d, LASX, gen_vvv, 32, gen_helper_vsrar_d)
+TRANS(xvsrari_b, LASX, gen_vv_i, 32, gen_helper_vsrari_b)
+TRANS(xvsrari_h, LASX, gen_vv_i, 32, gen_helper_vsrari_h)
+TRANS(xvsrari_w, LASX, gen_vv_i, 32, gen_helper_vsrari_w)
+TRANS(xvsrari_d, LASX, gen_vv_i, 32, gen_helper_vsrari_d)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
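Both helper families in this patch reduce to a rounding right shift: shift by n bits, then add back the last bit shifted out. A standalone sketch of that primitive for the unsigned byte case (function name assumed for illustration; the series generates one such function per element width and signedness):

    #include <stdint.h>

    /* Rounding logical right shift, the do_vsrlr_* pattern: a sketch. */
    static uint8_t rshift_round_u8(uint8_t v, int n)
    {
        if (n == 0) {
            return v;
        }
        /* Add the last bit shifted out so the result rounds to nearest. */
        return (v >> n) + ((v >> (n - 1)) & 1);
    }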
* Re: [PATCH v4 29/48] target/loongarch: Implement xvsrlr xvsrar
2023-08-30 8:48 ` [PATCH v4 29/48] target/loongarch: Implement xvsrlr xvsrar Song Gao
@ 2023-08-30 22:54 ` Richard Henderson
0 siblings, 0 replies; 86+ messages in thread
From: Richard Henderson @ 2023-08-30 22:54 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 8/30/23 01:48, Song Gao wrote:
> This patch includes:
> - XVSRLR[I].{B/H/W/D};
> - XVSRAR[I].{B/H/W/D}.
>
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
> target/loongarch/insns.decode | 17 +++++++++++++++++
> target/loongarch/disas.c | 18 ++++++++++++++++++
> target/loongarch/vec_helper.c | 12 ++++++++----
> target/loongarch/insn_trans/trans_lasx.c.inc | 18 ++++++++++++++++++
> 4 files changed, 61 insertions(+), 4 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* [PATCH v4 30/48] target/loongarch: Implement xvsrln xvsran
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (28 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 29/48] target/loongarch: Implement xvsrlr xvsrar Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 22:57 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 31/48] target/loongarch: Implement xvsrlrn xvsrarn Song Gao
` (17 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVSRLN.{B.H/H.W/W.D};
- XVSRAN.{B.H/H.W/W.D};
- XVSRLNI.{B.H/H.W/W.D/D.Q};
- XVSRANI.{B.H/H.W/W.D/D.Q}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/vec.h | 2 +
target/loongarch/insns.decode | 16 ++
target/loongarch/disas.c | 16 ++
target/loongarch/vec_helper.c | 168 ++++++++++---------
target/loongarch/insn_trans/trans_lasx.c.inc | 16 ++
5 files changed, 141 insertions(+), 77 deletions(-)
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index 681afd842f..67d829f9da 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -74,4 +74,6 @@
#define DO_SIGNCOV(a, b) (a == 0 ? 0 : a < 0 ? -b : b)
+#define R_SHIFT(a, b) (a >> b)
+
#endif /* LOONGARCH_VEC_H */
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index ca0951e1cc..204dcfa075 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1678,6 +1678,22 @@ xvsrari_h 0111 01101010 10000 1 .... ..... ..... @vv_ui4
xvsrari_w 0111 01101010 10001 ..... ..... ..... @vv_ui5
xvsrari_d 0111 01101010 1001 ...... ..... ..... @vv_ui6
+xvsrln_b_h 0111 01001111 01001 ..... ..... ..... @vvv
+xvsrln_h_w 0111 01001111 01010 ..... ..... ..... @vvv
+xvsrln_w_d 0111 01001111 01011 ..... ..... ..... @vvv
+xvsran_b_h 0111 01001111 01101 ..... ..... ..... @vvv
+xvsran_h_w 0111 01001111 01110 ..... ..... ..... @vvv
+xvsran_w_d 0111 01001111 01111 ..... ..... ..... @vvv
+
+xvsrlni_b_h 0111 01110100 00000 1 .... ..... ..... @vv_ui4
+xvsrlni_h_w 0111 01110100 00001 ..... ..... ..... @vv_ui5
+xvsrlni_w_d 0111 01110100 0001 ...... ..... ..... @vv_ui6
+xvsrlni_d_q 0111 01110100 001 ....... ..... ..... @vv_ui7
+xvsrani_b_h 0111 01110101 10000 1 .... ..... ..... @vv_ui4
+xvsrani_h_w 0111 01110101 10001 ..... ..... ..... @vv_ui5
+xvsrani_w_d 0111 01110101 1001 ...... ..... ..... @vv_ui6
+xvsrani_d_q 0111 01110101 101 ....... ..... ..... @vv_ui7
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 9109203a05..14b526abd6 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2104,6 +2104,22 @@ INSN_LASX(xvsrari_h, vv_i)
INSN_LASX(xvsrari_w, vv_i)
INSN_LASX(xvsrari_d, vv_i)
+INSN_LASX(xvsrln_b_h, vvv)
+INSN_LASX(xvsrln_h_w, vvv)
+INSN_LASX(xvsrln_w_d, vvv)
+INSN_LASX(xvsran_b_h, vvv)
+INSN_LASX(xvsran_h_w, vvv)
+INSN_LASX(xvsran_w_d, vvv)
+
+INSN_LASX(xvsrlni_b_h, vv_i)
+INSN_LASX(xvsrlni_h_w, vv_i)
+INSN_LASX(xvsrlni_w_d, vv_i)
+INSN_LASX(xvsrlni_d_q, vv_i)
+INSN_LASX(xvsrani_b_h, vv_i)
+INSN_LASX(xvsrani_h_w, vv_i)
+INSN_LASX(xvsrani_w_d, vv_i)
+INSN_LASX(xvsrani_d_q, vv_i)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 12a2b2a9e6..bcfa7b9530 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -1079,107 +1079,121 @@ VSRARI(vsrari_h, 16, H)
VSRARI(vsrari_w, 32, W)
VSRARI(vsrari_d, 64, D)
-#define R_SHIFT(a, b) (a >> b)
-
-#define VSRLN(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- VReg *Vk = (VReg *)vk; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E1(i) = R_SHIFT((T)Vj->E2(i),((T)Vk->E2(i)) % BIT); \
- } \
- Vd->D(1) = 0; \
+#define VSRLN(NAME, BIT, E1, E2) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ Vd->E1(j + ofs * 2 * i) = R_SHIFT(Vj->E2(j + ofs * i), \
+ Vk->E2(j + ofs * i) % BIT); \
+ } \
+ Vd->D(2 * i + 1) = 0; \
+ } \
}
-VSRLN(vsrln_b_h, 16, uint16_t, B, H)
-VSRLN(vsrln_h_w, 32, uint32_t, H, W)
-VSRLN(vsrln_w_d, 64, uint64_t, W, D)
+VSRLN(vsrln_b_h, 16, B, UH)
+VSRLN(vsrln_h_w, 32, H, UW)
+VSRLN(vsrln_w_d, 64, W, UD)
-#define VSRAN(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- VReg *Vk = (VReg *)vk; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E1(i) = R_SHIFT(Vj->E2(i), ((T)Vk->E2(i)) % BIT); \
- } \
- Vd->D(1) = 0; \
+#define VSRAN(NAME, BIT, E1, E2, E3) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ Vd->E1(j + ofs * 2 * i) = R_SHIFT(Vj->E2(j + ofs * i), \
+ Vk->E3(j + ofs * i) % BIT); \
+ } \
+ Vd->D(2 * i + 1) = 0; \
+ } \
}
-VSRAN(vsran_b_h, 16, uint16_t, B, H)
-VSRAN(vsran_h_w, 32, uint32_t, H, W)
-VSRAN(vsran_w_d, 64, uint64_t, W, D)
+VSRAN(vsran_b_h, 16, B, H, UH)
+VSRAN(vsran_h_w, 32, H, W, UW)
+VSRAN(vsran_w_d, 64, W, D, UD)
-#define VSRLNI(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
-{ \
- int i, max; \
- VReg temp; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- \
- temp.D(0) = 0; \
- temp.D(1) = 0; \
- max = LSX_LEN/BIT; \
- for (i = 0; i < max; i++) { \
- temp.E1(i) = R_SHIFT((T)Vj->E2(i), imm); \
- temp.E1(i + max) = R_SHIFT((T)Vd->E2(i), imm); \
- } \
- *Vd = temp; \
+#define VSRLNI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg temp = {}; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ temp.E1(j + ofs * 2 * i) = R_SHIFT(Vj->E2(j + ofs * i), imm); \
+ temp.E1(j + ofs * (2 * i + 1)) = R_SHIFT(Vd->E2(j + ofs * i), \
+ imm); \
+ } \
+ } \
+ *Vd = temp; \
}
void HELPER(vsrlni_d_q)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
- VReg temp;
+ int i;
+ VReg temp = {};
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
- temp.D(0) = 0;
- temp.D(1) = 0;
- temp.D(0) = int128_getlo(int128_urshift(Vj->Q(0), imm % 128));
- temp.D(1) = int128_getlo(int128_urshift(Vd->Q(0), imm % 128));
+ for (i = 0; i < 2; i++) {
+ temp.D(2 * i) = int128_getlo(int128_urshift(Vj->Q(i), imm % 128));
+ temp.D(2 * i + 1) = int128_getlo(int128_urshift(Vd->Q(i), imm % 128));
+ }
*Vd = temp;
}
-VSRLNI(vsrlni_b_h, 16, uint16_t, B, H)
-VSRLNI(vsrlni_h_w, 32, uint32_t, H, W)
-VSRLNI(vsrlni_w_d, 64, uint64_t, W, D)
+VSRLNI(vsrlni_b_h, 16, B, UH)
+VSRLNI(vsrlni_h_w, 32, H, UW)
+VSRLNI(vsrlni_w_d, 64, W, UD)
-#define VSRANI(NAME, BIT, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
-{ \
- int i, max; \
- VReg temp; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- \
- temp.D(0) = 0; \
- temp.D(1) = 0; \
- max = LSX_LEN/BIT; \
- for (i = 0; i < max; i++) { \
- temp.E1(i) = R_SHIFT(Vj->E2(i), imm); \
- temp.E1(i + max) = R_SHIFT(Vd->E2(i), imm); \
- } \
- *Vd = temp; \
+#define VSRANI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg temp = {}; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ temp.E1(j + ofs * 2 * i) = R_SHIFT(Vj->E2(j + ofs * i), imm); \
+ temp.E1(j + ofs * (2 * i + 1)) = R_SHIFT(Vd->E2(j + ofs * i), \
+ imm); \
+ } \
+ } \
+ *Vd = temp; \
}
void HELPER(vsrani_d_q)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
- VReg temp;
+ int i;
+ VReg temp = {};
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
- temp.D(0) = 0;
- temp.D(1) = 0;
- temp.D(0) = int128_getlo(int128_rshift(Vj->Q(0), imm % 128));
- temp.D(1) = int128_getlo(int128_rshift(Vd->Q(0), imm % 128));
+ for (i = 0; i < 2; i++) {
+ temp.D(2 * i) = int128_getlo(int128_rshift(Vj->Q(i), imm % 128));
+ temp.D(2 * i + 1) = int128_getlo(int128_rshift(Vd->Q(i), imm % 128));
+ }
*Vd = temp;
}
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 4a92df2cd9..a420e8dfc9 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -427,6 +427,22 @@ TRANS(xvsrari_h, LASX, gen_vv_i, 32, gen_helper_vsrari_h)
TRANS(xvsrari_w, LASX, gen_vv_i, 32, gen_helper_vsrari_w)
TRANS(xvsrari_d, LASX, gen_vv_i, 32, gen_helper_vsrari_d)
+TRANS(xvsrln_b_h, LASX, gen_vvv, 32, gen_helper_vsrln_b_h)
+TRANS(xvsrln_h_w, LASX, gen_vvv, 32, gen_helper_vsrln_h_w)
+TRANS(xvsrln_w_d, LASX, gen_vvv, 32, gen_helper_vsrln_w_d)
+TRANS(xvsran_b_h, LASX, gen_vvv, 32, gen_helper_vsran_b_h)
+TRANS(xvsran_h_w, LASX, gen_vvv, 32, gen_helper_vsran_h_w)
+TRANS(xvsran_w_d, LASX, gen_vvv, 32, gen_helper_vsran_w_d)
+
+TRANS(xvsrlni_b_h, LASX, gen_vv_i, 32, gen_helper_vsrlni_b_h)
+TRANS(xvsrlni_h_w, LASX, gen_vv_i, 32, gen_helper_vsrlni_h_w)
+TRANS(xvsrlni_w_d, LASX, gen_vv_i, 32, gen_helper_vsrlni_w_d)
+TRANS(xvsrlni_d_q, LASX, gen_vv_i, 32, gen_helper_vsrlni_d_q)
+TRANS(xvsrani_b_h, LASX, gen_vv_i, 32, gen_helper_vsrani_b_h)
+TRANS(xvsrani_h_w, LASX, gen_vv_i, 32, gen_helper_vsrani_h_w)
+TRANS(xvsrani_w_d, LASX, gen_vv_i, 32, gen_helper_vsrani_w_d)
+TRANS(xvsrani_d_q, LASX, gen_vv_i, 32, gen_helper_vsrani_d_q)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
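Expanding the reworked VSRLN macro by hand for the b_h case makes the 256-bit layout visible: each 128-bit lane narrows its own eight source halfwords into the low half of the matching destination lane and zeroes the lane's high half. A sketch equivalent to the generated helper (function name assumed; VReg accessors as defined in vec.h):

    /* Sketch of vsrln_b_h after the rework: per-lane narrow, zero high half. */
    static void vsrln_b_h_sketch(VReg *Vd, VReg *Vj, VReg *Vk, int oprsz)
    {
        int ofs = 128 / 16;                    /* 8 halfwords per lane */
        for (int i = 0; i < oprsz / 16; i++) { /* 1 lane (LSX) or 2 (LASX) */
            for (int j = 0; j < ofs; j++) {
                Vd->B(j + ofs * 2 * i) =
                    Vj->UH(j + ofs * i) >> (Vk->UH(j + ofs * i) % 16);
            }
            Vd->D(2 * i + 1) = 0;              /* clear the lane's high half */
        }
    }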
* Re: [PATCH v4 30/48] target/loongarch: Implement xvsrln xvsran
2023-08-30 8:48 ` [PATCH v4 30/48] target/loongarch: Implement xvsrln xvsran Song Gao
@ 2023-08-30 22:57 ` Richard Henderson
0 siblings, 0 replies; 86+ messages in thread
From: Richard Henderson @ 2023-08-30 22:57 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 8/30/23 01:48, Song Gao wrote:
> This patch includes:
> - XVSRLN.{B.H/H.W/W.D};
> - XVSRAN.{B.H/H.W/W.D};
> - XVSRLNI.{B.H/H.W/W.D/D.Q};
> - XVSRANI.{B.H/H.W/W.D/D.Q}.
>
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
> target/loongarch/vec.h | 2 +
> target/loongarch/insns.decode | 16 ++
> target/loongarch/disas.c | 16 ++
> target/loongarch/vec_helper.c | 168 ++++++++++---------
> target/loongarch/insn_trans/trans_lasx.c.inc | 16 ++
> 5 files changed, 141 insertions(+), 77 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* [PATCH v4 31/48] target/loongarch: Implement xvsrlrn xvsrarn
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (29 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 30/48] target/loongarch: Implement xvsrln xvsran Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 23:00 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 32/48] target/loongarch: Implement xvssrln xvssran Song Gao
` (16 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVSRLRN.{B.H/H.W/W.D};
- XVSRARN.{B.H/H.W/W.D};
- XVSRLRNI.{B.H/H.W/W.D/D.Q};
- XVSRARNI.{B.H/H.W/W.D/D.Q}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/insns.decode | 16 ++
target/loongarch/disas.c | 16 ++
target/loongarch/vec_helper.c | 198 +++++++++++--------
target/loongarch/insn_trans/trans_lasx.c.inc | 16 ++
4 files changed, 161 insertions(+), 85 deletions(-)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 204dcfa075..d7c50b14ca 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1694,6 +1694,22 @@ xvsrani_h_w 0111 01110101 10001 ..... ..... ..... @vv_ui5
xvsrani_w_d 0111 01110101 1001 ...... ..... ..... @vv_ui6
xvsrani_d_q 0111 01110101 101 ....... ..... ..... @vv_ui7
+xvsrlrn_b_h 0111 01001111 10001 ..... ..... ..... @vvv
+xvsrlrn_h_w 0111 01001111 10010 ..... ..... ..... @vvv
+xvsrlrn_w_d 0111 01001111 10011 ..... ..... ..... @vvv
+xvsrarn_b_h 0111 01001111 10101 ..... ..... ..... @vvv
+xvsrarn_h_w 0111 01001111 10110 ..... ..... ..... @vvv
+xvsrarn_w_d 0111 01001111 10111 ..... ..... ..... @vvv
+
+xvsrlrni_b_h 0111 01110100 01000 1 .... ..... ..... @vv_ui4
+xvsrlrni_h_w 0111 01110100 01001 ..... ..... ..... @vv_ui5
+xvsrlrni_w_d 0111 01110100 0101 ...... ..... ..... @vv_ui6
+xvsrlrni_d_q 0111 01110100 011 ....... ..... ..... @vv_ui7
+xvsrarni_b_h 0111 01110101 11000 1 .... ..... ..... @vv_ui4
+xvsrarni_h_w 0111 01110101 11001 ..... ..... ..... @vv_ui5
+xvsrarni_w_d 0111 01110101 1101 ...... ..... ..... @vv_ui6
+xvsrarni_d_q 0111 01110101 111 ....... ..... ..... @vv_ui7
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 14b526abd6..04b6ea713d 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2120,6 +2120,22 @@ INSN_LASX(xvsrani_h_w, vv_i)
INSN_LASX(xvsrani_w_d, vv_i)
INSN_LASX(xvsrani_d_q, vv_i)
+INSN_LASX(xvsrlrn_b_h, vvv)
+INSN_LASX(xvsrlrn_h_w, vvv)
+INSN_LASX(xvsrlrn_w_d, vvv)
+INSN_LASX(xvsrarn_b_h, vvv)
+INSN_LASX(xvsrarn_h_w, vvv)
+INSN_LASX(xvsrarn_w_d, vvv)
+
+INSN_LASX(xvsrlrni_b_h, vv_i)
+INSN_LASX(xvsrlrni_h_w, vv_i)
+INSN_LASX(xvsrlrni_w_d, vv_i)
+INSN_LASX(xvsrlrni_d_q, vv_i)
+INSN_LASX(xvsrarni_b_h, vv_i)
+INSN_LASX(xvsrarni_h_w, vv_i)
+INSN_LASX(xvsrarni_w_d, vv_i)
+INSN_LASX(xvsrarni_d_q, vv_i)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index bcfa7b9530..d4f2091656 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -1201,76 +1201,95 @@ VSRANI(vsrani_b_h, 16, B, H)
VSRANI(vsrani_h_w, 32, H, W)
VSRANI(vsrani_w_d, 64, W, D)
-#define VSRLRN(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- VReg *Vk = (VReg *)vk; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E1(i) = do_vsrlr_ ## E2(Vj->E2(i), ((T)Vk->E2(i))%BIT); \
- } \
- Vd->D(1) = 0; \
+#define VSRLRN(NAME, BIT, E1, E2, E3) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ Vd->E1(j + ofs * 2 * i) = do_vsrlr_ ##E2(Vj->E2(j + ofs * i), \
+ Vk->E3(j + ofs * i) % BIT); \
+ } \
+ Vd->D(2 * i + 1) = 0; \
+ } \
}
-VSRLRN(vsrlrn_b_h, 16, uint16_t, B, H)
-VSRLRN(vsrlrn_h_w, 32, uint32_t, H, W)
-VSRLRN(vsrlrn_w_d, 64, uint64_t, W, D)
+VSRLRN(vsrlrn_b_h, 16, B, H, UH)
+VSRLRN(vsrlrn_h_w, 32, H, W, UW)
+VSRLRN(vsrlrn_w_d, 64, W, D, UD)
-#define VSRARN(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- VReg *Vk = (VReg *)vk; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E1(i) = do_vsrar_ ## E2(Vj->E2(i), ((T)Vk->E2(i))%BIT); \
- } \
- Vd->D(1) = 0; \
+#define VSRARN(NAME, BIT, E1, E2, E3) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ Vd->E1(j + ofs * 2 * i) = do_vsrar_ ## E2(Vj->E2(j + ofs * i), \
+ Vk->E3(j + ofs * i) % BIT); \
+ } \
+ Vd->D(2 * i + 1) = 0; \
+ } \
}
-VSRARN(vsrarn_b_h, 16, uint8_t, B, H)
-VSRARN(vsrarn_h_w, 32, uint16_t, H, W)
-VSRARN(vsrarn_w_d, 64, uint32_t, W, D)
-
-#define VSRLRNI(NAME, BIT, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
-{ \
- int i, max; \
- VReg temp; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- \
- temp.D(0) = 0; \
- temp.D(1) = 0; \
- max = LSX_LEN/BIT; \
- for (i = 0; i < max; i++) { \
- temp.E1(i) = do_vsrlr_ ## E2(Vj->E2(i), imm); \
- temp.E1(i + max) = do_vsrlr_ ## E2(Vd->E2(i), imm); \
- } \
- *Vd = temp; \
+VSRARN(vsrarn_b_h, 16, B, H, UH)
+VSRARN(vsrarn_h_w, 32, H, W, UW)
+VSRARN(vsrarn_w_d, 64, W, D, UD)
+
+#define VSRLRNI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg temp = {}; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ temp.E1(j + ofs * 2 * i) = do_vsrlr_ ## E2(Vj->E2(j + ofs * i), imm); \
+ temp.E1(j + ofs * (2 * i + 1)) = do_vsrlr_ ## E2(Vd->E2(j + ofs * i), \
+ imm); \
+ } \
+ } \
+ *Vd = temp; \
}
void HELPER(vsrlrni_d_q)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
- VReg temp;
+ int i;
+ VReg temp = {};
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
- Int128 r1, r2;
-
- if (imm == 0) {
- temp.D(0) = int128_getlo(Vj->Q(0));
- temp.D(1) = int128_getlo(Vd->Q(0));
- } else {
- r1 = int128_and(int128_urshift(Vj->Q(0), (imm -1)), int128_one());
- r2 = int128_and(int128_urshift(Vd->Q(0), (imm -1)), int128_one());
+ Int128 r[4];
+ int oprsz = simd_oprsz(desc);
- temp.D(0) = int128_getlo(int128_add(int128_urshift(Vj->Q(0), imm), r1));
- temp.D(1) = int128_getlo(int128_add(int128_urshift(Vd->Q(0), imm), r2));
+ for (i = 0; i < oprsz / 16; i++) {
+ if (imm == 0) {
+ temp.D(2 * i) = int128_getlo(Vj->Q(i));
+ temp.D(2 * i + 1) = int128_getlo(Vd->Q(i));
+ } else {
+ r[2 * i] = int128_and(int128_urshift(Vj->Q(i), (imm - 1)),
+ int128_one());
+ r[2 * i + 1] = int128_and(int128_urshift(Vd->Q(i), (imm - 1)),
+ int128_one());
+ temp.D(2 * i) = int128_getlo(int128_add(int128_urshift(Vj->Q(i),
+ imm), r[2 * i]));
+ temp.D(2 * i + 1) = int128_getlo(int128_add(int128_urshift(Vd->Q(i),
+ imm), r[2 * i + 1]));
+ }
}
*Vd = temp;
}
@@ -1279,40 +1298,49 @@ VSRLRNI(vsrlrni_b_h, 16, B, H)
VSRLRNI(vsrlrni_h_w, 32, H, W)
VSRLRNI(vsrlrni_w_d, 64, W, D)
-#define VSRARNI(NAME, BIT, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
-{ \
- int i, max; \
- VReg temp; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- \
- temp.D(0) = 0; \
- temp.D(1) = 0; \
- max = LSX_LEN/BIT; \
- for (i = 0; i < max; i++) { \
- temp.E1(i) = do_vsrar_ ## E2(Vj->E2(i), imm); \
- temp.E1(i + max) = do_vsrar_ ## E2(Vd->E2(i), imm); \
- } \
- *Vd = temp; \
+#define VSRARNI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg temp = {}; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ temp.E1(j + ofs * 2 * i) = do_vsrar_ ## E2(Vj->E2(j + ofs * i), imm); \
+ temp.E1(j + ofs * (2 * i + 1)) = do_vsrar_ ## E2(Vd->E2(j + ofs * i), \
+ imm); \
+ } \
+ } \
+ *Vd = temp; \
}
void HELPER(vsrarni_d_q)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
- VReg temp;
+ int i;
+ VReg temp = {};
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
- Int128 r1, r2;
-
- if (imm == 0) {
- temp.D(0) = int128_getlo(Vj->Q(0));
- temp.D(1) = int128_getlo(Vd->Q(0));
- } else {
- r1 = int128_and(int128_rshift(Vj->Q(0), (imm -1)), int128_one());
- r2 = int128_and(int128_rshift(Vd->Q(0), (imm -1)), int128_one());
+ Int128 r[4];
+ int oprsz = simd_oprsz(desc);
- temp.D(0) = int128_getlo(int128_add(int128_rshift(Vj->Q(0), imm), r1));
- temp.D(1) = int128_getlo(int128_add(int128_rshift(Vd->Q(0), imm), r2));
+ for (i = 0; i < oprsz / 16; i++) {
+ if (imm == 0) {
+ temp.D(2 * i) = int128_getlo(Vj->Q(i));
+ temp.D(2 * i + 1) = int128_getlo(Vd->Q(i));
+ } else {
+ r[2 * i] = int128_and(int128_rshift(Vj->Q(i), (imm - 1)),
+ int128_one());
+ r[2 * i + 1] = int128_and(int128_rshift(Vd->Q(i), (imm - 1)),
+ int128_one());
+ temp.D(2 * i) = int128_getlo(int128_add(int128_rshift(Vj->Q(i),
+ imm), r[2 * i]));
+ temp.D(2 * i + 1) = int128_getlo(int128_add(int128_rshift(Vd->Q(i),
+ imm), r[2 * i + 1]));
+ }
}
*Vd = temp;
}
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index a420e8dfc9..702a2f770d 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -443,6 +443,22 @@ TRANS(xvsrani_h_w, LASX, gen_vv_i, 32, gen_helper_vsrani_h_w)
TRANS(xvsrani_w_d, LASX, gen_vv_i, 32, gen_helper_vsrani_w_d)
TRANS(xvsrani_d_q, LASX, gen_vv_i, 32, gen_helper_vsrani_d_q)
+TRANS(xvsrlrn_b_h, LASX, gen_vvv, 32, gen_helper_vsrlrn_b_h)
+TRANS(xvsrlrn_h_w, LASX, gen_vvv, 32, gen_helper_vsrlrn_h_w)
+TRANS(xvsrlrn_w_d, LASX, gen_vvv, 32, gen_helper_vsrlrn_w_d)
+TRANS(xvsrarn_b_h, LASX, gen_vvv, 32, gen_helper_vsrarn_b_h)
+TRANS(xvsrarn_h_w, LASX, gen_vvv, 32, gen_helper_vsrarn_h_w)
+TRANS(xvsrarn_w_d, LASX, gen_vvv, 32, gen_helper_vsrarn_w_d)
+
+TRANS(xvsrlrni_b_h, LASX, gen_vv_i, 32, gen_helper_vsrlrni_b_h)
+TRANS(xvsrlrni_h_w, LASX, gen_vv_i, 32, gen_helper_vsrlrni_h_w)
+TRANS(xvsrlrni_w_d, LASX, gen_vv_i, 32, gen_helper_vsrlrni_w_d)
+TRANS(xvsrlrni_d_q, LASX, gen_vv_i, 32, gen_helper_vsrlrni_d_q)
+TRANS(xvsrarni_b_h, LASX, gen_vv_i, 32, gen_helper_vsrarni_b_h)
+TRANS(xvsrarni_h_w, LASX, gen_vv_i, 32, gen_helper_vsrarni_h_w)
+TRANS(xvsrarni_w_d, LASX, gen_vv_i, 32, gen_helper_vsrarni_w_d)
+TRANS(xvsrarni_d_q, LASX, gen_vv_i, 32, gen_helper_vsrarni_d_q)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
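For the d_q variants the same rounding happens at 128-bit width before the result is narrowed to 64 bits. The per-quadword step, distilled from the patch into a sketch (helper name assumed; Int128 operations as used in the diff), valid for imm != 0:

    /* Round-and-narrow one quadword: (q >> imm) plus the last bit shifted out. */
    static uint64_t urshift_round_narrow(Int128 q, uint64_t imm)
    {
        Int128 r = int128_and(int128_urshift(q, imm - 1), int128_one());
        return int128_getlo(int128_add(int128_urshift(q, imm), r));
    }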
* Re: [PATCH v4 31/48] target/loongarch: Implement xvsrlrn xvsrarn
2023-08-30 8:48 ` [PATCH v4 31/48] target/loongarch: Implement xvsrlrn xvsrarn Song Gao
@ 2023-08-30 23:00 ` Richard Henderson
0 siblings, 0 replies; 86+ messages in thread
From: Richard Henderson @ 2023-08-30 23:00 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 8/30/23 01:48, Song Gao wrote:
> This patch includes:
> - XVSRLRN.{B.H/H.W/W.D};
> - XVSRARN.{B.H/H.W/W.D};
> - XVSRLRNI.{B.H/H.W/W.D/D.Q};
> - XVSRARNI.{B.H/H.W/W.D/D.Q}.
>
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
> target/loongarch/insns.decode | 16 ++
> target/loongarch/disas.c | 16 ++
> target/loongarch/vec_helper.c | 198 +++++++++++--------
> target/loongarch/insn_trans/trans_lasx.c.inc | 16 ++
> 4 files changed, 161 insertions(+), 85 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* [PATCH v4 32/48] target/loongarch: Implement xvssrln xvssran
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (30 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 31/48] target/loongarch: Implement xvsrlrn xvsrarn Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 23:22 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 33/48] target/loongarch: Implement xvssrlrn xvssrarn Song Gao
` (15 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVSSRLN.{B.H/H.W/W.D};
- XVSSRAN.{B.H/H.W/W.D};
- XVSSRLN.{BU.H/HU.W/WU.D};
- XVSSRAN.{BU.H/HU.W/WU.D};
- XVSSRLNI.{B.H/H.W/W.D/D.Q};
- XVSSRANI.{B.H/H.W/W.D/D.Q};
- XVSSRLNI.{BU.H/HU.W/WU.D/DU.Q};
- XVSSRANI.{BU.H/HU.W/WU.D/DU.Q}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/insns.decode | 30 ++
target/loongarch/disas.c | 30 ++
target/loongarch/vec_helper.c | 451 ++++++++++---------
target/loongarch/insn_trans/trans_lasx.c.inc | 30 ++
4 files changed, 337 insertions(+), 204 deletions(-)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index d7c50b14ca..022dd9bfd1 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1710,6 +1710,36 @@ xvsrarni_h_w 0111 01110101 11001 ..... ..... ..... @vv_ui5
xvsrarni_w_d 0111 01110101 1101 ...... ..... ..... @vv_ui6
xvsrarni_d_q 0111 01110101 111 ....... ..... ..... @vv_ui7
+xvssrln_b_h 0111 01001111 11001 ..... ..... ..... @vvv
+xvssrln_h_w 0111 01001111 11010 ..... ..... ..... @vvv
+xvssrln_w_d 0111 01001111 11011 ..... ..... ..... @vvv
+xvssran_b_h 0111 01001111 11101 ..... ..... ..... @vvv
+xvssran_h_w 0111 01001111 11110 ..... ..... ..... @vvv
+xvssran_w_d 0111 01001111 11111 ..... ..... ..... @vvv
+xvssrln_bu_h 0111 01010000 01001 ..... ..... ..... @vvv
+xvssrln_hu_w 0111 01010000 01010 ..... ..... ..... @vvv
+xvssrln_wu_d 0111 01010000 01011 ..... ..... ..... @vvv
+xvssran_bu_h 0111 01010000 01101 ..... ..... ..... @vvv
+xvssran_hu_w 0111 01010000 01110 ..... ..... ..... @vvv
+xvssran_wu_d 0111 01010000 01111 ..... ..... ..... @vvv
+
+xvssrlni_b_h 0111 01110100 10000 1 .... ..... ..... @vv_ui4
+xvssrlni_h_w 0111 01110100 10001 ..... ..... ..... @vv_ui5
+xvssrlni_w_d 0111 01110100 1001 ...... ..... ..... @vv_ui6
+xvssrlni_d_q 0111 01110100 101 ....... ..... ..... @vv_ui7
+xvssrani_b_h 0111 01110110 00000 1 .... ..... ..... @vv_ui4
+xvssrani_h_w 0111 01110110 00001 ..... ..... ..... @vv_ui5
+xvssrani_w_d 0111 01110110 0001 ...... ..... ..... @vv_ui6
+xvssrani_d_q 0111 01110110 001 ....... ..... ..... @vv_ui7
+xvssrlni_bu_h 0111 01110100 11000 1 .... ..... ..... @vv_ui4
+xvssrlni_hu_w 0111 01110100 11001 ..... ..... ..... @vv_ui5
+xvssrlni_wu_d 0111 01110100 1101 ...... ..... ..... @vv_ui6
+xvssrlni_du_q 0111 01110100 111 ....... ..... ..... @vv_ui7
+xvssrani_bu_h 0111 01110110 01000 1 .... ..... ..... @vv_ui4
+xvssrani_hu_w 0111 01110110 01001 ..... ..... ..... @vv_ui5
+xvssrani_wu_d 0111 01110110 0101 ...... ..... ..... @vv_ui6
+xvssrani_du_q 0111 01110110 011 ....... ..... ..... @vv_ui7
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 04b6ea713d..04e8d42044 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2136,6 +2136,36 @@ INSN_LASX(xvsrarni_h_w, vv_i)
INSN_LASX(xvsrarni_w_d, vv_i)
INSN_LASX(xvsrarni_d_q, vv_i)
+INSN_LASX(xvssrln_b_h, vvv)
+INSN_LASX(xvssrln_h_w, vvv)
+INSN_LASX(xvssrln_w_d, vvv)
+INSN_LASX(xvssran_b_h, vvv)
+INSN_LASX(xvssran_h_w, vvv)
+INSN_LASX(xvssran_w_d, vvv)
+INSN_LASX(xvssrln_bu_h, vvv)
+INSN_LASX(xvssrln_hu_w, vvv)
+INSN_LASX(xvssrln_wu_d, vvv)
+INSN_LASX(xvssran_bu_h, vvv)
+INSN_LASX(xvssran_hu_w, vvv)
+INSN_LASX(xvssran_wu_d, vvv)
+
+INSN_LASX(xvssrlni_b_h, vv_i)
+INSN_LASX(xvssrlni_h_w, vv_i)
+INSN_LASX(xvssrlni_w_d, vv_i)
+INSN_LASX(xvssrlni_d_q, vv_i)
+INSN_LASX(xvssrani_b_h, vv_i)
+INSN_LASX(xvssrani_h_w, vv_i)
+INSN_LASX(xvssrani_w_d, vv_i)
+INSN_LASX(xvssrani_d_q, vv_i)
+INSN_LASX(xvssrlni_bu_h, vv_i)
+INSN_LASX(xvssrlni_hu_w, vv_i)
+INSN_LASX(xvssrlni_wu_d, vv_i)
+INSN_LASX(xvssrlni_du_q, vv_i)
+INSN_LASX(xvssrani_bu_h, vv_i)
+INSN_LASX(xvssrani_hu_w, vv_i)
+INSN_LASX(xvssrani_wu_d, vv_i)
+INSN_LASX(xvssrani_du_q, vv_i)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index d4f2091656..738bb452f6 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -1371,23 +1371,29 @@ SSRLNS(B, uint16_t, int16_t, uint8_t)
SSRLNS(H, uint32_t, int32_t, uint16_t)
SSRLNS(W, uint64_t, int64_t, uint32_t)
-#define VSSRLN(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- VReg *Vk = (VReg *)vk; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E1(i) = do_ssrlns_ ## E1(Vj->E2(i), (T)Vk->E2(i)% BIT, BIT/2 -1); \
- } \
- Vd->D(1) = 0; \
+#define VSSRLN(NAME, BIT, E1, E2, E3) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ Vd->E1(j + ofs * 2 * i) = do_ssrlns_ ## E1(Vj->E2(j + ofs * i), \
+ Vk->E3(j + ofs * i) % BIT, \
+ BIT / 2 - 1); \
+ } \
+ Vd->D(2 * i + 1) = 0; \
+ } \
}
-VSSRLN(vssrln_b_h, 16, uint16_t, B, H)
-VSSRLN(vssrln_h_w, 32, uint32_t, H, W)
-VSSRLN(vssrln_w_d, 64, uint64_t, W, D)
+VSSRLN(vssrln_b_h, 16, B, H, UH)
+VSSRLN(vssrln_h_w, 32, H, W, UW)
+VSSRLN(vssrln_w_d, 64, W, D, UD)
#define SSRANS(E, T1, T2) \
static T1 do_ssrans_ ## E(T1 e2, int sa, int sh) \
@@ -1399,10 +1405,10 @@ static T1 do_ssrans_ ## E(T1 e2, int sa, int sh) \
shft_res = e2 >> sa; \
} \
T2 mask; \
- mask = (1ll << sh) -1; \
+ mask = (1ll << sh) - 1; \
if (shft_res > mask) { \
return mask; \
- } else if (shft_res < -(mask +1)) { \
+ } else if (shft_res < -(mask + 1)) { \
return ~mask; \
} else { \
return shft_res; \
@@ -1413,23 +1419,29 @@ SSRANS(B, int16_t, int8_t)
SSRANS(H, int32_t, int16_t)
SSRANS(W, int64_t, int32_t)
-#define VSSRAN(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- VReg *Vk = (VReg *)vk; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E1(i) = do_ssrans_ ## E1(Vj->E2(i), (T)Vk->E2(i)%BIT, BIT/2 -1); \
- } \
- Vd->D(1) = 0; \
+#define VSSRAN(NAME, BIT, E1, E2, E3) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ Vd->E1(j + ofs * 2 * i) = do_ssrans_ ## E1(Vj->E2(j + ofs * i), \
+ Vk->E3(j + ofs * i) % BIT, \
+ BIT / 2 - 1); \
+ } \
+ Vd->D(2 * i + 1) = 0; \
+ } \
}
-VSSRAN(vssran_b_h, 16, uint16_t, B, H)
-VSSRAN(vssran_h_w, 32, uint32_t, H, W)
-VSSRAN(vssran_w_d, 64, uint64_t, W, D)
+VSSRAN(vssran_b_h, 16, B, H, UH)
+VSSRAN(vssran_h_w, 32, H, W, UW)
+VSSRAN(vssran_w_d, 64, W, D, UD)
#define SSRLNU(E, T1, T2, T3) \
static T1 do_ssrlnu_ ## E(T3 e2, int sa, int sh) \
@@ -1441,7 +1453,7 @@ static T1 do_ssrlnu_ ## E(T3 e2, int sa, int sh) \
shft_res = (((T1)e2) >> sa); \
} \
T2 mask; \
- mask = (1ull << sh) -1; \
+ mask = (1ull << sh) - 1; \
if (shft_res > mask) { \
return mask; \
} else { \
@@ -1453,23 +1465,29 @@ SSRLNU(B, uint16_t, uint8_t, int16_t)
SSRLNU(H, uint32_t, uint16_t, int32_t)
SSRLNU(W, uint64_t, uint32_t, int64_t)
-#define VSSRLNU(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- VReg *Vk = (VReg *)vk; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E1(i) = do_ssrlnu_ ## E1(Vj->E2(i), (T)Vk->E2(i)%BIT, BIT/2); \
- } \
- Vd->D(1) = 0; \
+#define VSSRLNU(NAME, BIT, E1, E2, E3) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ Vd->E1(j + ofs * 2 * i) = do_ssrlnu_ ## E1(Vj->E2(j + ofs * i), \
+ Vk->E3(j + ofs * i) % BIT, \
+ BIT / 2); \
+ } \
+ Vd->D(2 * i + 1) = 0; \
+ } \
}
-VSSRLNU(vssrln_bu_h, 16, uint16_t, B, H)
-VSSRLNU(vssrln_hu_w, 32, uint32_t, H, W)
-VSSRLNU(vssrln_wu_d, 64, uint64_t, W, D)
+VSSRLNU(vssrln_bu_h, 16, B, H, UH)
+VSSRLNU(vssrln_hu_w, 32, H, W, UW)
+VSSRLNU(vssrln_wu_d, 64, W, D, UD)
#define SSRANU(E, T1, T2, T3) \
static T1 do_ssranu_ ## E(T3 e2, int sa, int sh) \
@@ -1484,7 +1502,7 @@ static T1 do_ssranu_ ## E(T3 e2, int sa, int sh) \
shft_res = 0; \
} \
T2 mask; \
- mask = (1ull << sh) -1; \
+ mask = (1ull << sh) - 1; \
if (shft_res > mask) { \
return mask; \
} else { \
@@ -1496,64 +1514,76 @@ SSRANU(B, uint16_t, uint8_t, int16_t)
SSRANU(H, uint32_t, uint16_t, int32_t)
SSRANU(W, uint64_t, uint32_t, int64_t)
-#define VSSRANU(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- VReg *Vk = (VReg *)vk; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E1(i) = do_ssranu_ ## E1(Vj->E2(i), (T)Vk->E2(i)%BIT, BIT/2); \
- } \
- Vd->D(1) = 0; \
+#define VSSRANU(NAME, BIT, E1, E2, E3) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ Vd->E1(j + ofs * 2 * i) = do_ssranu_ ## E1(Vj->E2(j + ofs * i), \
+ Vk->E3(j + ofs * i) % BIT, \
+ BIT / 2); \
+ } \
+ Vd->D(2 * i + 1) = 0; \
+ } \
}
-VSSRANU(vssran_bu_h, 16, uint16_t, B, H)
-VSSRANU(vssran_hu_w, 32, uint32_t, H, W)
-VSSRANU(vssran_wu_d, 64, uint64_t, W, D)
-
-#define VSSRLNI(NAME, BIT, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- temp.E1(i) = do_ssrlns_ ## E1(Vj->E2(i), imm, BIT/2 -1); \
- temp.E1(i + LSX_LEN/BIT) = do_ssrlns_ ## E1(Vd->E2(i), imm, BIT/2 -1);\
- } \
- *Vd = temp; \
+VSSRANU(vssran_bu_h, 16, B, H, UH)
+VSSRANU(vssran_hu_w, 32, H, W, UW)
+VSSRANU(vssran_wu_d, 64, W, D, UD)
+
+#define VSSRLNI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg temp = {}; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ temp.E1(j + ofs * 2 * i) = do_ssrlns_ ## E1(Vj->E2(j + ofs * i), \
+ imm, BIT / 2 - 1); \
+ temp.E1(j + ofs * (2 * i + 1)) = do_ssrlns_ ## E1(Vd->E2(j + ofs * i), \
+ imm, BIT / 2 - 1); \
+ } \
+ } \
+ *Vd = temp; \
}
void HELPER(vssrlni_d_q)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
- Int128 shft_res1, shft_res2, mask;
+ int i, j;
+ Int128 shft_res[4], mask;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
+ int oprsz = simd_oprsz(desc);
- if (imm == 0) {
- shft_res1 = Vj->Q(0);
- shft_res2 = Vd->Q(0);
- } else {
- shft_res1 = int128_urshift(Vj->Q(0), imm);
- shft_res2 = int128_urshift(Vd->Q(0), imm);
- }
mask = int128_sub(int128_lshift(int128_one(), 63), int128_one());
- if (int128_ult(mask, shft_res1)) {
- Vd->D(0) = int128_getlo(mask);
- }else {
- Vd->D(0) = int128_getlo(shft_res1);
- }
-
- if (int128_ult(mask, shft_res2)) {
- Vd->D(1) = int128_getlo(mask);
- }else {
- Vd->D(1) = int128_getlo(shft_res2);
+ for (i = 0; i < oprsz / 16; i++) {
+ if (imm == 0) {
+ shft_res[2 * i] = Vj->Q(i);
+ shft_res[2 * i + 1] = Vd->Q(i);
+ } else {
+ shft_res[2 * i] = int128_urshift(Vj->Q(i), imm);
+ shft_res[2 * i + 1] = int128_urshift(Vd->Q(i), imm);
+ }
+ for (j = 2 * i; j <= 2 * i + 1; j++) {
+ if (int128_ult(mask, shft_res[j])) {
+ Vd->D(j) = int128_getlo(mask);
+ } else {
+ Vd->D(j) = int128_getlo(shft_res[j]);
+ }
+ }
}
}
@@ -1561,51 +1591,55 @@ VSSRLNI(vssrlni_b_h, 16, B, H)
VSSRLNI(vssrlni_h_w, 32, H, W)
VSSRLNI(vssrlni_w_d, 64, W, D)
-#define VSSRANI(NAME, BIT, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- temp.E1(i) = do_ssrans_ ## E1(Vj->E2(i), imm, BIT/2 -1); \
- temp.E1(i + LSX_LEN/BIT) = do_ssrans_ ## E1(Vd->E2(i), imm, BIT/2 -1); \
- } \
- *Vd = temp; \
+#define VSSRANI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg temp = {}; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ temp.E1(j + ofs * 2 * i) = do_ssrans_ ## E1(Vj->E2(j + ofs * i), \
+ imm, BIT / 2 - 1); \
+ temp.E1(j + ofs * (2 * i + 1)) = do_ssrans_ ## E1(Vd->E2(j + ofs * i), \
+ imm, BIT / 2 - 1); \
+ } \
+ } \
+ *Vd = temp; \
}
void HELPER(vssrani_d_q)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
- Int128 shft_res1, shft_res2, mask, min;
- VReg *Vd = (VReg *)vd;
- VReg *Vj = (VReg *)vj;
+ int i, j;
+ Int128 shft_res[4], mask, min;
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
+ int oprsz = simd_oprsz(desc);
- if (imm == 0) {
- shft_res1 = Vj->Q(0);
- shft_res2 = Vd->Q(0);
- } else {
- shft_res1 = int128_rshift(Vj->Q(0), imm);
- shft_res2 = int128_rshift(Vd->Q(0), imm);
- }
mask = int128_sub(int128_lshift(int128_one(), 63), int128_one());
min = int128_lshift(int128_one(), 63);
- if (int128_gt(shft_res1, mask)) {
- Vd->D(0) = int128_getlo(mask);
- } else if (int128_lt(shft_res1, int128_neg(min))) {
- Vd->D(0) = int128_getlo(min);
- } else {
- Vd->D(0) = int128_getlo(shft_res1);
- }
-
- if (int128_gt(shft_res2, mask)) {
- Vd->D(1) = int128_getlo(mask);
- } else if (int128_lt(shft_res2, int128_neg(min))) {
- Vd->D(1) = int128_getlo(min);
- } else {
- Vd->D(1) = int128_getlo(shft_res2);
+ for (i = 0; i < oprsz / 16; i++) {
+ if (imm == 0) {
+ shft_res[2 * i] = Vj->Q(i);
+ shft_res[2 * i + 1] = Vd->Q(i);
+ } else {
+ shft_res[2 * i] = int128_rshift(Vj->Q(i), imm);
+ shft_res[2 * i + 1] = int128_rshift(Vd->Q(i), imm);
+ }
+ for (j = 2 * i; j <= 2 * i + 1; j++) {
+ if (int128_gt(shft_res[j], mask)) {
+ Vd->D(j) = int128_getlo(mask);
+ } else if (int128_lt(shft_res[j], int128_neg(min))) {
+ Vd->D(j) = int128_getlo(min);
+ } else {
+ Vd->D(j) = int128_getlo(shft_res[j]);
+ }
+ }
}
}
@@ -1613,46 +1647,52 @@ VSSRANI(vssrani_b_h, 16, B, H)
VSSRANI(vssrani_h_w, 32, H, W)
VSSRANI(vssrani_w_d, 64, W, D)
-#define VSSRLNUI(NAME, BIT, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- temp.E1(i) = do_ssrlnu_ ## E1(Vj->E2(i), imm, BIT/2); \
- temp.E1(i + LSX_LEN/BIT) = do_ssrlnu_ ## E1(Vd->E2(i), imm, BIT/2); \
- } \
- *Vd = temp; \
+#define VSSRLNUI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg temp = {}; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ temp.E1(j + ofs * 2 * i) = do_ssrlnu_ ## E1(Vj->E2(j + ofs * i), \
+ imm, BIT / 2); \
+ temp.E1(j + ofs * (2 * i + 1)) = do_ssrlnu_ ## E1(Vd->E2(j + ofs * i), \
+ imm, BIT / 2); \
+ } \
+ } \
+ *Vd = temp; \
}
void HELPER(vssrlni_du_q)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
- Int128 shft_res1, shft_res2, mask;
+ int i, j;
+ Int128 shft_res[4], mask;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
+ int oprsz = simd_oprsz(desc);
- if (imm == 0) {
- shft_res1 = Vj->Q(0);
- shft_res2 = Vd->Q(0);
- } else {
- shft_res1 = int128_urshift(Vj->Q(0), imm);
- shft_res2 = int128_urshift(Vd->Q(0), imm);
- }
mask = int128_sub(int128_lshift(int128_one(), 64), int128_one());
- if (int128_ult(mask, shft_res1)) {
- Vd->D(0) = int128_getlo(mask);
- }else {
- Vd->D(0) = int128_getlo(shft_res1);
- }
-
- if (int128_ult(mask, shft_res2)) {
- Vd->D(1) = int128_getlo(mask);
- }else {
- Vd->D(1) = int128_getlo(shft_res2);
+ for (i = 0; i < oprsz / 16; i++) {
+ if (imm == 0) {
+ shft_res[2 * i] = Vj->Q(i);
+ shft_res[2 * i + 1] = Vd->Q(i);
+ } else {
+ shft_res[2 * i] = int128_urshift(Vj->Q(i), imm);
+ shft_res[2 * i + 1] = int128_urshift(Vd->Q(i), imm);
+ }
+ for (j = 2 * i; j <= 2 * i + 1; j++) {
+ if (int128_ult(mask, shft_res[j])) {
+ Vd->D(j) = int128_getlo(mask);
+ } else {
+ Vd->D(j) = int128_getlo(shft_res[j]);
+ }
+ }
}
}
@@ -1660,55 +1700,58 @@ VSSRLNUI(vssrlni_bu_h, 16, B, H)
VSSRLNUI(vssrlni_hu_w, 32, H, W)
VSSRLNUI(vssrlni_wu_d, 64, W, D)
-#define VSSRANUI(NAME, BIT, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- temp.E1(i) = do_ssranu_ ## E1(Vj->E2(i), imm, BIT/2); \
- temp.E1(i + LSX_LEN/BIT) = do_ssranu_ ## E1(Vd->E2(i), imm, BIT/2); \
- } \
- *Vd = temp; \
+#define VSSRANUI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg temp = {}; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ temp.E1(j + ofs * 2 * i) = do_ssranu_ ## E1(Vj->E2(j + ofs * i), \
+ imm, BIT / 2); \
+ temp.E1(j + ofs * (2 * i + 1)) = do_ssranu_ ## E1(Vd->E2(j + ofs * i), \
+ imm, BIT / 2); \
+ } \
+ } \
+ *Vd = temp; \
}
void HELPER(vssrani_du_q)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
- Int128 shft_res1, shft_res2, mask;
+ int i, j;
+ Int128 shft_res[4], mask;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
-
- if (imm == 0) {
- shft_res1 = Vj->Q(0);
- shft_res2 = Vd->Q(0);
- } else {
- shft_res1 = int128_rshift(Vj->Q(0), imm);
- shft_res2 = int128_rshift(Vd->Q(0), imm);
- }
-
- if (int128_lt(Vj->Q(0), int128_zero())) {
- shft_res1 = int128_zero();
- }
-
- if (int128_lt(Vd->Q(0), int128_zero())) {
- shft_res2 = int128_zero();
- }
+ int oprsz = simd_oprsz(desc);
mask = int128_sub(int128_lshift(int128_one(), 64), int128_one());
- if (int128_ult(mask, shft_res1)) {
- Vd->D(0) = int128_getlo(mask);
- }else {
- Vd->D(0) = int128_getlo(shft_res1);
- }
-
- if (int128_ult(mask, shft_res2)) {
- Vd->D(1) = int128_getlo(mask);
- }else {
- Vd->D(1) = int128_getlo(shft_res2);
+ for (i = 0; i < oprsz / 16; i++) {
+ if (imm == 0) {
+ shft_res[2 * i] = Vj->Q(i);
+ shft_res[2 * i + 1] = Vd->Q(i);
+ } else {
+ shft_res[2 * i] = int128_rshift(Vj->Q(i), imm);
+ shft_res[2 * i + 1] = int128_rshift(Vd->Q(i), imm);
+ }
+ if (int128_lt(Vj->Q(i), int128_zero())) {
+ shft_res[2 * i] = int128_zero();
+ }
+ if (int128_lt(Vd->Q(i), int128_zero())) {
+ shft_res[2 * i + 1] = int128_zero();
+ }
+ for (j = 2 * i; j <= 2 * i + 1; j++) {
+ if (int128_ult(mask, shft_res[j])) {
+ Vd->D(j) = int128_getlo(mask);
+ } else {
+ Vd->D(j) = int128_getlo(shft_res[j]);
+ }
+ }
}
}
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 702a2f770d..9c218abb6f 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -459,6 +459,36 @@ TRANS(xvsrarni_h_w, LASX, gen_vv_i, 32, gen_helper_vsrarni_h_w)
TRANS(xvsrarni_w_d, LASX, gen_vv_i, 32, gen_helper_vsrarni_w_d)
TRANS(xvsrarni_d_q, LASX, gen_vv_i, 32, gen_helper_vsrarni_d_q)
+TRANS(xvssrln_b_h, LASX, gen_vvv, 32, gen_helper_vssrln_b_h)
+TRANS(xvssrln_h_w, LASX, gen_vvv, 32, gen_helper_vssrln_h_w)
+TRANS(xvssrln_w_d, LASX, gen_vvv, 32, gen_helper_vssrln_w_d)
+TRANS(xvssran_b_h, LASX, gen_vvv, 32, gen_helper_vssran_b_h)
+TRANS(xvssran_h_w, LASX, gen_vvv, 32, gen_helper_vssran_h_w)
+TRANS(xvssran_w_d, LASX, gen_vvv, 32, gen_helper_vssran_w_d)
+TRANS(xvssrln_bu_h, LASX, gen_vvv, 32, gen_helper_vssrln_bu_h)
+TRANS(xvssrln_hu_w, LASX, gen_vvv, 32, gen_helper_vssrln_hu_w)
+TRANS(xvssrln_wu_d, LASX, gen_vvv, 32, gen_helper_vssrln_wu_d)
+TRANS(xvssran_bu_h, LASX, gen_vvv, 32, gen_helper_vssran_bu_h)
+TRANS(xvssran_hu_w, LASX, gen_vvv, 32, gen_helper_vssran_hu_w)
+TRANS(xvssran_wu_d, LASX, gen_vvv, 32, gen_helper_vssran_wu_d)
+
+TRANS(xvssrlni_b_h, LASX, gen_vv_i, 32, gen_helper_vssrlni_b_h)
+TRANS(xvssrlni_h_w, LASX, gen_vv_i, 32, gen_helper_vssrlni_h_w)
+TRANS(xvssrlni_w_d, LASX, gen_vv_i, 32, gen_helper_vssrlni_w_d)
+TRANS(xvssrlni_d_q, LASX, gen_vv_i, 32, gen_helper_vssrlni_d_q)
+TRANS(xvssrani_b_h, LASX, gen_vv_i, 32, gen_helper_vssrani_b_h)
+TRANS(xvssrani_h_w, LASX, gen_vv_i, 32, gen_helper_vssrani_h_w)
+TRANS(xvssrani_w_d, LASX, gen_vv_i, 32, gen_helper_vssrani_w_d)
+TRANS(xvssrani_d_q, LASX, gen_vv_i, 32, gen_helper_vssrani_d_q)
+TRANS(xvssrlni_bu_h, LASX, gen_vv_i, 32, gen_helper_vssrlni_bu_h)
+TRANS(xvssrlni_hu_w, LASX, gen_vv_i, 32, gen_helper_vssrlni_hu_w)
+TRANS(xvssrlni_wu_d, LASX, gen_vv_i, 32, gen_helper_vssrlni_wu_d)
+TRANS(xvssrlni_du_q, LASX, gen_vv_i, 32, gen_helper_vssrlni_du_q)
+TRANS(xvssrani_bu_h, LASX, gen_vv_i, 32, gen_helper_vssrani_bu_h)
+TRANS(xvssrani_hu_w, LASX, gen_vv_i, 32, gen_helper_vssrani_hu_w)
+TRANS(xvssrani_wu_d, LASX, gen_vv_i, 32, gen_helper_vssrani_wu_d)
+TRANS(xvssrani_du_q, LASX, gen_vv_i, 32, gen_helper_vssrani_du_q)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
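The saturating forms clamp the shifted value to the range of the narrower type before storing it. Hand-expanding do_ssrans_B (the signed b_h case, where sh = BIT / 2 - 1 = 7) gives this sketch (function name assumed):

    #include <stdint.h>

    /* Arithmetic shift right, then saturate to int8_t: do_ssrans_B, sketched. */
    static int8_t ssran_b_sketch(int16_t v, int sa)
    {
        int16_t res = v >> sa;
        int16_t max = (1 << 7) - 1;     /* 127 */
        if (res > max) {
            return max;
        } else if (res < -(max + 1)) {  /* below -128 */
            return ~max;                /* ~127 == -128 */
        }
        return res;
    }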
* Re: [PATCH v4 32/48] target/loongarch: Implement xvssrln xvssran
2023-08-30 8:48 ` [PATCH v4 32/48] target/loongarch: Implement xvssrln xvssran Song Gao
@ 2023-08-30 23:22 ` Richard Henderson
0 siblings, 0 replies; 86+ messages in thread
From: Richard Henderson @ 2023-08-30 23:22 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 8/30/23 01:48, Song Gao wrote:
> void HELPER(vssrlni_d_q)(void *vd, void *vj, uint64_t imm, uint32_t desc)
> {
> - Int128 shft_res1, shft_res2, mask;
> + int i, j;
> + Int128 shft_res[4], mask;
> VReg *Vd = (VReg *)vd;
> VReg *Vj = (VReg *)vj;
> + int oprsz = simd_oprsz(desc);
>
> - if (imm == 0) {
> - shft_res1 = Vj->Q(0);
> - shft_res2 = Vd->Q(0);
> - } else {
> - shft_res1 = int128_urshift(Vj->Q(0), imm);
> - shft_res2 = int128_urshift(Vd->Q(0), imm);
> - }
> mask = int128_sub(int128_lshift(int128_one(), 63), int128_one());
>
> - if (int128_ult(mask, shft_res1)) {
> - Vd->D(0) = int128_getlo(mask);
> - }else {
> - Vd->D(0) = int128_getlo(shft_res1);
> - }
> -
> - if (int128_ult(mask, shft_res2)) {
> - Vd->D(1) = int128_getlo(mask);
> - }else {
> - Vd->D(1) = int128_getlo(shft_res2);
> + for (i = 0; i < oprsz / 16; i++) {
> + if (imm == 0) {
> + shft_res[2 * i] = Vj->Q(i);
> + shft_res[2 * i + 1] = Vd->Q(i);
> + } else {
> + shft_res[2 * i] = int128_urshift(Vj->Q(i), imm);
> + shft_res[2 * i + 1] = int128_urshift(Vd->Q(i), imm);
> + }
> + for (j = 2 * i; j <= 2 * i + 1; j++) {
> + if (int128_ult(mask, shft_res[j])) {
> + Vd->D(j) = int128_getlo(mask);
> + } else {
> + Vd->D(j) = int128_getlo(shft_res[j]);
> + }
> + }
> }
> }
This does not require an array of shft_res.
In fact, I encourage you to split out a helper.
r~
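One shape such a helper could take (hypothetical, not code from this series): saturate a single shifted quadword against the mask and store its low 64 bits, so the caller's loop needs no shft_res array:

    /* Hypothetical helper: clamp one unsigned 128-bit shift result to mask
     * and store the low 64 bits into element j of the destination. */
    static void store_ussat_q(VReg *Vd, int j, Int128 shft_res, Int128 mask)
    {
        if (int128_ult(mask, shft_res)) {
            Vd->D(j) = int128_getlo(mask);
        } else {
            Vd->D(j) = int128_getlo(shft_res);
        }
    }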
* [PATCH v4 33/48] target/loongarch: Implement xvssrlrn xvssrarn
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (31 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 32/48] target/loongarch: Implement xvssrln xvssran Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 23:26 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 34/48] target/loongarch: Implement xvclo xvclz Song Gao
` (14 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVSSRLRN.{B.H/H.W/W.D};
- XVSSRARN.{B.H/H.W/W.D};
- XVSSRLRN.{BU.H/HU.W/WU.D};
- XVSSRARN.{BU.H/HU.W/WU.D};
- XVSSRLRNI.{B.H/H.W/W.D/D.Q};
- XVSSRARNI.{B.H/H.W/W.D/D.Q};
- XVSSRLRNI.{BU.H/HU.W/WU.D/DU.Q};
- XVSSRARNI.{BU.H/HU.W/WU.D/DU.Q}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/insns.decode | 30 ++
target/loongarch/disas.c | 30 ++
target/loongarch/vec_helper.c | 467 ++++++++++---------
target/loongarch/insn_trans/trans_lasx.c.inc | 30 ++
4 files changed, 348 insertions(+), 209 deletions(-)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 022dd9bfd1..dc74bae7a5 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1740,6 +1740,36 @@ xvssrani_hu_w 0111 01110110 01001 ..... ..... ..... @vv_ui5
xvssrani_wu_d 0111 01110110 0101 ...... ..... ..... @vv_ui6
xvssrani_du_q 0111 01110110 011 ....... ..... ..... @vv_ui7
+xvssrlrn_b_h 0111 01010000 00001 ..... ..... ..... @vvv
+xvssrlrn_h_w 0111 01010000 00010 ..... ..... ..... @vvv
+xvssrlrn_w_d 0111 01010000 00011 ..... ..... ..... @vvv
+xvssrarn_b_h 0111 01010000 00101 ..... ..... ..... @vvv
+xvssrarn_h_w 0111 01010000 00110 ..... ..... ..... @vvv
+xvssrarn_w_d 0111 01010000 00111 ..... ..... ..... @vvv
+xvssrlrn_bu_h 0111 01010000 10001 ..... ..... ..... @vvv
+xvssrlrn_hu_w 0111 01010000 10010 ..... ..... ..... @vvv
+xvssrlrn_wu_d 0111 01010000 10011 ..... ..... ..... @vvv
+xvssrarn_bu_h 0111 01010000 10101 ..... ..... ..... @vvv
+xvssrarn_hu_w 0111 01010000 10110 ..... ..... ..... @vvv
+xvssrarn_wu_d 0111 01010000 10111 ..... ..... ..... @vvv
+
+xvssrlrni_b_h 0111 01110101 00000 1 .... ..... ..... @vv_ui4
+xvssrlrni_h_w 0111 01110101 00001 ..... ..... ..... @vv_ui5
+xvssrlrni_w_d 0111 01110101 0001 ...... ..... ..... @vv_ui6
+xvssrlrni_d_q 0111 01110101 001 ....... ..... ..... @vv_ui7
+xvssrarni_b_h 0111 01110110 10000 1 .... ..... ..... @vv_ui4
+xvssrarni_h_w 0111 01110110 10001 ..... ..... ..... @vv_ui5
+xvssrarni_w_d 0111 01110110 1001 ...... ..... ..... @vv_ui6
+xvssrarni_d_q 0111 01110110 101 ....... ..... ..... @vv_ui7
+xvssrlrni_bu_h 0111 01110101 01000 1 .... ..... ..... @vv_ui4
+xvssrlrni_hu_w 0111 01110101 01001 ..... ..... ..... @vv_ui5
+xvssrlrni_wu_d 0111 01110101 0101 ...... ..... ..... @vv_ui6
+xvssrlrni_du_q 0111 01110101 011 ....... ..... ..... @vv_ui7
+xvssrarni_bu_h 0111 01110110 11000 1 .... ..... ..... @vv_ui4
+xvssrarni_hu_w 0111 01110110 11001 ..... ..... ..... @vv_ui5
+xvssrarni_wu_d 0111 01110110 1101 ...... ..... ..... @vv_ui6
+xvssrarni_du_q 0111 01110110 111 ....... ..... ..... @vv_ui7
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 04e8d42044..f043a2f9b6 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2166,6 +2166,36 @@ INSN_LASX(xvssrani_hu_w, vv_i)
INSN_LASX(xvssrani_wu_d, vv_i)
INSN_LASX(xvssrani_du_q, vv_i)
+INSN_LASX(xvssrlrn_b_h, vvv)
+INSN_LASX(xvssrlrn_h_w, vvv)
+INSN_LASX(xvssrlrn_w_d, vvv)
+INSN_LASX(xvssrarn_b_h, vvv)
+INSN_LASX(xvssrarn_h_w, vvv)
+INSN_LASX(xvssrarn_w_d, vvv)
+INSN_LASX(xvssrlrn_bu_h, vvv)
+INSN_LASX(xvssrlrn_hu_w, vvv)
+INSN_LASX(xvssrlrn_wu_d, vvv)
+INSN_LASX(xvssrarn_bu_h, vvv)
+INSN_LASX(xvssrarn_hu_w, vvv)
+INSN_LASX(xvssrarn_wu_d, vvv)
+
+INSN_LASX(xvssrlrni_b_h, vv_i)
+INSN_LASX(xvssrlrni_h_w, vv_i)
+INSN_LASX(xvssrlrni_w_d, vv_i)
+INSN_LASX(xvssrlrni_d_q, vv_i)
+INSN_LASX(xvssrlrni_bu_h, vv_i)
+INSN_LASX(xvssrlrni_hu_w, vv_i)
+INSN_LASX(xvssrlrni_wu_d, vv_i)
+INSN_LASX(xvssrlrni_du_q, vv_i)
+INSN_LASX(xvssrarni_b_h, vv_i)
+INSN_LASX(xvssrarni_h_w, vv_i)
+INSN_LASX(xvssrarni_w_d, vv_i)
+INSN_LASX(xvssrarni_d_q, vv_i)
+INSN_LASX(xvssrarni_bu_h, vv_i)
+INSN_LASX(xvssrarni_hu_w, vv_i)
+INSN_LASX(xvssrarni_wu_d, vv_i)
+INSN_LASX(xvssrarni_du_q, vv_i)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 738bb452f6..852c65716e 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -1766,7 +1766,7 @@ static T1 do_ssrlrns_ ## E1(T2 e2, int sa, int sh) \
\
shft_res = do_vsrlr_ ## E2(e2, sa); \
T1 mask; \
- mask = (1ull << sh) -1; \
+ mask = (1ull << sh) - 1; \
if (shft_res > mask) { \
return mask; \
} else { \
@@ -1778,23 +1778,29 @@ SSRLRNS(B, H, uint16_t, int16_t, uint8_t)
SSRLRNS(H, W, uint32_t, int32_t, uint16_t)
SSRLRNS(W, D, uint64_t, int64_t, uint32_t)
-#define VSSRLRN(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- VReg *Vk = (VReg *)vk; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E1(i) = do_ssrlrns_ ## E1(Vj->E2(i), (T)Vk->E2(i)%BIT, BIT/2 -1); \
- } \
- Vd->D(1) = 0; \
+#define VSSRLRN(NAME, BIT, E1, E2, E3) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ Vd->E1(j + ofs * 2 * i) = do_ssrlrns_ ## E1(Vj->E2(j + ofs * i), \
+ Vk->E3(j + ofs * i) % BIT, \
+ BIT / 2 - 1); \
+ } \
+ Vd->D(2 * i + 1) = 0; \
+ } \
}
-VSSRLRN(vssrlrn_b_h, 16, uint16_t, B, H)
-VSSRLRN(vssrlrn_h_w, 32, uint32_t, H, W)
-VSSRLRN(vssrlrn_w_d, 64, uint64_t, W, D)
+VSSRLRN(vssrlrn_b_h, 16, B, H, UH)
+VSSRLRN(vssrlrn_h_w, 32, H, W, UW)
+VSSRLRN(vssrlrn_w_d, 64, W, D, UD)
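For readers following the indexing change: each helper now walks oprsz / 16 128-bit lanes, and within each lane writes ofs = LSX_LEN / BIT narrowed results into the low half while zeroing the high half (Vd->D(2 * i + 1) = 0). A minimal standalone sketch, not the in-tree helper, of the per-element rounding/saturating step for the b_h case (sa is already reduced mod 16 by the caller; the signed result saturates at 0x7f since sh = BIT / 2 - 1 = 7):

    #include <stdint.h>

    static uint8_t ssrlrn_b_h_one(uint16_t src, int sa)
    {
        uint16_t shifted;

        if (sa == 0) {
            shifted = src;                          /* no rounding bit to add */
        } else {
            uint16_t round = (src >> (sa - 1)) & 1;
            shifted = (src >> sa) + round;          /* logical shift right with rounding */
        }
        return shifted > 0x7f ? 0x7f : shifted;     /* saturate to sh bits */
    }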
#define SSRARNS(E1, E2, T1, T2) \
static T1 do_ssrarns_ ## E1(T1 e2, int sa, int sh) \
@@ -1803,7 +1809,7 @@ static T1 do_ssrarns_ ## E1(T1 e2, int sa, int sh) \
\
shft_res = do_vsrar_ ## E2(e2, sa); \
T2 mask; \
- mask = (1ll << sh) -1; \
+ mask = (1ll << sh) - 1; \
if (shft_res > mask) { \
return mask; \
} else if (shft_res < -(mask +1)) { \
@@ -1817,23 +1823,29 @@ SSRARNS(B, H, int16_t, int8_t)
SSRARNS(H, W, int32_t, int16_t)
SSRARNS(W, D, int64_t, int32_t)
-#define VSSRARN(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- VReg *Vk = (VReg *)vk; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E1(i) = do_ssrarns_ ## E1(Vj->E2(i), (T)Vk->E2(i)%BIT, BIT/2 -1); \
- } \
- Vd->D(1) = 0; \
+#define VSSRARN(NAME, BIT, E1, E2, E3) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ Vd->E1(j + ofs * 2 * i) = do_ssrarns_ ## E1(Vj->E2(j + ofs * i), \
+ Vk->E3(j + ofs * i) % BIT, \
+ BIT / 2 - 1); \
+ } \
+ Vd->D(2 * i + 1) = 0; \
+ } \
}
-VSSRARN(vssrarn_b_h, 16, uint16_t, B, H)
-VSSRARN(vssrarn_h_w, 32, uint32_t, H, W)
-VSSRARN(vssrarn_w_d, 64, uint64_t, W, D)
+VSSRARN(vssrarn_b_h, 16, B, H, UH)
+VSSRARN(vssrarn_h_w, 32, H, W, UW)
+VSSRARN(vssrarn_w_d, 64, W, D, UD)
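The arithmetic variant differs only in using a sign-propagating shift and two-sided saturation; a companion sketch under the same assumptions (and assuming the host compiler implements >> on negative values as an arithmetic shift, as mainstream compilers do):

    #include <stdint.h>

    static int8_t ssrarn_b_h_one(int16_t src, int sa)
    {
        int16_t shifted;

        if (sa == 0) {
            shifted = src;
        } else {
            int16_t round = (src >> (sa - 1)) & 1;
            shifted = (src >> sa) + round;          /* arithmetic shift with rounding */
        }
        if (shifted > 0x7f) {                       /* saturate to [-0x80, 0x7f] */
            return 0x7f;
        } else if (shifted < -0x80) {
            return -0x80;
        }
        return shifted;
    }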
#define SSRLRNU(E1, E2, T1, T2, T3) \
static T1 do_ssrlrnu_ ## E1(T3 e2, int sa, int sh) \
@@ -1843,7 +1855,7 @@ static T1 do_ssrlrnu_ ## E1(T3 e2, int sa, int sh) \
shft_res = do_vsrlr_ ## E2(e2, sa); \
\
T2 mask; \
- mask = (1ull << sh) -1; \
+ mask = (1ull << sh) - 1; \
if (shft_res > mask) { \
return mask; \
} else { \
@@ -1855,23 +1867,29 @@ SSRLRNU(B, H, uint16_t, uint8_t, int16_t)
SSRLRNU(H, W, uint32_t, uint16_t, int32_t)
SSRLRNU(W, D, uint64_t, uint32_t, int64_t)
-#define VSSRLRNU(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- VReg *Vk = (VReg *)vk; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E1(i) = do_ssrlrnu_ ## E1(Vj->E2(i), (T)Vk->E2(i)%BIT, BIT/2); \
- } \
- Vd->D(1) = 0; \
+#define VSSRLRNU(NAME, BIT, E1, E2, E3) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ Vd->E1(j + ofs * 2 * i) = do_ssrlrnu_ ## E1(Vj->E2(j + ofs * i), \
+ Vk->E3(j + ofs * i) % BIT, \
+ BIT / 2); \
+ } \
+ Vd->D(2 * i + 1) = 0; \
+ } \
}
-VSSRLRNU(vssrlrn_bu_h, 16, uint16_t, B, H)
-VSSRLRNU(vssrlrn_hu_w, 32, uint32_t, H, W)
-VSSRLRNU(vssrlrn_wu_d, 64, uint64_t, W, D)
+VSSRLRNU(vssrlrn_bu_h, 16, B, H, UH)
+VSSRLRNU(vssrlrn_hu_w, 32, H, W, UW)
+VSSRLRNU(vssrlrn_wu_d, 64, W, D, UD)
#define SSRARNU(E1, E2, T1, T2, T3) \
static T1 do_ssrarnu_ ## E1(T3 e2, int sa, int sh) \
@@ -1884,7 +1902,7 @@ static T1 do_ssrarnu_ ## E1(T3 e2, int sa, int sh) \
shft_res = do_vsrar_ ## E2(e2, sa); \
} \
T2 mask; \
- mask = (1ull << sh) -1; \
+ mask = (1ull << sh) - 1; \
if (shft_res > mask) { \
return mask; \
} else { \
@@ -1896,70 +1914,84 @@ SSRARNU(B, H, uint16_t, uint8_t, int16_t)
SSRARNU(H, W, uint32_t, uint16_t, int32_t)
SSRARNU(W, D, uint64_t, uint32_t, int64_t)
-#define VSSRARNU(NAME, BIT, T, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- VReg *Vk = (VReg *)vk; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E1(i) = do_ssrarnu_ ## E1(Vj->E2(i), (T)Vk->E2(i)%BIT, BIT/2); \
- } \
- Vd->D(1) = 0; \
-}
-
-VSSRARNU(vssrarn_bu_h, 16, uint16_t, B, H)
-VSSRARNU(vssrarn_hu_w, 32, uint32_t, H, W)
-VSSRARNU(vssrarn_wu_d, 64, uint64_t, W, D)
-
-#define VSSRLRNI(NAME, BIT, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- temp.E1(i) = do_ssrlrns_ ## E1(Vj->E2(i), imm, BIT/2 -1); \
- temp.E1(i + LSX_LEN/BIT) = do_ssrlrns_ ## E1(Vd->E2(i), imm, BIT/2 -1);\
- } \
- *Vd = temp; \
+#define VSSRARNU(NAME, BIT, E1, E2, E3) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ Vd->E1(j + ofs * 2 * i) = do_ssrarnu_ ## E1(Vj->E2(j + ofs * i), \
+ Vk->E3(j + ofs * i) % BIT, \
+ BIT / 2); \
+ } \
+ Vd->D(2 * i + 1) = 0; \
+ } \
}
-#define VSSRLRNI_Q(NAME, sh) \
-void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
-{ \
- Int128 shft_res1, shft_res2, mask, r1, r2; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- \
- if (imm == 0) { \
- shft_res1 = Vj->Q(0); \
- shft_res2 = Vd->Q(0); \
- } else { \
- r1 = int128_and(int128_urshift(Vj->Q(0), (imm -1)), int128_one()); \
- r2 = int128_and(int128_urshift(Vd->Q(0), (imm -1)), int128_one()); \
- \
- shft_res1 = (int128_add(int128_urshift(Vj->Q(0), imm), r1)); \
- shft_res2 = (int128_add(int128_urshift(Vd->Q(0), imm), r2)); \
- } \
- \
- mask = int128_sub(int128_lshift(int128_one(), sh), int128_one()); \
- \
- if (int128_ult(mask, shft_res1)) { \
- Vd->D(0) = int128_getlo(mask); \
- }else { \
- Vd->D(0) = int128_getlo(shft_res1); \
- } \
- \
- if (int128_ult(mask, shft_res2)) { \
- Vd->D(1) = int128_getlo(mask); \
- }else { \
- Vd->D(1) = int128_getlo(shft_res2); \
- } \
+VSSRARNU(vssrarn_bu_h, 16, B, H, UH)
+VSSRARNU(vssrarn_hu_w, 32, H, W, UW)
+VSSRARNU(vssrarn_wu_d, 64, W, D, UD)
+
+#define VSSRLRNI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg temp = {}; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ temp.E1(j + ofs * 2 * i) = do_ssrlrns_ ## E1(Vj->E2(j + ofs * i), \
+ imm, BIT / 2 - 1); \
+ temp.E1(j + ofs * (2 * i + 1)) = do_ssrlrns_ ## E1(Vd->E2(j + ofs * i), \
+ imm, BIT / 2 - 1); \
+ } \
+ } \
+ *Vd = temp; \
+}
+
+#define VSSRLRNI_Q(NAME, sh) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i, j; \
+ Int128 shft_res[4], mask, r[4]; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ mask = int128_sub(int128_lshift(int128_one(), sh), int128_one()); \
+ \
+ for (i = 0; i < oprsz / 16; i++) { \
+ if (imm == 0) { \
+ shft_res[2 * i] = Vj->Q(i); \
+ shft_res[2 * i + 1] = Vd->Q(i); \
+ } else { \
+ r[2 * i] = int128_and(int128_urshift(Vj->Q(i), (imm - 1)), \
+ int128_one()); \
+ r[2 * i + 1] = int128_and(int128_urshift(Vd->Q(i), (imm - 1)), \
+ int128_one()); \
+ shft_res[2 * i] = int128_add(int128_urshift(Vj->Q(i), imm), \
+ r[2 * i]); \
+ shft_res[2 * i + 1] = int128_add(int128_urshift(Vd->Q(i), imm), \
+ r[2 * i + 1]); \
+ } \
+ for (j = 2 * i; j <= 2 * i + 1; j++) { \
+ if (int128_ult(mask, shft_res[j])) { \
+ Vd->D(j) = int128_getlo(mask); \
+ } else { \
+ Vd->D(j) = int128_getlo(shft_res[j]); \
+ } \
+ } \
+ } \
}
VSSRLRNI(vssrlrni_b_h, 16, B, H)
@@ -1967,55 +1999,61 @@ VSSRLRNI(vssrlrni_h_w, 32, H, W)
VSSRLRNI(vssrlrni_w_d, 64, W, D)
VSSRLRNI_Q(vssrlrni_d_q, 63)
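VSSRLRNI_Q applies the same round-then-saturate pattern per quadword; a sketch of one quadword using the GCC/Clang __int128 extension in place of QEMU's Int128 wrappers (sh is 63 for the signed-result d_q form and 64 for the unsigned du_q form):

    #include <stdint.h>

    static uint64_t ssrlrni_q_one(unsigned __int128 src, int imm, int sh)
    {
        unsigned __int128 res, mask;

        if (imm == 0) {
            res = src;
        } else {
            unsigned __int128 r = (src >> (imm - 1)) & 1;  /* rounding bit */
            res = (src >> imm) + r;
        }
        mask = (((unsigned __int128)1) << sh) - 1;
        return res > mask ? (uint64_t)mask : (uint64_t)res;
    }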
-#define VSSRARNI(NAME, BIT, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- temp.E1(i) = do_ssrarns_ ## E1(Vj->E2(i), imm, BIT/2 -1); \
- temp.E1(i + LSX_LEN/BIT) = do_ssrarns_ ## E1(Vd->E2(i), imm, BIT/2 -1); \
- } \
- *Vd = temp; \
+#define VSSRARNI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg temp = {}; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ temp.E1(j + ofs * 2 * i) = do_ssrarns_ ## E1(Vj->E2(j + ofs * i), \
+ imm, BIT / 2 - 1); \
+ temp.E1(j + ofs * (2 * i + 1)) = do_ssrarns_ ## E1(Vd->E2(j + ofs * i), \
+ imm, BIT / 2 - 1); \
+ } \
+ } \
+ *Vd = temp; \
}
void HELPER(vssrarni_d_q)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
- Int128 shft_res1, shft_res2, mask1, mask2, r1, r2;
+ int i, j;
+ Int128 shft_res[4], mask1, mask2, r[4];
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
-
- if (imm == 0) {
- shft_res1 = Vj->Q(0);
- shft_res2 = Vd->Q(0);
- } else {
- r1 = int128_and(int128_rshift(Vj->Q(0), (imm -1)), int128_one());
- r2 = int128_and(int128_rshift(Vd->Q(0), (imm -1)), int128_one());
-
- shft_res1 = int128_add(int128_rshift(Vj->Q(0), imm), r1);
- shft_res2 = int128_add(int128_rshift(Vd->Q(0), imm), r2);
- }
+ int oprsz = simd_oprsz(desc);
mask1 = int128_sub(int128_lshift(int128_one(), 63), int128_one());
mask2 = int128_lshift(int128_one(), 63);
- if (int128_gt(shft_res1, mask1)) {
- Vd->D(0) = int128_getlo(mask1);
- } else if (int128_lt(shft_res1, int128_neg(mask2))) {
- Vd->D(0) = int128_getlo(mask2);
- } else {
- Vd->D(0) = int128_getlo(shft_res1);
- }
-
- if (int128_gt(shft_res2, mask1)) {
- Vd->D(1) = int128_getlo(mask1);
- } else if (int128_lt(shft_res2, int128_neg(mask2))) {
- Vd->D(1) = int128_getlo(mask2);
- } else {
- Vd->D(1) = int128_getlo(shft_res2);
+ for (i = 0; i < oprsz / 16; i++) {
+ if (imm == 0) {
+ shft_res[2 * i] = Vj->Q(i);
+ shft_res[2 * i + 1] = Vd->Q(i);
+ } else {
+ r[2 * i] = int128_and(int128_rshift(Vj->Q(i), (imm - 1)),
+ int128_one());
+ r[2 * i + 1] = int128_and(int128_rshift(Vd->Q(i), (imm - 1)),
+ int128_one());
+ shft_res[2 * i] = int128_add(int128_rshift(Vj->Q(i), imm),
+ r[2 * i]);
+ shft_res[2 * i + 1] = int128_add(int128_rshift(Vd->Q(i), imm),
+ r[2 * i + 1]);
+ }
+ for (j = 2 * i; j <= 2 * i + 1; j++) {
+ if (int128_gt(shft_res[j], mask1)) {
+ Vd->D(j) = int128_getlo(mask1);
+ } else if (int128_lt(shft_res[j], int128_neg(mask2))) {
+ Vd->D(j) = int128_getlo(mask2);
+ } else {
+ Vd->D(j) = int128_getlo(shft_res[j]);
+ }
+ }
}
}
@@ -2023,19 +2061,25 @@ VSSRARNI(vssrarni_b_h, 16, B, H)
VSSRARNI(vssrarni_h_w, 32, H, W)
VSSRARNI(vssrarni_w_d, 64, W, D)
-#define VSSRLRNUI(NAME, BIT, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- temp.E1(i) = do_ssrlrnu_ ## E1(Vj->E2(i), imm, BIT/2); \
- temp.E1(i + LSX_LEN/BIT) = do_ssrlrnu_ ## E1(Vd->E2(i), imm, BIT/2); \
- } \
- *Vd = temp; \
+#define VSSRLRNUI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg temp = {}; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ temp.E1(j + ofs * 2 * i) = do_ssrlrnu_ ## E1(Vj->E2(j + ofs * i), \
+ imm, BIT / 2); \
+ temp.E1(j + ofs * (2 * i + 1)) = do_ssrlrnu_ ## E1(Vd->E2(j + ofs * i), \
+ imm, BIT / 2); \
+ } \
+ } \
+ *Vd = temp; \
}
VSSRLRNUI(vssrlrni_bu_h, 16, B, H)
@@ -2043,62 +2087,67 @@ VSSRLRNUI(vssrlrni_hu_w, 32, H, W)
VSSRLRNUI(vssrlrni_wu_d, 64, W, D)
VSSRLRNI_Q(vssrlrni_du_q, 64)
-#define VSSRARNUI(NAME, BIT, E1, E2) \
-void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- temp.E1(i) = do_ssrarnu_ ## E1(Vj->E2(i), imm, BIT/2); \
- temp.E1(i + LSX_LEN/BIT) = do_ssrarnu_ ## E1(Vd->E2(i), imm, BIT/2); \
- } \
- *Vd = temp; \
+#define VSSRARNUI(NAME, BIT, E1, E2) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg temp = {}; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ temp.E1(j + ofs * 2 * i) = do_ssrarnu_ ## E1(Vj->E2(j + ofs * i), \
+ imm, BIT / 2); \
+ temp.E1(j + ofs * (2 * i + 1)) = do_ssrarnu_ ## E1(Vd->E2(j + ofs * i), \
+ imm, BIT / 2); \
+ } \
+ } \
+ *Vd = temp; \
}
void HELPER(vssrarni_du_q)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
- Int128 shft_res1, shft_res2, mask1, mask2, r1, r2;
+ int i, j;
+ Int128 shft_res[4], mask1, mask2, r[4];
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
-
- if (imm == 0) {
- shft_res1 = Vj->Q(0);
- shft_res2 = Vd->Q(0);
- } else {
- r1 = int128_and(int128_rshift(Vj->Q(0), (imm -1)), int128_one());
- r2 = int128_and(int128_rshift(Vd->Q(0), (imm -1)), int128_one());
-
- shft_res1 = int128_add(int128_rshift(Vj->Q(0), imm), r1);
- shft_res2 = int128_add(int128_rshift(Vd->Q(0), imm), r2);
- }
-
- if (int128_lt(Vj->Q(0), int128_zero())) {
- shft_res1 = int128_zero();
- }
- if (int128_lt(Vd->Q(0), int128_zero())) {
- shft_res2 = int128_zero();
- }
+ int oprsz = simd_oprsz(desc);
mask1 = int128_sub(int128_lshift(int128_one(), 64), int128_one());
mask2 = int128_lshift(int128_one(), 64);
- if (int128_gt(shft_res1, mask1)) {
- Vd->D(0) = int128_getlo(mask1);
- } else if (int128_lt(shft_res1, int128_neg(mask2))) {
- Vd->D(0) = int128_getlo(mask2);
- } else {
- Vd->D(0) = int128_getlo(shft_res1);
- }
-
- if (int128_gt(shft_res2, mask1)) {
- Vd->D(1) = int128_getlo(mask1);
- } else if (int128_lt(shft_res2, int128_neg(mask2))) {
- Vd->D(1) = int128_getlo(mask2);
- } else {
- Vd->D(1) = int128_getlo(shft_res2);
+ for (i = 0; i < oprsz / 16; i++) {
+ if (imm == 0) {
+ shft_res[2 * i] = Vj->Q(i);
+ shft_res[2 * i + 1] = Vd->Q(i);
+ } else {
+ r[2 * i] = int128_and(int128_rshift(Vj->Q(i), (imm - 1)),
+ int128_one());
+ r[2 * i + 1] = int128_and(int128_rshift(Vd->Q(i), (imm - 1)),
+ int128_one());
+ shft_res[2 * i] = int128_add(int128_rshift(Vj->Q(i), imm),
+ r[2 * i]);
+ shft_res[2 * i + 1] = int128_add(int128_rshift(Vd->Q(i), imm),
+ r[2 * i + 1]);
+ }
+ if (int128_lt(Vj->Q(i), int128_zero())) {
+ shft_res[2 * i] = int128_zero();
+ }
+ if (int128_lt(Vd->Q(i), int128_zero())) {
+ shft_res[2 * i + 1] = int128_zero();
+ }
+ for (j = 2 * i; j <= 2 * i + 1; j++) {
+ if (int128_gt(shft_res[j], mask1)) {
+ Vd->D(j) = int128_getlo(mask1);
+ } else if (int128_lt(shft_res[j], int128_neg(mask2))) {
+ Vd->D(j) = int128_getlo(mask2);
+ } else {
+ Vd->D(j) = int128_getlo(shft_res[j]);
+ }
+ }
}
}
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 9c218abb6f..dc658fc2cb 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -489,6 +489,36 @@ TRANS(xvssrani_hu_w, LASX, gen_vv_i, 32, gen_helper_vssrani_hu_w)
TRANS(xvssrani_wu_d, LASX, gen_vv_i, 32, gen_helper_vssrani_wu_d)
TRANS(xvssrani_du_q, LASX, gen_vv_i, 32, gen_helper_vssrani_du_q)
+TRANS(xvssrlrn_b_h, LASX, gen_vvv, 32, gen_helper_vssrlrn_b_h)
+TRANS(xvssrlrn_h_w, LASX, gen_vvv, 32, gen_helper_vssrlrn_h_w)
+TRANS(xvssrlrn_w_d, LASX, gen_vvv, 32, gen_helper_vssrlrn_w_d)
+TRANS(xvssrarn_b_h, LASX, gen_vvv, 32, gen_helper_vssrarn_b_h)
+TRANS(xvssrarn_h_w, LASX, gen_vvv, 32, gen_helper_vssrarn_h_w)
+TRANS(xvssrarn_w_d, LASX, gen_vvv, 32, gen_helper_vssrarn_w_d)
+TRANS(xvssrlrn_bu_h, LASX, gen_vvv, 32, gen_helper_vssrlrn_bu_h)
+TRANS(xvssrlrn_hu_w, LASX, gen_vvv, 32, gen_helper_vssrlrn_hu_w)
+TRANS(xvssrlrn_wu_d, LASX, gen_vvv, 32, gen_helper_vssrlrn_wu_d)
+TRANS(xvssrarn_bu_h, LASX, gen_vvv, 32, gen_helper_vssrarn_bu_h)
+TRANS(xvssrarn_hu_w, LASX, gen_vvv, 32, gen_helper_vssrarn_hu_w)
+TRANS(xvssrarn_wu_d, LASX, gen_vvv, 32, gen_helper_vssrarn_wu_d)
+
+TRANS(xvssrlrni_b_h, LASX, gen_vv_i, 32, gen_helper_vssrlrni_b_h)
+TRANS(xvssrlrni_h_w, LASX, gen_vv_i, 32, gen_helper_vssrlrni_h_w)
+TRANS(xvssrlrni_w_d, LASX, gen_vv_i, 32, gen_helper_vssrlrni_w_d)
+TRANS(xvssrlrni_d_q, LASX, gen_vv_i, 32, gen_helper_vssrlrni_d_q)
+TRANS(xvssrarni_b_h, LASX, gen_vv_i, 32, gen_helper_vssrarni_b_h)
+TRANS(xvssrarni_h_w, LASX, gen_vv_i, 32, gen_helper_vssrarni_h_w)
+TRANS(xvssrarni_w_d, LASX, gen_vv_i, 32, gen_helper_vssrarni_w_d)
+TRANS(xvssrarni_d_q, LASX, gen_vv_i, 32, gen_helper_vssrarni_d_q)
+TRANS(xvssrlrni_bu_h, LASX, gen_vv_i, 32, gen_helper_vssrlrni_bu_h)
+TRANS(xvssrlrni_hu_w, LASX, gen_vv_i, 32, gen_helper_vssrlrni_hu_w)
+TRANS(xvssrlrni_wu_d, LASX, gen_vv_i, 32, gen_helper_vssrlrni_wu_d)
+TRANS(xvssrlrni_du_q, LASX, gen_vv_i, 32, gen_helper_vssrlrni_du_q)
+TRANS(xvssrarni_bu_h, LASX, gen_vv_i, 32, gen_helper_vssrarni_bu_h)
+TRANS(xvssrarni_hu_w, LASX, gen_vv_i, 32, gen_helper_vssrarni_hu_w)
+TRANS(xvssrarni_wu_d, LASX, gen_vv_i, 32, gen_helper_vssrarni_wu_d)
+TRANS(xvssrarni_du_q, LASX, gen_vv_i, 32, gen_helper_vssrarni_du_q)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
* Re: [PATCH v4 33/48] target/loongarch: Implement xvssrlrn xvssrarn
2023-08-30 8:48 ` [PATCH v4 33/48] target/loongarch: Implement xvssrlrn xvssrarn Song Gao
@ 2023-08-30 23:26 ` Richard Henderson
0 siblings, 0 replies; 86+ messages in thread
From: Richard Henderson @ 2023-08-30 23:26 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 8/30/23 01:48, Song Gao wrote:
> void HELPER(vssrarni_du_q)(void *vd, void *vj, uint64_t imm, uint32_t desc)
> {
> - Int128 shft_res1, shft_res2, mask1, mask2, r1, r2;
> + int i, j;
> + Int128 shft_res[4], mask1, mask2, r[4];
Likewise for the arrays.
r~
* [PATCH v4 34/48] target/loongarch: Implement xvclo xvclz
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (32 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 33/48] target/loongarch: Implement xvssrlrn xvssrarn Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 23:27 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 35/48] target/loongarch: Implement xvpcnt Song Gao
` (13 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVCLO.{B/H/W/D};
- XVCLZ.{B/H/W/D}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/vec.h | 9 +++++++++
target/loongarch/insns.decode | 9 +++++++++
target/loongarch/disas.c | 9 +++++++++
target/loongarch/vec_helper.c | 13 ++-----------
target/loongarch/insn_trans/trans_lasx.c.inc | 9 +++++++++
5 files changed, 38 insertions(+), 11 deletions(-)
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index 67d829f9da..4497cd4a6d 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -76,4 +76,13 @@
#define R_SHIFT(a, b) (a >> b)
+#define DO_CLO_B(N) (clz32(~N & 0xff) - 24)
+#define DO_CLO_H(N) (clz32(~N & 0xffff) - 16)
+#define DO_CLO_W(N) (clz32(~N))
+#define DO_CLO_D(N) (clz64(~N))
+#define DO_CLZ_B(N) (clz32(N) - 24)
+#define DO_CLZ_H(N) (clz32(N) - 16)
+#define DO_CLZ_W(N) (clz32(N))
+#define DO_CLZ_D(N) (clz64(N))
+
#endif /* LOONGARCH_VEC_H */
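A quick illustration of why these expansions are correct for sub-word sizes: to count leading ones of a byte with a 32-bit clz, invert the value, mask to 8 bits, and subtract the 24 zero bits clz32 sees above the byte. A standalone sketch (my_clz32 is a stand-in for QEMU's clz32, which returns 32 for a zero input):

    #include <stdint.h>

    static int my_clz32(uint32_t val)
    {
        return val ? __builtin_clz(val) : 32;
    }

    /* The DO_CLO_B(N) expansion: leading ones of a byte. */
    static int clo_b(uint8_t n)
    {
        return my_clz32((uint32_t)(~n & 0xff)) - 24;
    }
    /* clo_b(0x00) == 0, clo_b(0xf0) == 4, clo_b(0xff) == 8 */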
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index dc74bae7a5..3175532045 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1770,6 +1770,15 @@ xvssrarni_hu_w 0111 01110110 11001 ..... ..... ..... @vv_ui5
xvssrarni_wu_d 0111 01110110 1101 ...... ..... ..... @vv_ui6
xvssrarni_du_q 0111 01110110 111 ....... ..... ..... @vv_ui7
+xvclo_b 0111 01101001 11000 00000 ..... ..... @vv
+xvclo_h 0111 01101001 11000 00001 ..... ..... @vv
+xvclo_w 0111 01101001 11000 00010 ..... ..... @vv
+xvclo_d 0111 01101001 11000 00011 ..... ..... @vv
+xvclz_b 0111 01101001 11000 00100 ..... ..... @vv
+xvclz_h 0111 01101001 11000 00101 ..... ..... @vv
+xvclz_w 0111 01101001 11000 00110 ..... ..... @vv
+xvclz_d 0111 01101001 11000 00111 ..... ..... @vv
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index f043a2f9b6..0fc58735b9 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2196,6 +2196,15 @@ INSN_LASX(xvssrarni_hu_w, vv_i)
INSN_LASX(xvssrarni_wu_d, vv_i)
INSN_LASX(xvssrarni_du_q, vv_i)
+INSN_LASX(xvclo_b, vv)
+INSN_LASX(xvclo_h, vv)
+INSN_LASX(xvclo_w, vv)
+INSN_LASX(xvclo_d, vv)
+INSN_LASX(xvclz_b, vv)
+INSN_LASX(xvclz_h, vv)
+INSN_LASX(xvclz_w, vv)
+INSN_LASX(xvclz_d, vv)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 852c65716e..789f6b303e 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -2161,22 +2161,13 @@ void HELPER(NAME)(void *vd, void *vj, uint32_t desc) \
int i; \
VReg *Vd = (VReg *)vd; \
VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
\
- for (i = 0; i < LSX_LEN/BIT; i++) \
- { \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
Vd->E(i) = DO_OP(Vj->E(i)); \
} \
}
-#define DO_CLO_B(N) (clz32(~N & 0xff) - 24)
-#define DO_CLO_H(N) (clz32(~N & 0xffff) - 16)
-#define DO_CLO_W(N) (clz32(~N))
-#define DO_CLO_D(N) (clz64(~N))
-#define DO_CLZ_B(N) (clz32(N) - 24)
-#define DO_CLZ_H(N) (clz32(N) - 16)
-#define DO_CLZ_W(N) (clz32(N))
-#define DO_CLZ_D(N) (clz64(N))
-
DO_2OP(vclo_b, 8, UB, DO_CLO_B)
DO_2OP(vclo_h, 16, UH, DO_CLO_H)
DO_2OP(vclo_w, 32, UW, DO_CLO_W)
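The loop-bound rewrite is the same one used throughout the series: simd_oprsz() yields the operation size in bytes (16 for LSX, 32 for LASX), so oprsz / (BIT / 8) is the element count and the 256-bit form falls out without special-casing. A sketch verifying the arithmetic:

    #include <stdio.h>

    int main(void)
    {
        for (int oprsz = 16; oprsz <= 32; oprsz += 16) {    /* LSX, LASX */
            for (int bit = 8; bit <= 64; bit *= 2) {
                printf("oprsz=%d BIT=%d -> %d elements\n",
                       oprsz, bit, oprsz / (bit / 8));
            }
        }
        return 0;
    }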
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index dc658fc2cb..4227fbe629 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -519,6 +519,15 @@ TRANS(xvssrarni_hu_w, LASX, gen_vv_i, 32, gen_helper_vssrarni_hu_w)
TRANS(xvssrarni_wu_d, LASX, gen_vv_i, 32, gen_helper_vssrarni_wu_d)
TRANS(xvssrarni_du_q, LASX, gen_vv_i, 32, gen_helper_vssrarni_du_q)
+TRANS(xvclo_b, LASX, gen_vv, 32, gen_helper_vclo_b)
+TRANS(xvclo_h, LASX, gen_vv, 32, gen_helper_vclo_h)
+TRANS(xvclo_w, LASX, gen_vv, 32, gen_helper_vclo_w)
+TRANS(xvclo_d, LASX, gen_vv, 32, gen_helper_vclo_d)
+TRANS(xvclz_b, LASX, gen_vv, 32, gen_helper_vclz_b)
+TRANS(xvclz_h, LASX, gen_vv, 32, gen_helper_vclz_h)
+TRANS(xvclz_w, LASX, gen_vv, 32, gen_helper_vclz_w)
+TRANS(xvclz_d, LASX, gen_vv, 32, gen_helper_vclz_d)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
* Re: [PATCH v4 34/48] target/loongarch: Implement xvclo xvclz
2023-08-30 8:48 ` [PATCH v4 34/48] target/loongarch: Implement xvclo xvclz Song Gao
@ 2023-08-30 23:27 ` Richard Henderson
0 siblings, 0 replies; 86+ messages in thread
From: Richard Henderson @ 2023-08-30 23:27 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 8/30/23 01:48, Song Gao wrote:
> This patch includes:
> - XVCLO.{B/H/W/D};
> - XVCLZ.{B/H/W/D}.
>
> Signed-off-by: Song Gao<gaosong@loongson.cn>
> ---
> target/loongarch/vec.h | 9 +++++++++
> target/loongarch/insns.decode | 9 +++++++++
> target/loongarch/disas.c | 9 +++++++++
> target/loongarch/vec_helper.c | 13 ++-----------
> target/loongarch/insn_trans/trans_lasx.c.inc | 9 +++++++++
> 5 files changed, 38 insertions(+), 11 deletions(-)
>
> diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
> index 67d829f9da..4497cd4a6d 100644
> --- a/target/loongarch/vec.h
> +++ b/target/loongarch/vec.h
> @@ -76,4 +76,13 @@
>
> #define R_SHIFT(a, b) (a >> b)
>
> +#define DO_CLO_B(N) (clz32(~N & 0xff) - 24)
> +#define DO_CLO_H(N) (clz32(~N & 0xffff) - 16)
> +#define DO_CLO_W(N) (clz32(~N))
> +#define DO_CLO_D(N) (clz64(~N))
> +#define DO_CLZ_B(N) (clz32(N) - 24)
> +#define DO_CLZ_H(N) (clz32(N) - 16)
> +#define DO_CLZ_W(N) (clz32(N))
> +#define DO_CLZ_D(N) (clz64(N))
> +
Aside from this movement,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* [PATCH v4 35/48] target/loongarch: Implement xvpcnt
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (33 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 34/48] target/loongarch: Implement xvclo xvclz Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 23:28 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 36/48] target/loongarch: Implement xvbitclr xvbitset xvbitrev Song Gao
` (12 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVPCNT.{B/H/W/D}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/insns.decode | 5 +++++
target/loongarch/disas.c | 5 +++++
target/loongarch/vec_helper.c | 4 ++--
target/loongarch/insn_trans/trans_lasx.c.inc | 5 +++++
4 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 3175532045..d683c6a6ab 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1779,6 +1779,11 @@ xvclz_h 0111 01101001 11000 00101 ..... ..... @vv
xvclz_w 0111 01101001 11000 00110 ..... ..... @vv
xvclz_d 0111 01101001 11000 00111 ..... ..... @vv
+xvpcnt_b 0111 01101001 11000 01000 ..... ..... @vv
+xvpcnt_h 0111 01101001 11000 01001 ..... ..... @vv
+xvpcnt_w 0111 01101001 11000 01010 ..... ..... @vv
+xvpcnt_d 0111 01101001 11000 01011 ..... ..... @vv
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 0fc58735b9..9e31f9bbbc 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2205,6 +2205,11 @@ INSN_LASX(xvclz_h, vv)
INSN_LASX(xvclz_w, vv)
INSN_LASX(xvclz_d, vv)
+INSN_LASX(xvpcnt_b, vv)
+INSN_LASX(xvpcnt_h, vv)
+INSN_LASX(xvpcnt_w, vv)
+INSN_LASX(xvpcnt_d, vv)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 789f6b303e..9c2b52fd7d 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -2183,9 +2183,9 @@ void HELPER(NAME)(void *vd, void *vj, uint32_t desc) \
int i; \
VReg *Vd = (VReg *)vd; \
VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
\
- for (i = 0; i < LSX_LEN/BIT; i++) \
- { \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
Vd->E(i) = FN(Vj->E(i)); \
} \
}
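The per-element operation is unchanged; a sketch of what vpcnt_b computes for one element, using the compiler builtin in place of QEMU's ctpop8():

    #include <stdint.h>

    static uint8_t pcnt_b_one(uint8_t v)
    {
        return (uint8_t)__builtin_popcount(v);   /* number of set bits */
    }
    /* pcnt_b_one(0xf0) == 4, pcnt_b_one(0xff) == 8 */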
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 4227fbe629..2a24de178d 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -528,6 +528,11 @@ TRANS(xvclz_h, LASX, gen_vv, 32, gen_helper_vclz_h)
TRANS(xvclz_w, LASX, gen_vv, 32, gen_helper_vclz_w)
TRANS(xvclz_d, LASX, gen_vv, 32, gen_helper_vclz_d)
+TRANS(xvpcnt_b, LASX, gen_vv, 32, gen_helper_vpcnt_b)
+TRANS(xvpcnt_h, LASX, gen_vv, 32, gen_helper_vpcnt_h)
+TRANS(xvpcnt_w, LASX, gen_vv, 32, gen_helper_vpcnt_w)
+TRANS(xvpcnt_d, LASX, gen_vv, 32, gen_helper_vpcnt_d)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
* [PATCH v4 36/48] target/loongarch: Implement xvbitclr xvbitset xvbitrev
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (34 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 35/48] target/loongarch: Implement xvpcnt Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 23:30 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 37/48] target/loongarch: Implement xvfrstp Song Gao
` (11 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVBITCLR[I].{B/H/W/D};
- XVBITSET[I].{B/H/W/D};
- XVBITREV[I].{B/H/W/D}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/vec.h | 4 ++
target/loongarch/insns.decode | 27 +++++++++++
target/loongarch/disas.c | 25 ++++++++++
target/loongarch/vec_helper.c | 48 ++++++++++----------
target/loongarch/insn_trans/trans_lasx.c.inc | 27 +++++++++++
5 files changed, 106 insertions(+), 25 deletions(-)
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index 4497cd4a6d..aae70f9de9 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -85,4 +85,8 @@
#define DO_CLZ_W(N) (clz32(N))
#define DO_CLZ_D(N) (clz64(N))
+#define DO_BITCLR(a, bit) (a & ~(1ull << bit))
+#define DO_BITSET(a, bit) (a | 1ull << bit)
+#define DO_BITREV(a, bit) (a ^ (1ull << bit))
+
#endif /* LOONGARCH_VEC_H */
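A standalone sketch of the three bit operations for 64-bit elements; 'bit' has already been reduced modulo the element width by the caller (Vk->E(i) % BIT, or the decoded immediate):

    #include <stdint.h>

    static uint64_t bitclr(uint64_t a, int bit) { return a & ~(1ull << bit); }
    static uint64_t bitset(uint64_t a, int bit) { return a | (1ull << bit); }
    static uint64_t bitrev(uint64_t a, int bit) { return a ^ (1ull << bit); }
    /* bitclr(0xff, 3) == 0xf7, bitset(0, 3) == 0x08, bitrev(0x08, 3) == 0 */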
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index d683c6a6ab..cb6db8002a 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1784,6 +1784,33 @@ xvpcnt_h 0111 01101001 11000 01001 ..... ..... @vv
xvpcnt_w 0111 01101001 11000 01010 ..... ..... @vv
xvpcnt_d 0111 01101001 11000 01011 ..... ..... @vv
+xvbitclr_b 0111 01010000 11000 ..... ..... ..... @vvv
+xvbitclr_h 0111 01010000 11001 ..... ..... ..... @vvv
+xvbitclr_w 0111 01010000 11010 ..... ..... ..... @vvv
+xvbitclr_d 0111 01010000 11011 ..... ..... ..... @vvv
+xvbitclri_b 0111 01110001 00000 01 ... ..... ..... @vv_ui3
+xvbitclri_h 0111 01110001 00000 1 .... ..... ..... @vv_ui4
+xvbitclri_w 0111 01110001 00001 ..... ..... ..... @vv_ui5
+xvbitclri_d 0111 01110001 0001 ...... ..... ..... @vv_ui6
+
+xvbitset_b 0111 01010000 11100 ..... ..... ..... @vvv
+xvbitset_h 0111 01010000 11101 ..... ..... ..... @vvv
+xvbitset_w 0111 01010000 11110 ..... ..... ..... @vvv
+xvbitset_d 0111 01010000 11111 ..... ..... ..... @vvv
+xvbitseti_b 0111 01110001 01000 01 ... ..... ..... @vv_ui3
+xvbitseti_h 0111 01110001 01000 1 .... ..... ..... @vv_ui4
+xvbitseti_w 0111 01110001 01001 ..... ..... ..... @vv_ui5
+xvbitseti_d 0111 01110001 0101 ...... ..... ..... @vv_ui6
+
+xvbitrev_b 0111 01010001 00000 ..... ..... ..... @vvv
+xvbitrev_h 0111 01010001 00001 ..... ..... ..... @vvv
+xvbitrev_w 0111 01010001 00010 ..... ..... ..... @vvv
+xvbitrev_d 0111 01010001 00011 ..... ..... ..... @vvv
+xvbitrevi_b 0111 01110001 10000 01 ... ..... ..... @vv_ui3
+xvbitrevi_h 0111 01110001 10000 1 .... ..... ..... @vv_ui4
+xvbitrevi_w 0111 01110001 10001 ..... ..... ..... @vv_ui5
+xvbitrevi_d 0111 01110001 1001 ...... ..... ..... @vv_ui6
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 9e31f9bbbc..dad9243fd7 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2210,6 +2210,31 @@ INSN_LASX(xvpcnt_h, vv)
INSN_LASX(xvpcnt_w, vv)
INSN_LASX(xvpcnt_d, vv)
+INSN_LASX(xvbitclr_b, vvv)
+INSN_LASX(xvbitclr_h, vvv)
+INSN_LASX(xvbitclr_w, vvv)
+INSN_LASX(xvbitclr_d, vvv)
+INSN_LASX(xvbitclri_b, vv_i)
+INSN_LASX(xvbitclri_h, vv_i)
+INSN_LASX(xvbitclri_w, vv_i)
+INSN_LASX(xvbitclri_d, vv_i)
+INSN_LASX(xvbitset_b, vvv)
+INSN_LASX(xvbitset_h, vvv)
+INSN_LASX(xvbitset_w, vvv)
+INSN_LASX(xvbitset_d, vvv)
+INSN_LASX(xvbitseti_b, vv_i)
+INSN_LASX(xvbitseti_h, vv_i)
+INSN_LASX(xvbitseti_w, vv_i)
+INSN_LASX(xvbitseti_d, vv_i)
+INSN_LASX(xvbitrev_b, vvv)
+INSN_LASX(xvbitrev_h, vvv)
+INSN_LASX(xvbitrev_w, vvv)
+INSN_LASX(xvbitrev_d, vvv)
+INSN_LASX(xvbitrevi_b, vv_i)
+INSN_LASX(xvbitrevi_h, vv_i)
+INSN_LASX(xvbitrevi_w, vv_i)
+INSN_LASX(xvbitrevi_d, vv_i)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 9c2b52fd7d..03b42dc887 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -2195,21 +2195,18 @@ VPCNT(vpcnt_h, 16, UH, ctpop16)
VPCNT(vpcnt_w, 32, UW, ctpop32)
VPCNT(vpcnt_d, 64, UD, ctpop64)
-#define DO_BITCLR(a, bit) (a & ~(1ull << bit))
-#define DO_BITSET(a, bit) (a | 1ull << bit)
-#define DO_BITREV(a, bit) (a ^ (1ull << bit))
-
-#define DO_BIT(NAME, BIT, E, DO_OP) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t v) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- VReg *Vk = (VReg *)vk; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E(i) = DO_OP(Vj->E(i), Vk->E(i)%BIT); \
- } \
+#define DO_BIT(NAME, BIT, E, DO_OP) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
+ Vd->E(i) = DO_OP(Vj->E(i), Vk->E(i) % BIT); \
+ } \
}
DO_BIT(vbitclr_b, 8, UB, DO_BITCLR)
@@ -2225,16 +2222,17 @@ DO_BIT(vbitrev_h, 16, UH, DO_BITREV)
DO_BIT(vbitrev_w, 32, UW, DO_BITREV)
DO_BIT(vbitrev_d, 64, UD, DO_BITREV)
-#define DO_BITI(NAME, BIT, E, DO_OP) \
-void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t v) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E(i) = DO_OP(Vj->E(i), imm); \
- } \
+#define DO_BITI(NAME, BIT, E, DO_OP) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
+ Vd->E(i) = DO_OP(Vj->E(i), imm); \
+ } \
}
DO_BITI(vbitclri_b, 8, UB, DO_BITCLR)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 2a24de178d..92c6506e04 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -533,6 +533,33 @@ TRANS(xvpcnt_h, LASX, gen_vv, 32, gen_helper_vpcnt_h)
TRANS(xvpcnt_w, LASX, gen_vv, 32, gen_helper_vpcnt_w)
TRANS(xvpcnt_d, LASX, gen_vv, 32, gen_helper_vpcnt_d)
+TRANS(xvbitclr_b, LASX, gvec_vvv, 32, MO_8, do_vbitclr)
+TRANS(xvbitclr_h, LASX, gvec_vvv, 32, MO_16, do_vbitclr)
+TRANS(xvbitclr_w, LASX, gvec_vvv, 32, MO_32, do_vbitclr)
+TRANS(xvbitclr_d, LASX, gvec_vvv, 32, MO_64, do_vbitclr)
+TRANS(xvbitclri_b, LASX, gvec_vv_i, 32, MO_8, do_vbitclri)
+TRANS(xvbitclri_h, LASX, gvec_vv_i, 32, MO_16, do_vbitclri)
+TRANS(xvbitclri_w, LASX, gvec_vv_i, 32, MO_32, do_vbitclri)
+TRANS(xvbitclri_d, LASX, gvec_vv_i, 32, MO_64, do_vbitclri)
+
+TRANS(xvbitset_b, LASX, gvec_vvv, 32, MO_8, do_vbitset)
+TRANS(xvbitset_h, LASX, gvec_vvv, 32, MO_16, do_vbitset)
+TRANS(xvbitset_w, LASX, gvec_vvv, 32, MO_32, do_vbitset)
+TRANS(xvbitset_d, LASX, gvec_vvv, 32, MO_64, do_vbitset)
+TRANS(xvbitseti_b, LASX, gvec_vv_i, 32, MO_8, do_vbitseti)
+TRANS(xvbitseti_h, LASX, gvec_vv_i, 32, MO_16, do_vbitseti)
+TRANS(xvbitseti_w, LASX, gvec_vv_i, 32, MO_32, do_vbitseti)
+TRANS(xvbitseti_d, LASX, gvec_vv_i, 32, MO_64, do_vbitseti)
+
+TRANS(xvbitrev_b, LASX, gvec_vvv, 32, MO_8, do_vbitrev)
+TRANS(xvbitrev_h, LASX, gvec_vvv, 32, MO_16, do_vbitrev)
+TRANS(xvbitrev_w, LASX, gvec_vvv, 32, MO_32, do_vbitrev)
+TRANS(xvbitrev_d, LASX, gvec_vvv, 32, MO_64, do_vbitrev)
+TRANS(xvbitrevi_b, LASX, gvec_vv_i, 32, MO_8, do_vbitrevi)
+TRANS(xvbitrevi_h, LASX, gvec_vv_i, 32, MO_16, do_vbitrevi)
+TRANS(xvbitrevi_w, LASX, gvec_vv_i, 32, MO_32, do_vbitrevi)
+TRANS(xvbitrevi_d, LASX, gvec_vv_i, 32, MO_64, do_vbitrevi)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
* Re: [PATCH v4 36/48] target/loongarch: Implement xvbitclr xvbitset xvbitrev
2023-08-30 8:48 ` [PATCH v4 36/48] target/loongarch: Implement xvbitclr xvbitset xvbitrev Song Gao
@ 2023-08-30 23:30 ` Richard Henderson
0 siblings, 0 replies; 86+ messages in thread
From: Richard Henderson @ 2023-08-30 23:30 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 8/30/23 01:48, Song Gao wrote:
> This patch includes:
> - XVBITCLR[I].{B/H/W/D};
> - XVBITSET[I].{B/H/W/D};
> - XVBITREV[I].{B/H/W/D}.
>
> Signed-off-by: Song Gao<gaosong@loongson.cn>
> ---
> target/loongarch/vec.h | 4 ++
> target/loongarch/insns.decode | 27 +++++++++++
> target/loongarch/disas.c | 25 ++++++++++
> target/loongarch/vec_helper.c | 48 ++++++++++----------
> target/loongarch/insn_trans/trans_lasx.c.inc | 27 +++++++++++
> 5 files changed, 106 insertions(+), 25 deletions(-)
>
> diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
> index 4497cd4a6d..aae70f9de9 100644
> --- a/target/loongarch/vec.h
> +++ b/target/loongarch/vec.h
> @@ -85,4 +85,8 @@
> #define DO_CLZ_W(N) (clz32(N))
> #define DO_CLZ_D(N) (clz64(N))
>
> +#define DO_BITCLR(a, bit) (a & ~(1ull << bit))
> +#define DO_BITSET(a, bit) (a | 1ull << bit)
> +#define DO_BITREV(a, bit) (a ^ (1ull << bit))
> +
Aside from this movement,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* [PATCH v4 37/48] target/loongarch: Implement xvfrstp
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (35 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 36/48] target/loongarch: Implement xvbitclr xvbitset xvbitrev Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 23:34 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 38/48] target/loongarch: Implement LASX fpu arith instructions Song Gao
` (10 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVFRSTP[I].{B/H}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/insns.decode | 5 ++
target/loongarch/disas.c | 5 ++
target/loongarch/vec_helper.c | 48 ++++++++++++--------
target/loongarch/insn_trans/trans_lasx.c.inc | 5 ++
4 files changed, 43 insertions(+), 20 deletions(-)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index cb6db8002a..6035fe139c 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1811,6 +1811,11 @@ xvbitrevi_h 0111 01110001 10000 1 .... ..... ..... @vv_ui4
xvbitrevi_w 0111 01110001 10001 ..... ..... ..... @vv_ui5
xvbitrevi_d 0111 01110001 1001 ...... ..... ..... @vv_ui6
+xvfrstp_b 0111 01010010 10110 ..... ..... ..... @vvv
+xvfrstp_h 0111 01010010 10111 ..... ..... ..... @vvv
+xvfrstpi_b 0111 01101001 10100 ..... ..... ..... @vv_ui5
+xvfrstpi_h 0111 01101001 10101 ..... ..... ..... @vv_ui5
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index dad9243fd7..27d6252686 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2235,6 +2235,11 @@ INSN_LASX(xvbitrevi_h, vv_i)
INSN_LASX(xvbitrevi_w, vv_i)
INSN_LASX(xvbitrevi_d, vv_i)
+INSN_LASX(xvfrstp_b, vvv)
+INSN_LASX(xvfrstp_h, vvv)
+INSN_LASX(xvfrstpi_b, vv_i)
+INSN_LASX(xvfrstpi_h, vv_i)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 03b42dc887..5c53cc8962 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -2251,37 +2251,45 @@ DO_BITI(vbitrevi_d, 64, UD, DO_BITREV)
#define VFRSTP(NAME, BIT, MASK, E) \
void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
- int i, m; \
+ int i, j, m, ofs; \
VReg *Vd = (VReg *)vd; \
VReg *Vj = (VReg *)vj; \
VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
\
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- if (Vj->E(i) < 0) { \
- break; \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ m = Vk->E(i * ofs) & MASK; \
+ for (j = 0; j < ofs; j++) { \
+ if (Vj->E(j + ofs * i) < 0) { \
+ break; \
+ } \
} \
+ Vd->E(m + i * ofs) = j; \
} \
- m = Vk->E(0) & MASK; \
- Vd->E(m) = i; \
}
VFRSTP(vfrstp_b, 8, 0xf, B)
VFRSTP(vfrstp_h, 16, 0x7, H)
-#define VFRSTPI(NAME, BIT, E) \
-void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
-{ \
- int i, m; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- if (Vj->E(i) < 0) { \
- break; \
- } \
- } \
- m = imm % (LSX_LEN/BIT); \
- Vd->E(m) = i; \
+#define VFRSTPI(NAME, BIT, E) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i, j, m, ofs; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ m = imm % ofs; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ if (Vj->E(j + ofs * i) < 0) { \
+ break; \
+ } \
+ } \
+ Vd->E(m + i * ofs) = j; \
+ } \
}
VFRSTPI(vfrstpi_b, 8, B)
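In words: each 128-bit lane is scanned for its first negative (sign-bit-set) element, and that index, or the element count if none is found, is written into lane element m. A standalone sketch for one lane of byte elements:

    #include <stdint.h>

    static void vfrstp_b_lane(int8_t *dst, const int8_t *src, int m)
    {
        int j;

        for (j = 0; j < 16; j++) {    /* 16 byte elements per 128-bit lane */
            if (src[j] < 0) {
                break;
            }
        }
        dst[m & 0xf] = j;             /* j == 16 when no element is negative */
    }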
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 92c6506e04..8a7d1b41e1 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -560,6 +560,11 @@ TRANS(xvbitrevi_h, LASX, gvec_vv_i, 32, MO_16, do_vbitrevi)
TRANS(xvbitrevi_w, LASX, gvec_vv_i, 32, MO_32, do_vbitrevi)
TRANS(xvbitrevi_d, LASX, gvec_vv_i, 32, MO_64, do_vbitrevi)
+TRANS(xvfrstp_b, LASX, gen_vvv, 32, gen_helper_vfrstp_b)
+TRANS(xvfrstp_h, LASX, gen_vvv, 32, gen_helper_vfrstp_h)
+TRANS(xvfrstpi_b, LASX, gen_vv_i, 32, gen_helper_vfrstpi_b)
+TRANS(xvfrstpi_h, LASX, gen_vv_i, 32, gen_helper_vfrstpi_h)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
* [PATCH v4 38/48] target/loongarch: Implement LASX fpu arith instructions
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (36 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 37/48] target/loongarch: Implement xvfrstp Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 23:37 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 39/48] target/loongarch: Implement LASX fpu fcvt instructions Song Gao
` (9 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVF{ADD/SUB/MUL/DIV}.{S/D};
- XVF{MADD/MSUB/NMADD/NMSUB}.{S/D};
- XVF{MAX/MIN}.{S/D};
- XVF{MAXA/MINA}.{S/D};
- XVFLOGB.{S/D};
- XVFCLASS.{S/D};
- XVF{SQRT/RECIP/RSQRT}.{S/D}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/insns.decode | 41 ++++++++++
target/loongarch/disas.c | 46 +++++++++++
target/loongarch/vec_helper.c | 82 +++++++++++---------
target/loongarch/insn_trans/trans_lasx.c.inc | 41 ++++++++++
4 files changed, 172 insertions(+), 38 deletions(-)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 6035fe139c..4224b0a4b1 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1816,6 +1816,47 @@ xvfrstp_h 0111 01010010 10111 ..... ..... ..... @vvv
xvfrstpi_b 0111 01101001 10100 ..... ..... ..... @vv_ui5
xvfrstpi_h 0111 01101001 10101 ..... ..... ..... @vv_ui5
+xvfadd_s 0111 01010011 00001 ..... ..... ..... @vvv
+xvfadd_d 0111 01010011 00010 ..... ..... ..... @vvv
+xvfsub_s 0111 01010011 00101 ..... ..... ..... @vvv
+xvfsub_d 0111 01010011 00110 ..... ..... ..... @vvv
+xvfmul_s 0111 01010011 10001 ..... ..... ..... @vvv
+xvfmul_d 0111 01010011 10010 ..... ..... ..... @vvv
+xvfdiv_s 0111 01010011 10101 ..... ..... ..... @vvv
+xvfdiv_d 0111 01010011 10110 ..... ..... ..... @vvv
+
+xvfmadd_s 0000 10100001 ..... ..... ..... ..... @vvvv
+xvfmadd_d 0000 10100010 ..... ..... ..... ..... @vvvv
+xvfmsub_s 0000 10100101 ..... ..... ..... ..... @vvvv
+xvfmsub_d 0000 10100110 ..... ..... ..... ..... @vvvv
+xvfnmadd_s 0000 10101001 ..... ..... ..... ..... @vvvv
+xvfnmadd_d 0000 10101010 ..... ..... ..... ..... @vvvv
+xvfnmsub_s 0000 10101101 ..... ..... ..... ..... @vvvv
+xvfnmsub_d 0000 10101110 ..... ..... ..... ..... @vvvv
+
+xvfmax_s 0111 01010011 11001 ..... ..... ..... @vvv
+xvfmax_d 0111 01010011 11010 ..... ..... ..... @vvv
+xvfmin_s 0111 01010011 11101 ..... ..... ..... @vvv
+xvfmin_d 0111 01010011 11110 ..... ..... ..... @vvv
+
+xvfmaxa_s 0111 01010100 00001 ..... ..... ..... @vvv
+xvfmaxa_d 0111 01010100 00010 ..... ..... ..... @vvv
+xvfmina_s 0111 01010100 00101 ..... ..... ..... @vvv
+xvfmina_d 0111 01010100 00110 ..... ..... ..... @vvv
+
+xvflogb_s 0111 01101001 11001 10001 ..... ..... @vv
+xvflogb_d 0111 01101001 11001 10010 ..... ..... @vv
+
+xvfclass_s 0111 01101001 11001 10101 ..... ..... @vv
+xvfclass_d 0111 01101001 11001 10110 ..... ..... @vv
+
+xvfsqrt_s 0111 01101001 11001 11001 ..... ..... @vv
+xvfsqrt_d 0111 01101001 11001 11010 ..... ..... @vv
+xvfrecip_s 0111 01101001 11001 11101 ..... ..... @vv
+xvfrecip_d 0111 01101001 11001 11110 ..... ..... @vv
+xvfrsqrt_s 0111 01101001 11010 00001 ..... ..... @vv
+xvfrsqrt_d 0111 01101001 11010 00010 ..... ..... @vv
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 27d6252686..4af74f1ae9 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1708,6 +1708,11 @@ static void output_v_i_x(DisasContext *ctx, arg_v_i *a, const char *mnemonic)
output(ctx, mnemonic, "x%d, 0x%x", a->vd, a->imm);
}
+static void output_vvvv_x(DisasContext *ctx, arg_vvvv *a, const char *mnemonic)
+{
+ output(ctx, mnemonic, "x%d, x%d, x%d, x%d", a->vd, a->vj, a->vk, a->va);
+}
+
static void output_vvv_x(DisasContext *ctx, arg_vvv * a, const char *mnemonic)
{
output(ctx, mnemonic, "x%d, x%d, x%d", a->vd, a->vj, a->vk);
@@ -2240,6 +2245,47 @@ INSN_LASX(xvfrstp_h, vvv)
INSN_LASX(xvfrstpi_b, vv_i)
INSN_LASX(xvfrstpi_h, vv_i)
+INSN_LASX(xvfadd_s, vvv)
+INSN_LASX(xvfadd_d, vvv)
+INSN_LASX(xvfsub_s, vvv)
+INSN_LASX(xvfsub_d, vvv)
+INSN_LASX(xvfmul_s, vvv)
+INSN_LASX(xvfmul_d, vvv)
+INSN_LASX(xvfdiv_s, vvv)
+INSN_LASX(xvfdiv_d, vvv)
+
+INSN_LASX(xvfmadd_s, vvvv)
+INSN_LASX(xvfmadd_d, vvvv)
+INSN_LASX(xvfmsub_s, vvvv)
+INSN_LASX(xvfmsub_d, vvvv)
+INSN_LASX(xvfnmadd_s, vvvv)
+INSN_LASX(xvfnmadd_d, vvvv)
+INSN_LASX(xvfnmsub_s, vvvv)
+INSN_LASX(xvfnmsub_d, vvvv)
+
+INSN_LASX(xvfmax_s, vvv)
+INSN_LASX(xvfmax_d, vvv)
+INSN_LASX(xvfmin_s, vvv)
+INSN_LASX(xvfmin_d, vvv)
+
+INSN_LASX(xvfmaxa_s, vvv)
+INSN_LASX(xvfmaxa_d, vvv)
+INSN_LASX(xvfmina_s, vvv)
+INSN_LASX(xvfmina_d, vvv)
+
+INSN_LASX(xvflogb_s, vv)
+INSN_LASX(xvflogb_d, vv)
+
+INSN_LASX(xvfclass_s, vv)
+INSN_LASX(xvfclass_d, vv)
+
+INSN_LASX(xvfsqrt_s, vv)
+INSN_LASX(xvfsqrt_d, vv)
+INSN_LASX(xvfrecip_s, vv)
+INSN_LASX(xvfrecip_d, vv)
+INSN_LASX(xvfrsqrt_s, vv)
+INSN_LASX(xvfrsqrt_d, vv)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 5c53cc8962..684b023ee5 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -2334,9 +2334,10 @@ void HELPER(NAME)(void *vd, void *vj, void *vk, \
VReg *Vd = (VReg *)vd; \
VReg *Vj = (VReg *)vj; \
VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
\
vec_clear_cause(env); \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
Vd->E(i) = FN(Vj->E(i), Vk->E(i), &env->fp_status); \
vec_update_fcsr0(env, GETPC()); \
} \
@@ -2368,9 +2369,10 @@ void HELPER(NAME)(void *vd, void *vj, void *vk, void *va, \
VReg *Vj = (VReg *)vj; \
VReg *Vk = (VReg *)vk; \
VReg *Va = (VReg *)va; \
+ int oprsz = simd_oprsz(desc); \
\
vec_clear_cause(env); \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
Vd->E(i) = FN(Vj->E(i), Vk->E(i), Va->E(i), flags, &env->fp_status); \
vec_update_fcsr0(env, GETPC()); \
} \
@@ -2387,47 +2389,51 @@ DO_4OP_F(vfnmsub_s, 32, UW, float32_muladd,
DO_4OP_F(vfnmsub_d, 64, UD, float64_muladd,
float_muladd_negate_c | float_muladd_negate_result)
-#define DO_2OP_F(NAME, BIT, E, FN) \
-void HELPER(NAME)(void *vd, void *vj, CPULoongArchState *env, uint32_t desc) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- \
- vec_clear_cause(env); \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E(i) = FN(env, Vj->E(i)); \
- } \
-}
-
-#define FLOGB(BIT, T) \
-static T do_flogb_## BIT(CPULoongArchState *env, T fj) \
-{ \
- T fp, fd; \
- float_status *status = &env->fp_status; \
- FloatRoundMode old_mode = get_float_rounding_mode(status); \
- \
- set_float_rounding_mode(float_round_down, status); \
- fp = float ## BIT ##_log2(fj, status); \
- fd = float ## BIT ##_round_to_int(fp, status); \
- set_float_rounding_mode(old_mode, status); \
- vec_update_fcsr0_mask(env, GETPC(), float_flag_inexact); \
- return fd; \
+#define DO_2OP_F(NAME, BIT, E, FN) \
+void HELPER(NAME)(void *vd, void *vj, \
+ CPULoongArchState *env, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ vec_clear_cause(env); \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
+ Vd->E(i) = FN(env, Vj->E(i)); \
+ } \
+}
+
+#define FLOGB(BIT, T) \
+static T do_flogb_## BIT(CPULoongArchState *env, T fj) \
+{ \
+ T fp, fd; \
+ float_status *status = &env->fp_status; \
+ FloatRoundMode old_mode = get_float_rounding_mode(status); \
+ \
+ set_float_rounding_mode(float_round_down, status); \
+ fp = float ## BIT ##_log2(fj, status); \
+ fd = float ## BIT ##_round_to_int(fp, status); \
+ set_float_rounding_mode(old_mode, status); \
+ vec_update_fcsr0_mask(env, GETPC(), float_flag_inexact); \
+ return fd; \
}
FLOGB(32, uint32_t)
FLOGB(64, uint64_t)
-#define FCLASS(NAME, BIT, E, FN) \
-void HELPER(NAME)(void *vd, void *vj, CPULoongArchState *env, uint32_t desc) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E(i) = FN(env, Vj->E(i)); \
- } \
+#define FCLASS(NAME, BIT, E, FN) \
+void HELPER(NAME)(void *vd, void *vj, \
+ CPULoongArchState *env, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
+ Vd->E(i) = FN(env, Vj->E(i)); \
+ } \
}
FCLASS(vfclass_s, 32, UW, helper_fclass_s)
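The four fused-multiply forms are all a single softfloat call with different negate flags; a sketch of the mapping using libm's fmaf in place of float32_muladd (NaN sign-handling details aside):

    #include <math.h>

    static float fmadd_s(float a, float b, float c)  { return fmaf(a, b, c); }    /* a*b + c */
    static float fmsub_s(float a, float b, float c)  { return fmaf(a, b, -c); }   /* negate_c */
    static float fnmadd_s(float a, float b, float c) { return -fmaf(a, b, c); }   /* negate_result */
    static float fnmsub_s(float a, float b, float c) { return -fmaf(a, b, -c); }  /* negate_c | negate_result */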
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 8a7d1b41e1..b1b1fb939b 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -565,6 +565,47 @@ TRANS(xvfrstp_h, LASX, gen_vvv, 32, gen_helper_vfrstp_h)
TRANS(xvfrstpi_b, LASX, gen_vv_i, 32, gen_helper_vfrstpi_b)
TRANS(xvfrstpi_h, LASX, gen_vv_i, 32, gen_helper_vfrstpi_h)
+TRANS(xvfadd_s, LASX, gen_vvv_f, 32, gen_helper_vfadd_s)
+TRANS(xvfadd_d, LASX, gen_vvv_f, 32, gen_helper_vfadd_d)
+TRANS(xvfsub_s, LASX, gen_vvv_f, 32, gen_helper_vfsub_s)
+TRANS(xvfsub_d, LASX, gen_vvv_f, 32, gen_helper_vfsub_d)
+TRANS(xvfmul_s, LASX, gen_vvv_f, 32, gen_helper_vfmul_s)
+TRANS(xvfmul_d, LASX, gen_vvv_f, 32, gen_helper_vfmul_d)
+TRANS(xvfdiv_s, LASX, gen_vvv_f, 32, gen_helper_vfdiv_s)
+TRANS(xvfdiv_d, LASX, gen_vvv_f, 32, gen_helper_vfdiv_d)
+
+TRANS(xvfmadd_s, LASX, gen_vvvv_f, 32, gen_helper_vfmadd_s)
+TRANS(xvfmadd_d, LASX, gen_vvvv_f, 32, gen_helper_vfmadd_d)
+TRANS(xvfmsub_s, LASX, gen_vvvv_f, 32, gen_helper_vfmsub_s)
+TRANS(xvfmsub_d, LASX, gen_vvvv_f, 32, gen_helper_vfmsub_d)
+TRANS(xvfnmadd_s, LASX, gen_vvvv_f, 32, gen_helper_vfnmadd_s)
+TRANS(xvfnmadd_d, LASX, gen_vvvv_f, 32, gen_helper_vfnmadd_d)
+TRANS(xvfnmsub_s, LASX, gen_vvvv_f, 32, gen_helper_vfnmsub_s)
+TRANS(xvfnmsub_d, LASX, gen_vvvv_f, 32, gen_helper_vfnmsub_d)
+
+TRANS(xvfmax_s, LASX, gen_vvv_f, 32, gen_helper_vfmax_s)
+TRANS(xvfmax_d, LASX, gen_vvv_f, 32, gen_helper_vfmax_d)
+TRANS(xvfmin_s, LASX, gen_vvv_f, 32, gen_helper_vfmin_s)
+TRANS(xvfmin_d, LASX, gen_vvv_f, 32, gen_helper_vfmin_d)
+
+TRANS(xvfmaxa_s, LASX, gen_vvv_f, 32, gen_helper_vfmaxa_s)
+TRANS(xvfmaxa_d, LASX, gen_vvv_f, 32, gen_helper_vfmaxa_d)
+TRANS(xvfmina_s, LASX, gen_vvv_f, 32, gen_helper_vfmina_s)
+TRANS(xvfmina_d, LASX, gen_vvv_f, 32, gen_helper_vfmina_d)
+
+TRANS(xvflogb_s, LASX, gen_vv_f, 32, gen_helper_vflogb_s)
+TRANS(xvflogb_d, LASX, gen_vv_f, 32, gen_helper_vflogb_d)
+
+TRANS(xvfclass_s, LASX, gen_vv_f, 32, gen_helper_vfclass_s)
+TRANS(xvfclass_d, LASX, gen_vv_f, 32, gen_helper_vfclass_d)
+
+TRANS(xvfsqrt_s, LASX, gen_vv_f, 32, gen_helper_vfsqrt_s)
+TRANS(xvfsqrt_d, LASX, gen_vv_f, 32, gen_helper_vfsqrt_d)
+TRANS(xvfrecip_s, LASX, gen_vv_f, 32, gen_helper_vfrecip_s)
+TRANS(xvfrecip_d, LASX, gen_vv_f, 32, gen_helper_vfrecip_d)
+TRANS(xvfrsqrt_s, LASX, gen_vv_f, 32, gen_helper_vfrsqrt_s)
+TRANS(xvfrsqrt_d, LASX, gen_vv_f, 32, gen_helper_vfrsqrt_d)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
* Re: [PATCH v4 38/48] target/loongarch: Implement LASX fpu arith instructions
2023-08-30 8:48 ` [PATCH v4 38/48] target/loongarch: Implement LASX fpu arith instructions Song Gao
@ 2023-08-30 23:37 ` Richard Henderson
0 siblings, 0 replies; 86+ messages in thread
From: Richard Henderson @ 2023-08-30 23:37 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 8/30/23 01:48, Song Gao wrote:
> This patch includes:
> - XVF{ADD/SUB/MUL/DIV}.{S/D};
> - XVF{MADD/MSUB/NMADD/NMSUB}.{S/D};
> - XVF{MAX/MIN}.{S/D};
> - XVF{MAXA/MINA}.{S/D};
> - XVFLOGB.{S/D};
> - XVFCLASS.{S/D};
> - XVF{SQRT/RECIP/RSQRT}.{S/D}.
>
> Signed-off-by: Song Gao<gaosong@loongson.cn>
> ---
> target/loongarch/insns.decode | 41 ++++++++++
> target/loongarch/disas.c | 46 +++++++++++
> target/loongarch/vec_helper.c | 82 +++++++++++---------
> target/loongarch/insn_trans/trans_lasx.c.inc | 41 ++++++++++
> 4 files changed, 172 insertions(+), 38 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* [PATCH v4 39/48] target/loongarch: Implement LASX fpu fcvt instructions
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (37 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 38/48] target/loongarch: Implement LASX fpu arith instructions Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 23:40 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 40/48] target/loongarch: Implement xvseq xvsle xvslt Song Gao
` (8 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVFCVT{L/H}.{S.H/D.S};
- XVFCVT.{H.S/S.D};
- XVFRINT[{RNE/RZ/RP/RM}].{S/D};
- XVFTINT[{RNE/RZ/RP/RM}].{W.S/L.D};
- XVFTINT[RZ].{WU.S/LU.D};
- XVFTINT[{RNE/RZ/RP/RM}].W.D;
- XVFTINT[{RNE/RZ/RP/RM}]{L/H}.L.S;
- XVFFINT.{S.W/D.L}[U];
- XVFFINT.S.L, XVFFINT{L/H}.D.W.
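The L and H variants each convert from one 128-bit half of every source
lane group, matching the index arithmetic used by the helpers below. A
minimal standalone sketch of the lane mapping (plain C with illustrative
names, not QEMU code; assumes a 256-bit operand, so oprsz = 32 and
ofs = LSX_LEN / 32 = 4):

    #include <stdio.h>

    /* Print which 16-bit source lane feeds each 32-bit destination
     * lane for xvfcvtl.s.h (even half) and xvfcvth.s.h (odd half). */
    int main(void)
    {
        int ofs = 128 / 32;                   /* lanes per 128-bit half */
        for (int i = 0; i < 32 / 16; i++) {   /* each 128-bit half */
            for (int j = 0; j < ofs; j++) {
                printf("xvfcvtl: W[%d] <- H[%d]\n",
                       j + ofs * i, j + ofs * 2 * i);
                printf("xvfcvth: W[%d] <- H[%d]\n",
                       j + ofs * i, j + ofs * (2 * i + 1));
            }
        }
        return 0;
    }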
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/insns.decode | 58 ++++
target/loongarch/disas.c | 56 ++++
target/loongarch/vec_helper.c | 263 ++++++++++++-------
target/loongarch/insn_trans/trans_lasx.c.inc | 56 ++++
4 files changed, 335 insertions(+), 98 deletions(-)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 4224b0a4b1..ed4f82e7fe 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1857,6 +1857,64 @@ xvfrecip_d 0111 01101001 11001 11110 ..... ..... @vv
xvfrsqrt_s 0111 01101001 11010 00001 ..... ..... @vv
xvfrsqrt_d 0111 01101001 11010 00010 ..... ..... @vv
+xvfcvtl_s_h 0111 01101001 11011 11010 ..... ..... @vv
+xvfcvth_s_h 0111 01101001 11011 11011 ..... ..... @vv
+xvfcvtl_d_s 0111 01101001 11011 11100 ..... ..... @vv
+xvfcvth_d_s 0111 01101001 11011 11101 ..... ..... @vv
+xvfcvt_h_s 0111 01010100 01100 ..... ..... ..... @vvv
+xvfcvt_s_d 0111 01010100 01101 ..... ..... ..... @vvv
+
+xvfrintrne_s 0111 01101001 11010 11101 ..... ..... @vv
+xvfrintrne_d 0111 01101001 11010 11110 ..... ..... @vv
+xvfrintrz_s 0111 01101001 11010 11001 ..... ..... @vv
+xvfrintrz_d 0111 01101001 11010 11010 ..... ..... @vv
+xvfrintrp_s 0111 01101001 11010 10101 ..... ..... @vv
+xvfrintrp_d 0111 01101001 11010 10110 ..... ..... @vv
+xvfrintrm_s 0111 01101001 11010 10001 ..... ..... @vv
+xvfrintrm_d 0111 01101001 11010 10010 ..... ..... @vv
+xvfrint_s 0111 01101001 11010 01101 ..... ..... @vv
+xvfrint_d 0111 01101001 11010 01110 ..... ..... @vv
+
+xvftintrne_w_s 0111 01101001 11100 10100 ..... ..... @vv
+xvftintrne_l_d 0111 01101001 11100 10101 ..... ..... @vv
+xvftintrz_w_s 0111 01101001 11100 10010 ..... ..... @vv
+xvftintrz_l_d 0111 01101001 11100 10011 ..... ..... @vv
+xvftintrp_w_s 0111 01101001 11100 10000 ..... ..... @vv
+xvftintrp_l_d 0111 01101001 11100 10001 ..... ..... @vv
+xvftintrm_w_s 0111 01101001 11100 01110 ..... ..... @vv
+xvftintrm_l_d 0111 01101001 11100 01111 ..... ..... @vv
+xvftint_w_s 0111 01101001 11100 01100 ..... ..... @vv
+xvftint_l_d 0111 01101001 11100 01101 ..... ..... @vv
+xvftintrz_wu_s 0111 01101001 11100 11100 ..... ..... @vv
+xvftintrz_lu_d 0111 01101001 11100 11101 ..... ..... @vv
+xvftint_wu_s 0111 01101001 11100 10110 ..... ..... @vv
+xvftint_lu_d 0111 01101001 11100 10111 ..... ..... @vv
+
+xvftintrne_w_d 0111 01010100 10111 ..... ..... ..... @vvv
+xvftintrz_w_d 0111 01010100 10110 ..... ..... ..... @vvv
+xvftintrp_w_d 0111 01010100 10101 ..... ..... ..... @vvv
+xvftintrm_w_d 0111 01010100 10100 ..... ..... ..... @vvv
+xvftint_w_d 0111 01010100 10011 ..... ..... ..... @vvv
+
+xvftintrnel_l_s 0111 01101001 11101 01000 ..... ..... @vv
+xvftintrneh_l_s 0111 01101001 11101 01001 ..... ..... @vv
+xvftintrzl_l_s 0111 01101001 11101 00110 ..... ..... @vv
+xvftintrzh_l_s 0111 01101001 11101 00111 ..... ..... @vv
+xvftintrpl_l_s 0111 01101001 11101 00100 ..... ..... @vv
+xvftintrph_l_s 0111 01101001 11101 00101 ..... ..... @vv
+xvftintrml_l_s 0111 01101001 11101 00010 ..... ..... @vv
+xvftintrmh_l_s 0111 01101001 11101 00011 ..... ..... @vv
+xvftintl_l_s 0111 01101001 11101 00000 ..... ..... @vv
+xvftinth_l_s 0111 01101001 11101 00001 ..... ..... @vv
+
+xvffint_s_w 0111 01101001 11100 00000 ..... ..... @vv
+xvffint_d_l 0111 01101001 11100 00010 ..... ..... @vv
+xvffint_s_wu 0111 01101001 11100 00001 ..... ..... @vv
+xvffint_d_lu 0111 01101001 11100 00011 ..... ..... @vv
+xvffintl_d_w 0111 01101001 11100 00100 ..... ..... @vv
+xvffinth_d_w 0111 01101001 11100 00101 ..... ..... @vv
+xvffint_s_l 0111 01010100 10000 ..... ..... ..... @vvv
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 4af74f1ae9..3fd3dc3591 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2286,6 +2286,62 @@ INSN_LASX(xvfrecip_d, vv)
INSN_LASX(xvfrsqrt_s, vv)
INSN_LASX(xvfrsqrt_d, vv)
+INSN_LASX(xvfcvtl_s_h, vv)
+INSN_LASX(xvfcvth_s_h, vv)
+INSN_LASX(xvfcvtl_d_s, vv)
+INSN_LASX(xvfcvth_d_s, vv)
+INSN_LASX(xvfcvt_h_s, vvv)
+INSN_LASX(xvfcvt_s_d, vvv)
+
+INSN_LASX(xvfrint_s, vv)
+INSN_LASX(xvfrint_d, vv)
+INSN_LASX(xvfrintrm_s, vv)
+INSN_LASX(xvfrintrm_d, vv)
+INSN_LASX(xvfrintrp_s, vv)
+INSN_LASX(xvfrintrp_d, vv)
+INSN_LASX(xvfrintrz_s, vv)
+INSN_LASX(xvfrintrz_d, vv)
+INSN_LASX(xvfrintrne_s, vv)
+INSN_LASX(xvfrintrne_d, vv)
+
+INSN_LASX(xvftint_w_s, vv)
+INSN_LASX(xvftint_l_d, vv)
+INSN_LASX(xvftintrm_w_s, vv)
+INSN_LASX(xvftintrm_l_d, vv)
+INSN_LASX(xvftintrp_w_s, vv)
+INSN_LASX(xvftintrp_l_d, vv)
+INSN_LASX(xvftintrz_w_s, vv)
+INSN_LASX(xvftintrz_l_d, vv)
+INSN_LASX(xvftintrne_w_s, vv)
+INSN_LASX(xvftintrne_l_d, vv)
+INSN_LASX(xvftint_wu_s, vv)
+INSN_LASX(xvftint_lu_d, vv)
+INSN_LASX(xvftintrz_wu_s, vv)
+INSN_LASX(xvftintrz_lu_d, vv)
+INSN_LASX(xvftint_w_d, vvv)
+INSN_LASX(xvftintrm_w_d, vvv)
+INSN_LASX(xvftintrp_w_d, vvv)
+INSN_LASX(xvftintrz_w_d, vvv)
+INSN_LASX(xvftintrne_w_d, vvv)
+INSN_LASX(xvftintl_l_s, vv)
+INSN_LASX(xvftinth_l_s, vv)
+INSN_LASX(xvftintrml_l_s, vv)
+INSN_LASX(xvftintrmh_l_s, vv)
+INSN_LASX(xvftintrpl_l_s, vv)
+INSN_LASX(xvftintrph_l_s, vv)
+INSN_LASX(xvftintrzl_l_s, vv)
+INSN_LASX(xvftintrzh_l_s, vv)
+INSN_LASX(xvftintrnel_l_s, vv)
+INSN_LASX(xvftintrneh_l_s, vv)
+
+INSN_LASX(xvffint_s_w, vv)
+INSN_LASX(xvffint_s_wu, vv)
+INSN_LASX(xvffint_d_l, vv)
+INSN_LASX(xvffint_d_lu, vv)
+INSN_LASX(xvffintl_d_w, vv)
+INSN_LASX(xvffinth_d_w, vv)
+INSN_LASX(xvffint_s_l, vvv)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 684b023ee5..3e2757d57b 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -2506,14 +2506,19 @@ static uint32_t float64_cvt_float32(uint64_t d, float_status *status)
void HELPER(vfcvtl_s_h)(void *vd, void *vj,
CPULoongArchState *env, uint32_t desc)
{
- int i;
- VReg temp;
+ int i, j, ofs;
+ VReg temp = {};
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
+ int oprsz = simd_oprsz(desc);
+ ofs = LSX_LEN / 32;
vec_clear_cause(env);
- for (i = 0; i < LSX_LEN/32; i++) {
- temp.UW(i) = float16_cvt_float32(Vj->UH(i), &env->fp_status);
+ for (i = 0; i < oprsz / 16; i++) {
+ for (j = 0; j < ofs; j++) {
+ temp.UW(j + ofs * i) = float16_cvt_float32(Vj->UH(j + ofs * 2 * i),
+ &env->fp_status);
+ }
vec_update_fcsr0(env, GETPC());
}
*Vd = temp;
@@ -2522,14 +2527,19 @@ void HELPER(vfcvtl_s_h)(void *vd, void *vj,
void HELPER(vfcvtl_d_s)(void *vd, void *vj,
CPULoongArchState *env, uint32_t desc)
{
- int i;
- VReg temp;
+ int i, j, ofs;
+ VReg temp = {};
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
+ int oprsz = simd_oprsz(desc);
+ ofs = LSX_LEN / 64;
vec_clear_cause(env);
- for (i = 0; i < LSX_LEN/64; i++) {
- temp.UD(i) = float32_cvt_float64(Vj->UW(i), &env->fp_status);
+ for (i = 0; i < oprsz / 16; i++) {
+ for (j = 0; j < ofs; j++) {
+ temp.UD(j + ofs * i) = float32_cvt_float64(Vj->UW(j + ofs * 2 * i),
+ &env->fp_status);
+ }
vec_update_fcsr0(env, GETPC());
}
*Vd = temp;
@@ -2538,14 +2548,19 @@ void HELPER(vfcvtl_d_s)(void *vd, void *vj,
void HELPER(vfcvth_s_h)(void *vd, void *vj,
CPULoongArchState *env, uint32_t desc)
{
- int i;
- VReg temp;
+ int i, j, ofs;
+ VReg temp = {};
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
+ int oprsz = simd_oprsz(desc);
+ ofs = LSX_LEN / 32;
vec_clear_cause(env);
- for (i = 0; i < LSX_LEN/32; i++) {
- temp.UW(i) = float16_cvt_float32(Vj->UH(i + 4), &env->fp_status);
+ for (i = 0; i < oprsz / 16; i++) {
+ for (j = 0; j < ofs; j++) {
+ temp.UW(j + ofs * i) = float16_cvt_float32(Vj->UH(j + ofs * (2 * i + 1)),
+ &env->fp_status);
+ }
vec_update_fcsr0(env, GETPC());
}
*Vd = temp;
@@ -2554,14 +2569,19 @@ void HELPER(vfcvth_s_h)(void *vd, void *vj,
void HELPER(vfcvth_d_s)(void *vd, void *vj,
CPULoongArchState *env, uint32_t desc)
{
- int i;
- VReg temp;
+ int i, j, ofs;
+ VReg temp = {};
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
+ int oprsz = simd_oprsz(desc);
+ ofs = LSX_LEN / 64;
vec_clear_cause(env);
- for (i = 0; i < LSX_LEN/64; i++) {
- temp.UD(i) = float32_cvt_float64(Vj->UW(i + 2), &env->fp_status);
+ for (i = 0; i < oprsz / 16; i++) {
+ for (j = 0; j < ofs; j++) {
+ temp.UD(j + ofs * i) = float32_cvt_float64(Vj->UW(j + ofs * (2 * i + 1)),
+ &env->fp_status);
+ }
vec_update_fcsr0(env, GETPC());
}
*Vd = temp;
@@ -2570,16 +2590,22 @@ void HELPER(vfcvth_d_s)(void *vd, void *vj,
void HELPER(vfcvt_h_s)(void *vd, void *vj, void *vk,
CPULoongArchState *env, uint32_t desc)
{
- int i;
- VReg temp;
+ int i, j, ofs;
+ VReg temp = {};
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
VReg *Vk = (VReg *)vk;
+ int oprsz = simd_oprsz(desc);
+ ofs = LSX_LEN / 32;
vec_clear_cause(env);
- for(i = 0; i < LSX_LEN/32; i++) {
- temp.UH(i + 4) = float32_cvt_float16(Vj->UW(i), &env->fp_status);
- temp.UH(i) = float32_cvt_float16(Vk->UW(i), &env->fp_status);
+ for (i = 0; i < oprsz / 16; i++) {
+ for (j = 0; j < ofs; j++) {
+ temp.UH(j + ofs * (2 * i + 1)) = float32_cvt_float16(Vj->UW(j + ofs * i),
+ &env->fp_status);
+ temp.UH(j + ofs * 2 * i) = float32_cvt_float16(Vk->UW(j + ofs * i),
+ &env->fp_status);
+ }
vec_update_fcsr0(env, GETPC());
}
*Vd = temp;
@@ -2588,16 +2614,22 @@ void HELPER(vfcvt_h_s)(void *vd, void *vj, void *vk,
void HELPER(vfcvt_s_d)(void *vd, void *vj, void *vk,
CPULoongArchState *env, uint32_t desc)
{
- int i;
- VReg temp;
+ int i, j, ofs;
+ VReg temp = {};
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
VReg *Vk = (VReg *)vk;
+ int oprsz = simd_oprsz(desc);
+ ofs = LSX_LEN / 64;
vec_clear_cause(env);
- for(i = 0; i < LSX_LEN/64; i++) {
- temp.UW(i + 2) = float64_cvt_float32(Vj->UD(i), &env->fp_status);
- temp.UW(i) = float64_cvt_float32(Vk->UD(i), &env->fp_status);
+ for (i = 0; i < oprsz / 16; i++) {
+ for (j = 0; j < ofs; j++) {
+ temp.UW(j + ofs * (2 * i + 1)) = float64_cvt_float32(Vj->UD(j + ofs * i),
+ &env->fp_status);
+ temp.UW(j + ofs * 2 * i) = float64_cvt_float32(Vk->UD(j + ofs * i),
+ &env->fp_status);
+ }
vec_update_fcsr0(env, GETPC());
}
*Vd = temp;
@@ -2609,12 +2641,14 @@ void HELPER(vfrint_s)(void *vd, void *vj,
int i;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
+ int oprsz = simd_oprsz(desc);
vec_clear_cause(env);
- for (i = 0; i < 4; i++) {
+ for (i = 0; i < oprsz / 4; i++) {
Vd->W(i) = float32_round_to_int(Vj->UW(i), &env->fp_status);
vec_update_fcsr0(env, GETPC());
}
+}
void HELPER(vfrint_d)(void *vd, void *vj,
@@ -2623,29 +2657,32 @@ void HELPER(vfrint_d)(void *vd, void *vj,
int i;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
+ int oprsz = simd_oprsz(desc);
vec_clear_cause(env);
- for (i = 0; i < 2; i++) {
+ for (i = 0; i < oprsz / 8; i++) {
Vd->D(i) = float64_round_to_int(Vj->UD(i), &env->fp_status);
vec_update_fcsr0(env, GETPC());
}
}
-#define FCVT_2OP(NAME, BIT, E, MODE) \
-void HELPER(NAME)(void *vd, void *vj, CPULoongArchState *env, uint32_t desc) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- \
- vec_clear_cause(env); \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- FloatRoundMode old_mode = get_float_rounding_mode(&env->fp_status); \
- set_float_rounding_mode(MODE, &env->fp_status); \
- Vd->E(i) = float## BIT ## _round_to_int(Vj->E(i), &env->fp_status); \
- set_float_rounding_mode(old_mode, &env->fp_status); \
- vec_update_fcsr0(env, GETPC()); \
- } \
+#define FCVT_2OP(NAME, BIT, E, MODE) \
+void HELPER(NAME)(void *vd, void *vj, \
+ CPULoongArchState *env, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ vec_clear_cause(env); \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
+ FloatRoundMode old_mode = get_float_rounding_mode(&env->fp_status); \
+ set_float_rounding_mode(MODE, &env->fp_status); \
+ Vd->E(i) = float## BIT ## _round_to_int(Vj->E(i), &env->fp_status); \
+ set_float_rounding_mode(old_mode, &env->fp_status); \
+ vec_update_fcsr0(env, GETPC()); \
+ } \
}
FCVT_2OP(vfrintrne_s, 32, UW, float_round_nearest_even)
@@ -2724,22 +2761,26 @@ FTINT(rp_w_d, float64, int32, uint64_t, uint32_t, float_round_up)
FTINT(rz_w_d, float64, int32, uint64_t, uint32_t, float_round_to_zero)
FTINT(rne_w_d, float64, int32, uint64_t, uint32_t, float_round_nearest_even)
-#define FTINT_W_D(NAME, FN) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, \
- CPULoongArchState *env,uint32_t desc) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- VReg *Vk = (VReg *)vk; \
- \
- vec_clear_cause(env); \
- for (i = 0; i < 2; i++) { \
- temp.W(i + 2) = FN(env, Vj->UD(i)); \
- temp.W(i) = FN(env, Vk->UD(i)); \
- } \
- *Vd = temp; \
+#define FTINT_W_D(NAME, FN) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, \
+ CPULoongArchState *env, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg temp = {}; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / 64; \
+ vec_clear_cause(env); \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ temp.W(j + ofs * (2 * i + 1)) = FN(env, Vj->UD(j + ofs * i)); \
+ temp.W(j + ofs * 2 * i) = FN(env, Vk->UD(j + ofs * i)); \
+ } \
+ } \
+ *Vd = temp; \
}
FTINT_W_D(vftint_w_d, do_float64_to_int32)
@@ -2757,19 +2798,24 @@ FTINT(rph_l_s, float32, int64, uint32_t, uint64_t, float_round_up)
FTINT(rzh_l_s, float32, int64, uint32_t, uint64_t, float_round_to_zero)
FTINT(rneh_l_s, float32, int64, uint32_t, uint64_t, float_round_nearest_even)
-#define FTINTL_L_S(NAME, FN) \
-void HELPER(NAME)(void *vd, void *vj, CPULoongArchState *env, uint32_t desc) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- \
- vec_clear_cause(env); \
- for (i = 0; i < 2; i++) { \
- temp.D(i) = FN(env, Vj->UW(i)); \
- } \
- *Vd = temp; \
+#define FTINTL_L_S(NAME, FN) \
+void HELPER(NAME)(void *vd, void *vj, \
+ CPULoongArchState *env, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg temp = {}; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / 64; \
+ vec_clear_cause(env); \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ temp.D(j + ofs * i) = FN(env, Vj->UW(j + ofs * 2 * i)); \
+ } \
+ } \
+ *Vd = temp; \
}
FTINTL_L_S(vftintl_l_s, do_float32_to_int64)
@@ -2778,19 +2824,24 @@ FTINTL_L_S(vftintrpl_l_s, do_ftintrpl_l_s)
FTINTL_L_S(vftintrzl_l_s, do_ftintrzl_l_s)
FTINTL_L_S(vftintrnel_l_s, do_ftintrnel_l_s)
-#define FTINTH_L_S(NAME, FN) \
-void HELPER(NAME)(void *vd, void *vj, CPULoongArchState *env, uint32_t desc) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- \
- vec_clear_cause(env); \
- for (i = 0; i < 2; i++) { \
- temp.D(i) = FN(env, Vj->UW(i + 2)); \
- } \
- *Vd = temp; \
+#define FTINTH_L_S(NAME, FN) \
+void HELPER(NAME)(void *vd, void *vj, \
+ CPULoongArchState *env, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg temp = {}; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / 64; \
+ vec_clear_cause(env); \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ temp.D(j + ofs * i) = FN(env, Vj->UW(j + ofs * (2 * i + 1))); \
+ } \
+ } \
+ *Vd = temp; \
}
FTINTH_L_S(vftinth_l_s, do_float32_to_int64)
@@ -2822,14 +2873,19 @@ DO_2OP_F(vffint_d_lu, 64, UD, do_ffint_d_lu)
void HELPER(vffintl_d_w)(void *vd, void *vj,
CPULoongArchState *env, uint32_t desc)
{
- int i;
- VReg temp;
+ int i, j, ofs;
+ VReg temp = {};
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
+ int oprsz = simd_oprsz(desc);
+ ofs = LSX_LEN / 64;
vec_clear_cause(env);
- for (i = 0; i < 2; i++) {
- temp.D(i) = int32_to_float64(Vj->W(i), &env->fp_status);
+ for (i = 0; i < oprsz / 16; i++) {
+ for (j = 0; j < ofs; j++) {
+ temp.D(j + ofs * i) = int32_to_float64(Vj->W(j + ofs * 2 * i),
+ &env->fp_status);
+ }
vec_update_fcsr0(env, GETPC());
}
*Vd = temp;
@@ -2838,14 +2894,19 @@ void HELPER(vffintl_d_w)(void *vd, void *vj,
void HELPER(vffinth_d_w)(void *vd, void *vj,
CPULoongArchState *env, uint32_t desc)
{
- int i;
- VReg temp;
+ int i, j, ofs;
+ VReg temp = {};
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
+ int oprsz = simd_oprsz(desc);
+ ofs = LSX_LEN / 64;
vec_clear_cause(env);
- for (i = 0; i < 2; i++) {
- temp.D(i) = int32_to_float64(Vj->W(i + 2), &env->fp_status);
+ for (i = 0; i < oprsz / 16; i++) {
+ for (j = 0; j < ofs; j++) {
+ temp.D(j + ofs * i) = int32_to_float64(Vj->W(j + ofs * (2 * i + 1)),
+ &env->fp_status);
+ }
vec_update_fcsr0(env, GETPC());
}
*Vd = temp;
@@ -2854,16 +2915,22 @@ void HELPER(vffinth_d_w)(void *vd, void *vj,
void HELPER(vffint_s_l)(void *vd, void *vj, void *vk,
CPULoongArchState *env, uint32_t desc)
{
- int i;
- VReg temp;
+ int i, j, ofs;
+ VReg temp = {};
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
VReg *Vk = (VReg *)vk;
+ int oprsz = simd_oprsz(desc);
+ ofs = LSX_LEN / 64;
vec_clear_cause(env);
- for (i = 0; i < 2; i++) {
- temp.W(i + 2) = int64_to_float32(Vj->D(i), &env->fp_status);
- temp.W(i) = int64_to_float32(Vk->D(i), &env->fp_status);
+ for (i = 0; i < oprsz / 16; i++) {
+ for (j = 0; j < ofs; j++) {
+ temp.W(j + ofs * (2 * i + 1)) = int64_to_float32(Vj->D(j + ofs * i),
+ &env->fp_status);
+ temp.W(j + ofs * 2 * i) = int64_to_float32(Vk->D(j + ofs * i),
+ &env->fp_status);
+ }
vec_update_fcsr0(env, GETPC());
}
*Vd = temp;
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index b1b1fb939b..760160184c 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -606,6 +606,62 @@ TRANS(xvfrecip_d, LASX, gen_vv_f, 32, gen_helper_vfrecip_d)
TRANS(xvfrsqrt_s, LASX, gen_vv_f, 32, gen_helper_vfrsqrt_s)
TRANS(xvfrsqrt_d, LASX, gen_vv_f, 32, gen_helper_vfrsqrt_d)
+TRANS(xvfcvtl_s_h, LASX, gen_vv_f, 32, gen_helper_vfcvtl_s_h)
+TRANS(xvfcvth_s_h, LASX, gen_vv_f, 32, gen_helper_vfcvth_s_h)
+TRANS(xvfcvtl_d_s, LASX, gen_vv_f, 32, gen_helper_vfcvtl_d_s)
+TRANS(xvfcvth_d_s, LASX, gen_vv_f, 32, gen_helper_vfcvth_d_s)
+TRANS(xvfcvt_h_s, LASX, gen_vvv_f, 32, gen_helper_vfcvt_h_s)
+TRANS(xvfcvt_s_d, LASX, gen_vvv_f, 32, gen_helper_vfcvt_s_d)
+
+TRANS(xvfrintrne_s, LASX, gen_vv_f, 32, gen_helper_vfrintrne_s)
+TRANS(xvfrintrne_d, LASX, gen_vv_f, 32, gen_helper_vfrintrne_d)
+TRANS(xvfrintrz_s, LASX, gen_vv_f, 32, gen_helper_vfrintrz_s)
+TRANS(xvfrintrz_d, LASX, gen_vv_f, 32, gen_helper_vfrintrz_d)
+TRANS(xvfrintrp_s, LASX, gen_vv_f, 32, gen_helper_vfrintrp_s)
+TRANS(xvfrintrp_d, LASX, gen_vv_f, 32, gen_helper_vfrintrp_d)
+TRANS(xvfrintrm_s, LASX, gen_vv_f, 32, gen_helper_vfrintrm_s)
+TRANS(xvfrintrm_d, LASX, gen_vv_f, 32, gen_helper_vfrintrm_d)
+TRANS(xvfrint_s, LASX, gen_vv_f, 32, gen_helper_vfrint_s)
+TRANS(xvfrint_d, LASX, gen_vv_f, 32, gen_helper_vfrint_d)
+
+TRANS(xvftintrne_w_s, LASX, gen_vv_f, 32, gen_helper_vftintrne_w_s)
+TRANS(xvftintrne_l_d, LASX, gen_vv_f, 32, gen_helper_vftintrne_l_d)
+TRANS(xvftintrz_w_s, LASX, gen_vv_f, 32, gen_helper_vftintrz_w_s)
+TRANS(xvftintrz_l_d, LASX, gen_vv_f, 32, gen_helper_vftintrz_l_d)
+TRANS(xvftintrp_w_s, LASX, gen_vv_f, 32, gen_helper_vftintrp_w_s)
+TRANS(xvftintrp_l_d, LASX, gen_vv_f, 32, gen_helper_vftintrp_l_d)
+TRANS(xvftintrm_w_s, LASX, gen_vv_f, 32, gen_helper_vftintrm_w_s)
+TRANS(xvftintrm_l_d, LASX, gen_vv_f, 32, gen_helper_vftintrm_l_d)
+TRANS(xvftint_w_s, LASX, gen_vv_f, 32, gen_helper_vftint_w_s)
+TRANS(xvftint_l_d, LASX, gen_vv_f, 32, gen_helper_vftint_l_d)
+TRANS(xvftintrz_wu_s, LASX, gen_vv_f, 32, gen_helper_vftintrz_wu_s)
+TRANS(xvftintrz_lu_d, LASX, gen_vv_f, 32, gen_helper_vftintrz_lu_d)
+TRANS(xvftint_wu_s, LASX, gen_vv_f, 32, gen_helper_vftint_wu_s)
+TRANS(xvftint_lu_d, LASX, gen_vv_f, 32, gen_helper_vftint_lu_d)
+TRANS(xvftintrne_w_d, LASX, gen_vvv_f, 32, gen_helper_vftintrne_w_d)
+TRANS(xvftintrz_w_d, LASX, gen_vvv_f, 32, gen_helper_vftintrz_w_d)
+TRANS(xvftintrp_w_d, LASX, gen_vvv_f, 32, gen_helper_vftintrp_w_d)
+TRANS(xvftintrm_w_d, LASX, gen_vvv_f, 32, gen_helper_vftintrm_w_d)
+TRANS(xvftint_w_d, LASX, gen_vvv_f, 32, gen_helper_vftint_w_d)
+TRANS(xvftintrnel_l_s, LASX, gen_vv_f, 32, gen_helper_vftintrnel_l_s)
+TRANS(xvftintrneh_l_s, LASX, gen_vv_f, 32, gen_helper_vftintrneh_l_s)
+TRANS(xvftintrzl_l_s, LASX, gen_vv_f, 32, gen_helper_vftintrzl_l_s)
+TRANS(xvftintrzh_l_s, LASX, gen_vv_f, 32, gen_helper_vftintrzh_l_s)
+TRANS(xvftintrpl_l_s, LASX, gen_vv_f, 32, gen_helper_vftintrpl_l_s)
+TRANS(xvftintrph_l_s, LASX, gen_vv_f, 32, gen_helper_vftintrph_l_s)
+TRANS(xvftintrml_l_s, LASX, gen_vv_f, 32, gen_helper_vftintrml_l_s)
+TRANS(xvftintrmh_l_s, LASX, gen_vv_f, 32, gen_helper_vftintrmh_l_s)
+TRANS(xvftintl_l_s, LASX, gen_vv_f, 32, gen_helper_vftintl_l_s)
+TRANS(xvftinth_l_s, LASX, gen_vv_f, 32, gen_helper_vftinth_l_s)
+
+TRANS(xvffint_s_w, LASX, gen_vv_f, 32, gen_helper_vffint_s_w)
+TRANS(xvffint_d_l, LASX, gen_vv_f, 32, gen_helper_vffint_d_l)
+TRANS(xvffint_s_wu, LASX, gen_vv_f, 32, gen_helper_vffint_s_wu)
+TRANS(xvffint_d_lu, LASX, gen_vv_f, 32, gen_helper_vffint_d_lu)
+TRANS(xvffintl_d_w, LASX, gen_vv_f, 32, gen_helper_vffintl_d_w)
+TRANS(xvffinth_d_w, LASX, gen_vv_f, 32, gen_helper_vffinth_d_w)
+TRANS(xvffint_s_l, LASX, gen_vvv_f, 32, gen_helper_vffint_s_l)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
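For readers skimming the TRANS lines above: each one expands to a
trans_* callback that first gates on the availability predicate, then
forwards the trailing arguments (the oprsz in bytes and the helper).
A rough sketch of one expansion, assuming the series' TRANS macro
(illustrative only; see translate.c for the real definition):

    /* Approximately what TRANS(xvfcvtl_s_h, LASX, gen_vv_f, 32,
     * gen_helper_vfcvtl_s_h) produces. */
    static bool trans_xvfcvtl_s_h(DisasContext *ctx, arg_xvfcvtl_s_h *a)
    {
        return avail_LASX(ctx) &&
               gen_vv_f(ctx, a, 32, gen_helper_vfcvtl_s_h);
    }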
--
2.39.1
* Re: [PATCH v4 39/48] target/loongarch: Implement LASX fpu fcvt instructions
2023-08-30 8:48 ` [PATCH v4 39/48] target/loongarch: Implement LASX fpu fcvt instructions Song Gao
@ 2023-08-30 23:40 ` Richard Henderson
0 siblings, 0 replies; 86+ messages in thread
From: Richard Henderson @ 2023-08-30 23:40 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 8/30/23 01:48, Song Gao wrote:
> This patch includes:
> - XVFCVT{L/H}.{S.H/D.S};
> - XVFCVT.{H.S/S.D};
> - XVFRINT[{RNE/RZ/RP/RM}].{S/D};
> - XVFTINT[{RNE/RZ/RP/RM}].{W.S/L.D};
> - XVFTINT[RZ].{WU.S/LU.D};
> - XVFTINT[{RNE/RZ/RP/RM}].W.D;
> - XVFTINT[{RNE/RZ/RP/RM}]{L/H}.L.S;
> - XVFFINT.{S.W/D.L}[U];
> - XVFFINT.S.L, XVFFINT{L/H}.D.W.
>
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
> target/loongarch/insns.decode | 58 ++++
> target/loongarch/disas.c | 56 ++++
> target/loongarch/vec_helper.c | 263 ++++++++++++-------
> target/loongarch/insn_trans/trans_lasx.c.inc | 56 ++++
> 4 files changed, 335 insertions(+), 98 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* [PATCH v4 40/48] target/loongarch: Implement xvseq xvsle xvslt
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (38 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 39/48] target/loongarch: Implement LASX fpu fcvt instructions Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-30 23:41 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 41/48] target/loongarch: Implement xvfcmp Song Gao
` (7 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVSEQ[I].{B/H/W/D};
- XVSLE[I].{B/H/W/D}[U];
- XVSLT[I].{B/H/W/D}[U].
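Each compare writes an all-ones or all-zeros mask per lane, which is
what the VSEQ/VSLE/VSLT macros this patch moves into vec.h encode. A
tiny self-contained illustration of the lane semantics (plain C, not
the QEMU helpers):

    #include <stdint.h>
    #include <stdio.h>

    #define VSEQ(a, b) ((a) == (b) ? -1 : 0)
    #define VSLT(a, b) ((a) < (b) ? -1 : 0)

    int main(void)
    {
        int8_t x = 3, y = 3, z = 5;
        /* Matching lanes become 0xff, non-matching lanes 0x00. */
        printf("seq: %02x, slt: %02x\n",
               (uint8_t)VSEQ(x, y), (uint8_t)VSLT(z, x));
        return 0;
    }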
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/vec.h | 4 +
target/loongarch/insns.decode | 43 +++
target/loongarch/disas.c | 43 +++
target/loongarch/vec_helper.c | 27 +-
target/loongarch/insn_trans/trans_lasx.c.inc | 43 +++
target/loongarch/insn_trans/trans_lsx.c.inc | 263 ++++++++++---------
6 files changed, 278 insertions(+), 145 deletions(-)
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index aae70f9de9..bc74effb7c 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -89,4 +89,8 @@
#define DO_BITSET(a, bit) (a | 1ull << bit)
#define DO_BITREV(a, bit) (a ^ (1ull << bit))
+#define VSEQ(a, b) (a == b ? -1 : 0)
+#define VSLE(a, b) (a <= b ? -1 : 0)
+#define VSLT(a, b) (a < b ? -1 : 0)
+
#endif /* LOONGARCH_VEC_H */
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index ed4f82e7fe..82c26a318b 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1915,6 +1915,49 @@ xvffintl_d_w 0111 01101001 11100 00100 ..... ..... @vv
xvffinth_d_w 0111 01101001 11100 00101 ..... ..... @vv
xvffint_s_l 0111 01010100 10000 ..... ..... ..... @vvv
+xvseq_b 0111 01000000 00000 ..... ..... ..... @vvv
+xvseq_h 0111 01000000 00001 ..... ..... ..... @vvv
+xvseq_w 0111 01000000 00010 ..... ..... ..... @vvv
+xvseq_d 0111 01000000 00011 ..... ..... ..... @vvv
+xvseqi_b 0111 01101000 00000 ..... ..... ..... @vv_i5
+xvseqi_h 0111 01101000 00001 ..... ..... ..... @vv_i5
+xvseqi_w 0111 01101000 00010 ..... ..... ..... @vv_i5
+xvseqi_d 0111 01101000 00011 ..... ..... ..... @vv_i5
+
+xvsle_b 0111 01000000 00100 ..... ..... ..... @vvv
+xvsle_h 0111 01000000 00101 ..... ..... ..... @vvv
+xvsle_w 0111 01000000 00110 ..... ..... ..... @vvv
+xvsle_d 0111 01000000 00111 ..... ..... ..... @vvv
+xvslei_b 0111 01101000 00100 ..... ..... ..... @vv_i5
+xvslei_h 0111 01101000 00101 ..... ..... ..... @vv_i5
+xvslei_w 0111 01101000 00110 ..... ..... ..... @vv_i5
+xvslei_d 0111 01101000 00111 ..... ..... ..... @vv_i5
+xvsle_bu 0111 01000000 01000 ..... ..... ..... @vvv
+xvsle_hu 0111 01000000 01001 ..... ..... ..... @vvv
+xvsle_wu 0111 01000000 01010 ..... ..... ..... @vvv
+xvsle_du 0111 01000000 01011 ..... ..... ..... @vvv
+xvslei_bu 0111 01101000 01000 ..... ..... ..... @vv_ui5
+xvslei_hu 0111 01101000 01001 ..... ..... ..... @vv_ui5
+xvslei_wu 0111 01101000 01010 ..... ..... ..... @vv_ui5
+xvslei_du 0111 01101000 01011 ..... ..... ..... @vv_ui5
+
+xvslt_b 0111 01000000 01100 ..... ..... ..... @vvv
+xvslt_h 0111 01000000 01101 ..... ..... ..... @vvv
+xvslt_w 0111 01000000 01110 ..... ..... ..... @vvv
+xvslt_d 0111 01000000 01111 ..... ..... ..... @vvv
+xvslti_b 0111 01101000 01100 ..... ..... ..... @vv_i5
+xvslti_h 0111 01101000 01101 ..... ..... ..... @vv_i5
+xvslti_w 0111 01101000 01110 ..... ..... ..... @vv_i5
+xvslti_d 0111 01101000 01111 ..... ..... ..... @vv_i5
+xvslt_bu 0111 01000000 10000 ..... ..... ..... @vvv
+xvslt_hu 0111 01000000 10001 ..... ..... ..... @vvv
+xvslt_wu 0111 01000000 10010 ..... ..... ..... @vvv
+xvslt_du 0111 01000000 10011 ..... ..... ..... @vvv
+xvslti_bu 0111 01101000 10000 ..... ..... ..... @vv_ui5
+xvslti_hu 0111 01101000 10001 ..... ..... ..... @vv_ui5
+xvslti_wu 0111 01101000 10010 ..... ..... ..... @vv_ui5
+xvslti_du 0111 01101000 10011 ..... ..... ..... @vv_ui5
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 3fd3dc3591..295ba74f2b 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2342,6 +2342,49 @@ INSN_LASX(xvffintl_d_w, vv)
INSN_LASX(xvffinth_d_w, vv)
INSN_LASX(xvffint_s_l, vvv)
+INSN_LASX(xvseq_b, vvv)
+INSN_LASX(xvseq_h, vvv)
+INSN_LASX(xvseq_w, vvv)
+INSN_LASX(xvseq_d, vvv)
+INSN_LASX(xvseqi_b, vv_i)
+INSN_LASX(xvseqi_h, vv_i)
+INSN_LASX(xvseqi_w, vv_i)
+INSN_LASX(xvseqi_d, vv_i)
+
+INSN_LASX(xvsle_b, vvv)
+INSN_LASX(xvsle_h, vvv)
+INSN_LASX(xvsle_w, vvv)
+INSN_LASX(xvsle_d, vvv)
+INSN_LASX(xvslei_b, vv_i)
+INSN_LASX(xvslei_h, vv_i)
+INSN_LASX(xvslei_w, vv_i)
+INSN_LASX(xvslei_d, vv_i)
+INSN_LASX(xvsle_bu, vvv)
+INSN_LASX(xvsle_hu, vvv)
+INSN_LASX(xvsle_wu, vvv)
+INSN_LASX(xvsle_du, vvv)
+INSN_LASX(xvslei_bu, vv_i)
+INSN_LASX(xvslei_hu, vv_i)
+INSN_LASX(xvslei_wu, vv_i)
+INSN_LASX(xvslei_du, vv_i)
+
+INSN_LASX(xvslt_b, vvv)
+INSN_LASX(xvslt_h, vvv)
+INSN_LASX(xvslt_w, vvv)
+INSN_LASX(xvslt_d, vvv)
+INSN_LASX(xvslti_b, vv_i)
+INSN_LASX(xvslti_h, vv_i)
+INSN_LASX(xvslti_w, vv_i)
+INSN_LASX(xvslti_d, vv_i)
+INSN_LASX(xvslt_bu, vvv)
+INSN_LASX(xvslt_hu, vvv)
+INSN_LASX(xvslt_wu, vvv)
+INSN_LASX(xvslt_du, vvv)
+INSN_LASX(xvslti_bu, vv_i)
+INSN_LASX(xvslti_hu, vv_i)
+INSN_LASX(xvslti_wu, vv_i)
+INSN_LASX(xvslti_du, vv_i)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 3e2757d57b..19958c054c 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -2936,21 +2936,18 @@ void HELPER(vffint_s_l)(void *vd, void *vj, void *vk,
*Vd = temp;
}
-#define VSEQ(a, b) (a == b ? -1 : 0)
-#define VSLE(a, b) (a <= b ? -1 : 0)
-#define VSLT(a, b) (a < b ? -1 : 0)
-
-#define VCMPI(NAME, BIT, E, DO_OP) \
-void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t v) \
-{ \
- int i; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- typedef __typeof(Vd->E(0)) TD; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- Vd->E(i) = DO_OP(Vj->E(i), (TD)imm); \
- } \
+#define VCMPI(NAME, BIT, E, DO_OP) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ typedef __typeof(Vd->E(0)) TD; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
+ Vd->E(i) = DO_OP(Vj->E(i), (TD)imm); \
+ } \
}
VCMPI(vseqi_b, 8, B, VSEQ)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 760160184c..c1cd02d6a1 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -662,6 +662,49 @@ TRANS(xvffintl_d_w, LASX, gen_vv_f, 32, gen_helper_vffintl_d_w)
TRANS(xvffinth_d_w, LASX, gen_vv_f, 32, gen_helper_vffinth_d_w)
TRANS(xvffint_s_l, LASX, gen_vvv_f, 32, gen_helper_vffint_s_l)
+TRANS(xvseq_b, LASX, do_cmp, 32, MO_8, TCG_COND_EQ)
+TRANS(xvseq_h, LASX, do_cmp, 32, MO_16, TCG_COND_EQ)
+TRANS(xvseq_w, LASX, do_cmp, 32, MO_32, TCG_COND_EQ)
+TRANS(xvseq_d, LASX, do_cmp, 32, MO_64, TCG_COND_EQ)
+TRANS(xvseqi_b, LASX, do_vseqi_s, 32, MO_8)
+TRANS(xvseqi_h, LASX, do_vseqi_s, 32, MO_16)
+TRANS(xvseqi_w, LASX, do_vseqi_s, 32, MO_32)
+TRANS(xvseqi_d, LASX, do_vseqi_s, 32, MO_64)
+
+TRANS(xvsle_b, LASX, do_cmp, 32, MO_8, TCG_COND_LE)
+TRANS(xvsle_h, LASX, do_cmp, 32, MO_16, TCG_COND_LE)
+TRANS(xvsle_w, LASX, do_cmp, 32, MO_32, TCG_COND_LE)
+TRANS(xvsle_d, LASX, do_cmp, 32, MO_64, TCG_COND_LE)
+TRANS(xvslei_b, LASX, do_vslei_s, 32, MO_8)
+TRANS(xvslei_h, LASX, do_vslei_s, 32, MO_16)
+TRANS(xvslei_w, LASX, do_vslei_s, 32, MO_32)
+TRANS(xvslei_d, LASX, do_vslei_s, 32, MO_64)
+TRANS(xvsle_bu, LASX, do_cmp, 32, MO_8, TCG_COND_LEU)
+TRANS(xvsle_hu, LASX, do_cmp, 32, MO_16, TCG_COND_LEU)
+TRANS(xvsle_wu, LASX, do_cmp, 32, MO_32, TCG_COND_LEU)
+TRANS(xvsle_du, LASX, do_cmp, 32, MO_64, TCG_COND_LEU)
+TRANS(xvslei_bu, LASX, do_vslei_u, 32, MO_8)
+TRANS(xvslei_hu, LASX, do_vslei_u, 32, MO_16)
+TRANS(xvslei_wu, LASX, do_vslei_u, 32, MO_32)
+TRANS(xvslei_du, LASX, do_vslei_u, 32, MO_64)
+
+TRANS(xvslt_b, LASX, do_cmp, 32, MO_8, TCG_COND_LT)
+TRANS(xvslt_h, LASX, do_cmp, 32, MO_16, TCG_COND_LT)
+TRANS(xvslt_w, LASX, do_cmp, 32, MO_32, TCG_COND_LT)
+TRANS(xvslt_d, LASX, do_cmp, 32, MO_64, TCG_COND_LT)
+TRANS(xvslti_b, LASX, do_vslti_s, 32, MO_8)
+TRANS(xvslti_h, LASX, do_vslti_s, 32, MO_16)
+TRANS(xvslti_w, LASX, do_vslti_s, 32, MO_32)
+TRANS(xvslti_d, LASX, do_vslti_s, 32, MO_64)
+TRANS(xvslt_bu, LASX, do_cmp, 32, MO_8, TCG_COND_LTU)
+TRANS(xvslt_hu, LASX, do_cmp, 32, MO_16, TCG_COND_LTU)
+TRANS(xvslt_wu, LASX, do_cmp, 32, MO_32, TCG_COND_LTU)
+TRANS(xvslt_du, LASX, do_cmp, 32, MO_64, TCG_COND_LTU)
+TRANS(xvslti_bu, LASX, do_vslti_u, 32, MO_8)
+TRANS(xvslti_hu, LASX, do_vslti_u, 32, MO_16)
+TRANS(xvslti_wu, LASX, do_vslti_u, 32, MO_32)
+TRANS(xvslti_du, LASX, do_vslti_u, 32, MO_64)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 64de014a58..f757db7a76 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -3733,7 +3733,8 @@ TRANS(vffintl_d_w, LSX, gen_vv_f, 16, gen_helper_vffintl_d_w)
TRANS(vffinth_d_w, LSX, gen_vv_f, 16, gen_helper_vffinth_d_w)
TRANS(vffint_s_l, LSX, gen_vvv_f, 16, gen_helper_vffint_s_l)
-static bool do_cmp(DisasContext *ctx, arg_vvv *a, MemOp mop, TCGCond cond)
+static bool do_cmp(DisasContext *ctx, arg_vvv *a,
+ uint32_t oprsz, MemOp mop, TCGCond cond)
{
uint32_t vd_ofs, vj_ofs, vk_ofs;
@@ -3743,7 +3744,7 @@ static bool do_cmp(DisasContext *ctx, arg_vvv *a, MemOp mop, TCGCond cond)
vj_ofs = vec_full_offset(a->vj);
vk_ofs = vec_full_offset(a->vk);
- tcg_gen_gvec_cmp(cond, mop, vd_ofs, vj_ofs, vk_ofs, 16, ctx->vl/8);
+ tcg_gen_gvec_cmp(cond, mop, vd_ofs, vj_ofs, vk_ofs, oprsz, ctx->vl / 8);
return true;
}
@@ -3778,145 +3779,147 @@ static void gen_vslti_u_vec(unsigned vece, TCGv_vec t, TCGv_vec a, int64_t imm)
do_cmpi_vec(TCG_COND_LTU, vece, t, a, imm);
}
-#define DO_CMPI_S(NAME) \
-static bool do_## NAME ##_s(DisasContext *ctx, arg_vv_i *a, MemOp mop) \
-{ \
- uint32_t vd_ofs, vj_ofs; \
- \
- CHECK_VEC; \
- \
- static const TCGOpcode vecop_list[] = { \
- INDEX_op_cmp_vec, 0 \
- }; \
- static const GVecGen2i op[4] = { \
- { \
- .fniv = gen_## NAME ##_s_vec, \
- .fnoi = gen_helper_## NAME ##_b, \
- .opt_opc = vecop_list, \
- .vece = MO_8 \
- }, \
- { \
- .fniv = gen_## NAME ##_s_vec, \
- .fnoi = gen_helper_## NAME ##_h, \
- .opt_opc = vecop_list, \
- .vece = MO_16 \
- }, \
- { \
- .fniv = gen_## NAME ##_s_vec, \
- .fnoi = gen_helper_## NAME ##_w, \
- .opt_opc = vecop_list, \
- .vece = MO_32 \
- }, \
- { \
- .fniv = gen_## NAME ##_s_vec, \
- .fnoi = gen_helper_## NAME ##_d, \
- .opt_opc = vecop_list, \
- .vece = MO_64 \
- } \
- }; \
- \
- vd_ofs = vec_full_offset(a->vd); \
- vj_ofs = vec_full_offset(a->vj); \
- \
- tcg_gen_gvec_2i(vd_ofs, vj_ofs, 16, ctx->vl/8, a->imm, &op[mop]); \
- \
- return true; \
+#define DO_CMPI_S(NAME) \
+static bool do_## NAME ##_s(DisasContext *ctx, \
+ arg_vv_i *a, uint32_t oprsz, MemOp mop) \
+{ \
+ uint32_t vd_ofs, vj_ofs; \
+ \
+ CHECK_VEC; \
+ \
+ static const TCGOpcode vecop_list[] = { \
+ INDEX_op_cmp_vec, 0 \
+ }; \
+ static const GVecGen2i op[4] = { \
+ { \
+ .fniv = gen_## NAME ##_s_vec, \
+ .fnoi = gen_helper_## NAME ##_b, \
+ .opt_opc = vecop_list, \
+ .vece = MO_8 \
+ }, \
+ { \
+ .fniv = gen_## NAME ##_s_vec, \
+ .fnoi = gen_helper_## NAME ##_h, \
+ .opt_opc = vecop_list, \
+ .vece = MO_16 \
+ }, \
+ { \
+ .fniv = gen_## NAME ##_s_vec, \
+ .fnoi = gen_helper_## NAME ##_w, \
+ .opt_opc = vecop_list, \
+ .vece = MO_32 \
+ }, \
+ { \
+ .fniv = gen_## NAME ##_s_vec, \
+ .fnoi = gen_helper_## NAME ##_d, \
+ .opt_opc = vecop_list, \
+ .vece = MO_64 \
+ } \
+ }; \
+ \
+ vd_ofs = vec_full_offset(a->vd); \
+ vj_ofs = vec_full_offset(a->vj); \
+ \
+ tcg_gen_gvec_2i(vd_ofs, vj_ofs, oprsz, ctx->vl / 8, a->imm, &op[mop]); \
+ \
+ return true; \
}
DO_CMPI_S(vseqi)
DO_CMPI_S(vslei)
DO_CMPI_S(vslti)
-#define DO_CMPI_U(NAME) \
-static bool do_## NAME ##_u(DisasContext *ctx, arg_vv_i *a, MemOp mop) \
-{ \
- uint32_t vd_ofs, vj_ofs; \
- \
- CHECK_VEC; \
- \
- static const TCGOpcode vecop_list[] = { \
- INDEX_op_cmp_vec, 0 \
- }; \
- static const GVecGen2i op[4] = { \
- { \
- .fniv = gen_## NAME ##_u_vec, \
- .fnoi = gen_helper_## NAME ##_bu, \
- .opt_opc = vecop_list, \
- .vece = MO_8 \
- }, \
- { \
- .fniv = gen_## NAME ##_u_vec, \
- .fnoi = gen_helper_## NAME ##_hu, \
- .opt_opc = vecop_list, \
- .vece = MO_16 \
- }, \
- { \
- .fniv = gen_## NAME ##_u_vec, \
- .fnoi = gen_helper_## NAME ##_wu, \
- .opt_opc = vecop_list, \
- .vece = MO_32 \
- }, \
- { \
- .fniv = gen_## NAME ##_u_vec, \
- .fnoi = gen_helper_## NAME ##_du, \
- .opt_opc = vecop_list, \
- .vece = MO_64 \
- } \
- }; \
- \
- vd_ofs = vec_full_offset(a->vd); \
- vj_ofs = vec_full_offset(a->vj); \
- \
- tcg_gen_gvec_2i(vd_ofs, vj_ofs, 16, ctx->vl/8, a->imm, &op[mop]); \
- \
- return true; \
+#define DO_CMPI_U(NAME) \
+static bool do_## NAME ##_u(DisasContext *ctx, \
+ arg_vv_i *a, uint32_t oprsz, MemOp mop) \
+{ \
+ uint32_t vd_ofs, vj_ofs; \
+ \
+ CHECK_VEC; \
+ \
+ static const TCGOpcode vecop_list[] = { \
+ INDEX_op_cmp_vec, 0 \
+ }; \
+ static const GVecGen2i op[4] = { \
+ { \
+ .fniv = gen_## NAME ##_u_vec, \
+ .fnoi = gen_helper_## NAME ##_bu, \
+ .opt_opc = vecop_list, \
+ .vece = MO_8 \
+ }, \
+ { \
+ .fniv = gen_## NAME ##_u_vec, \
+ .fnoi = gen_helper_## NAME ##_hu, \
+ .opt_opc = vecop_list, \
+ .vece = MO_16 \
+ }, \
+ { \
+ .fniv = gen_## NAME ##_u_vec, \
+ .fnoi = gen_helper_## NAME ##_wu, \
+ .opt_opc = vecop_list, \
+ .vece = MO_32 \
+ }, \
+ { \
+ .fniv = gen_## NAME ##_u_vec, \
+ .fnoi = gen_helper_## NAME ##_du, \
+ .opt_opc = vecop_list, \
+ .vece = MO_64 \
+ } \
+ }; \
+ \
+ vd_ofs = vec_full_offset(a->vd); \
+ vj_ofs = vec_full_offset(a->vj); \
+ \
+ tcg_gen_gvec_2i(vd_ofs, vj_ofs, oprsz, ctx->vl / 8, a->imm, &op[mop]); \
+ \
+ return true; \
}
DO_CMPI_U(vslei)
DO_CMPI_U(vslti)
-TRANS(vseq_b, LSX, do_cmp, MO_8, TCG_COND_EQ)
-TRANS(vseq_h, LSX, do_cmp, MO_16, TCG_COND_EQ)
-TRANS(vseq_w, LSX, do_cmp, MO_32, TCG_COND_EQ)
-TRANS(vseq_d, LSX, do_cmp, MO_64, TCG_COND_EQ)
-TRANS(vseqi_b, LSX, do_vseqi_s, MO_8)
-TRANS(vseqi_h, LSX, do_vseqi_s, MO_16)
-TRANS(vseqi_w, LSX, do_vseqi_s, MO_32)
-TRANS(vseqi_d, LSX, do_vseqi_s, MO_64)
-
-TRANS(vsle_b, LSX, do_cmp, MO_8, TCG_COND_LE)
-TRANS(vsle_h, LSX, do_cmp, MO_16, TCG_COND_LE)
-TRANS(vsle_w, LSX, do_cmp, MO_32, TCG_COND_LE)
-TRANS(vsle_d, LSX, do_cmp, MO_64, TCG_COND_LE)
-TRANS(vslei_b, LSX, do_vslei_s, MO_8)
-TRANS(vslei_h, LSX, do_vslei_s, MO_16)
-TRANS(vslei_w, LSX, do_vslei_s, MO_32)
-TRANS(vslei_d, LSX, do_vslei_s, MO_64)
-TRANS(vsle_bu, LSX, do_cmp, MO_8, TCG_COND_LEU)
-TRANS(vsle_hu, LSX, do_cmp, MO_16, TCG_COND_LEU)
-TRANS(vsle_wu, LSX, do_cmp, MO_32, TCG_COND_LEU)
-TRANS(vsle_du, LSX, do_cmp, MO_64, TCG_COND_LEU)
-TRANS(vslei_bu, LSX, do_vslei_u, MO_8)
-TRANS(vslei_hu, LSX, do_vslei_u, MO_16)
-TRANS(vslei_wu, LSX, do_vslei_u, MO_32)
-TRANS(vslei_du, LSX, do_vslei_u, MO_64)
-
-TRANS(vslt_b, LSX, do_cmp, MO_8, TCG_COND_LT)
-TRANS(vslt_h, LSX, do_cmp, MO_16, TCG_COND_LT)
-TRANS(vslt_w, LSX, do_cmp, MO_32, TCG_COND_LT)
-TRANS(vslt_d, LSX, do_cmp, MO_64, TCG_COND_LT)
-TRANS(vslti_b, LSX, do_vslti_s, MO_8)
-TRANS(vslti_h, LSX, do_vslti_s, MO_16)
-TRANS(vslti_w, LSX, do_vslti_s, MO_32)
-TRANS(vslti_d, LSX, do_vslti_s, MO_64)
-TRANS(vslt_bu, LSX, do_cmp, MO_8, TCG_COND_LTU)
-TRANS(vslt_hu, LSX, do_cmp, MO_16, TCG_COND_LTU)
-TRANS(vslt_wu, LSX, do_cmp, MO_32, TCG_COND_LTU)
-TRANS(vslt_du, LSX, do_cmp, MO_64, TCG_COND_LTU)
-TRANS(vslti_bu, LSX, do_vslti_u, MO_8)
-TRANS(vslti_hu, LSX, do_vslti_u, MO_16)
-TRANS(vslti_wu, LSX, do_vslti_u, MO_32)
-TRANS(vslti_du, LSX, do_vslti_u, MO_64)
+TRANS(vseq_b, LSX, do_cmp, 16, MO_8, TCG_COND_EQ)
+TRANS(vseq_h, LSX, do_cmp, 16, MO_16, TCG_COND_EQ)
+TRANS(vseq_w, LSX, do_cmp, 16, MO_32, TCG_COND_EQ)
+TRANS(vseq_d, LSX, do_cmp, 16, MO_64, TCG_COND_EQ)
+TRANS(vseqi_b, LSX, do_vseqi_s, 16, MO_8)
+TRANS(vseqi_h, LSX, do_vseqi_s, 16, MO_16)
+TRANS(vseqi_w, LSX, do_vseqi_s, 16, MO_32)
+TRANS(vseqi_d, LSX, do_vseqi_s, 16, MO_64)
+
+TRANS(vsle_b, LSX, do_cmp, 16, MO_8, TCG_COND_LE)
+TRANS(vsle_h, LSX, do_cmp, 16, MO_16, TCG_COND_LE)
+TRANS(vsle_w, LSX, do_cmp, 16, MO_32, TCG_COND_LE)
+TRANS(vsle_d, LSX, do_cmp, 16, MO_64, TCG_COND_LE)
+TRANS(vslei_b, LSX, do_vslei_s, 16, MO_8)
+TRANS(vslei_h, LSX, do_vslei_s, 16, MO_16)
+TRANS(vslei_w, LSX, do_vslei_s, 16, MO_32)
+TRANS(vslei_d, LSX, do_vslei_s, 16, MO_64)
+TRANS(vsle_bu, LSX, do_cmp, 16, MO_8, TCG_COND_LEU)
+TRANS(vsle_hu, LSX, do_cmp, 16, MO_16, TCG_COND_LEU)
+TRANS(vsle_wu, LSX, do_cmp, 16, MO_32, TCG_COND_LEU)
+TRANS(vsle_du, LSX, do_cmp, 16, MO_64, TCG_COND_LEU)
+TRANS(vslei_bu, LSX, do_vslei_u, 16, MO_8)
+TRANS(vslei_hu, LSX, do_vslei_u, 16, MO_16)
+TRANS(vslei_wu, LSX, do_vslei_u, 16, MO_32)
+TRANS(vslei_du, LSX, do_vslei_u, 16, MO_64)
+
+TRANS(vslt_b, LSX, do_cmp, 16, MO_8, TCG_COND_LT)
+TRANS(vslt_h, LSX, do_cmp, 16, MO_16, TCG_COND_LT)
+TRANS(vslt_w, LSX, do_cmp, 16, MO_32, TCG_COND_LT)
+TRANS(vslt_d, LSX, do_cmp, 16, MO_64, TCG_COND_LT)
+TRANS(vslti_b, LSX, do_vslti_s, 16, MO_8)
+TRANS(vslti_h, LSX, do_vslti_s, 16, MO_16)
+TRANS(vslti_w, LSX, do_vslti_s, 16, MO_32)
+TRANS(vslti_d, LSX, do_vslti_s, 16, MO_64)
+TRANS(vslt_bu, LSX, do_cmp, 16, MO_8, TCG_COND_LTU)
+TRANS(vslt_hu, LSX, do_cmp, 16, MO_16, TCG_COND_LTU)
+TRANS(vslt_wu, LSX, do_cmp, 16, MO_32, TCG_COND_LTU)
+TRANS(vslt_du, LSX, do_cmp, 16, MO_64, TCG_COND_LTU)
+TRANS(vslti_bu, LSX, do_vslti_u, 16, MO_8)
+TRANS(vslti_hu, LSX, do_vslti_u, 16, MO_16)
+TRANS(vslti_wu, LSX, do_vslti_u, 16, MO_32)
+TRANS(vslti_du, LSX, do_vslti_u, 16, MO_64)
static bool trans_vfcmp_cond_s(DisasContext *ctx, arg_vvv_fcond *a)
{
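The oprsz argument threaded through do_cmp and the DO_CMPI_* macros
above is what lets LSX and LASX share one path: oprsz is the number of
bytes actually computed, while the final maxsz argument (ctx->vl / 8)
covers the whole register, and the gvec layer zeroes the bytes in
between. A standalone sketch of that size contract (illustrative C,
not the QEMU implementation):

    #include <stdint.h>
    #include <string.h>

    /* A gvec-style op computes oprsz bytes and clears the rest, so a
     * 128-bit LSX op on a 256-bit implementation (oprsz = 16,
     * maxsz = 32) zeroes the high half of vd. */
    static void gvec_like_or(uint8_t *vd, const uint8_t *vj,
                             const uint8_t *vk, int oprsz, int maxsz)
    {
        for (int i = 0; i < oprsz; i++) {
            vd[i] = vj[i] | vk[i];
        }
        memset(vd + oprsz, 0, maxsz - oprsz);
    }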
--
2.39.1
* Re: [PATCH v4 40/48] target/loongarch: Implement xvseq xvsle xvslt
2023-08-30 8:48 ` [PATCH v4 40/48] target/loongarch: Implement xvseq xvsle xvslt Song Gao
@ 2023-08-30 23:41 ` Richard Henderson
0 siblings, 0 replies; 86+ messages in thread
From: Richard Henderson @ 2023-08-30 23:41 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 8/30/23 01:48, Song Gao wrote:
> This patch includes:
> - XVSEQ[I].{B/H/W/D};
> - XVSLE[I].{B/H/W/D}[U];
> - XVSLT[I].{B/H/W/D}[U].
>
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
> target/loongarch/vec.h | 4 +
> target/loongarch/insns.decode | 43 +++
> target/loongarch/disas.c | 43 +++
> target/loongarch/vec_helper.c | 27 +-
> target/loongarch/insn_trans/trans_lasx.c.inc | 43 +++
> target/loongarch/insn_trans/trans_lsx.c.inc | 263 ++++++++++---------
> 6 files changed, 278 insertions(+), 145 deletions(-)
>
> diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
> index aae70f9de9..bc74effb7c 100644
> --- a/target/loongarch/vec.h
> +++ b/target/loongarch/vec.h
> @@ -89,4 +89,8 @@
> #define DO_BITSET(a, bit) (a | 1ull << bit)
> #define DO_BITREV(a, bit) (a ^ (1ull << bit))
>
> +#define VSEQ(a, b) (a == b ? -1 : 0)
> +#define VSLE(a, b) (a <= b ? -1 : 0)
> +#define VSLT(a, b) (a < b ? -1 : 0)
> +
Aside from this movement,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* [PATCH v4 41/48] target/loongarch: Implement xvfcmp
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (39 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 40/48] target/loongarch: Implement xvseq xvsle xvslt Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-31 0:30 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 42/48] target/loongarch: Implement xvbitsel xvset Song Gao
` (6 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVFCMP.cond.{S/D}.
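Only one encoding per element size is decoded; bit 0 of the fcond
field selects the signaling ("s") vs quiet ("c") helper and the
remaining bits pick the relation, as the trans_* changes below show.
A small sketch of that decode (illustrative, mirroring the switch
added to disas.c):

    #include <stdbool.h>
    #include <stdio.h>

    /* fcond bit 0 = signaling, fcond >> 1 = relation index, so e.g.
     * 0x2 is clt (quiet less-than) and 0x3 is slt (signaling). */
    static void decode_fcond(unsigned fcond)
    {
        bool signaling = fcond & 1;
        printf("fcond=0x%x: %s, relation index %u\n",
               fcond, signaling ? "signaling" : "quiet", fcond >> 1);
    }

    int main(void)
    {
        decode_fcond(0x2);   /* clt */
        decode_fcond(0x3);   /* slt */
        return 0;
    }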
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/helper.h | 8 +-
target/loongarch/insns.decode | 3 +
target/loongarch/disas.c | 94 ++++++++++++++++++++
target/loongarch/vec_helper.c | 4 +-
target/loongarch/insn_trans/trans_lasx.c.inc | 3 +
target/loongarch/insn_trans/trans_lsx.c.inc | 17 ++--
6 files changed, 117 insertions(+), 12 deletions(-)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index e9c5412267..b54ce68077 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -652,10 +652,10 @@ DEF_HELPER_FLAGS_4(vslti_hu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_4(vslti_wu, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_4(vslti_du, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
-DEF_HELPER_5(vfcmp_c_s, void, env, i32, i32, i32, i32)
-DEF_HELPER_5(vfcmp_s_s, void, env, i32, i32, i32, i32)
-DEF_HELPER_5(vfcmp_c_d, void, env, i32, i32, i32, i32)
-DEF_HELPER_5(vfcmp_s_d, void, env, i32, i32, i32, i32)
+DEF_HELPER_6(vfcmp_c_s, void, env, i32, i32, i32, i32, i32)
+DEF_HELPER_6(vfcmp_s_s, void, env, i32, i32, i32, i32, i32)
+DEF_HELPER_6(vfcmp_c_d, void, env, i32, i32, i32, i32, i32)
+DEF_HELPER_6(vfcmp_s_d, void, env, i32, i32, i32, i32, i32)
DEF_HELPER_FLAGS_4(vbitseli_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 82c26a318b..0d46bd5e5e 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1958,6 +1958,9 @@ xvslti_hu 0111 01101000 10001 ..... ..... ..... @vv_ui5
xvslti_wu 0111 01101000 10010 ..... ..... ..... @vv_ui5
xvslti_du 0111 01101000 10011 ..... ..... ..... @vv_ui5
+xvfcmp_cond_s 0000 11001001 ..... ..... ..... ..... @vvv_fcond
+xvfcmp_cond_d 0000 11001010 ..... ..... ..... ..... @vvv_fcond
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 295ba74f2b..607774375c 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2385,6 +2385,100 @@ INSN_LASX(xvslti_hu, vv_i)
INSN_LASX(xvslti_wu, vv_i)
INSN_LASX(xvslti_du, vv_i)
+#define output_xvfcmp(C, PREFIX, SUFFIX) \
+{ \
+ (C)->info->fprintf_func((C)->info->stream, "%08x %s%s\tx%d, x%d, x%d", \
+ (C)->insn, PREFIX, SUFFIX, a->vd, \
+ a->vj, a->vk); \
+}
+
+static bool output_xxx_fcond(DisasContext *ctx, arg_vvv_fcond * a,
+ const char *suffix)
+{
+ bool ret = true;
+ switch (a->fcond) {
+ case 0x0:
+ output_xvfcmp(ctx, "xvfcmp_caf_", suffix);
+ break;
+ case 0x1:
+ output_xvfcmp(ctx, "xvfcmp_saf_", suffix);
+ break;
+ case 0x2:
+ output_xvfcmp(ctx, "xvfcmp_clt_", suffix);
+ break;
+ case 0x3:
+ output_xvfcmp(ctx, "xvfcmp_slt_", suffix);
+ break;
+ case 0x4:
+ output_xvfcmp(ctx, "xvfcmp_ceq_", suffix);
+ break;
+ case 0x5:
+ output_xvfcmp(ctx, "xvfcmp_seq_", suffix);
+ break;
+ case 0x6:
+ output_xvfcmp(ctx, "xvfcmp_cle_", suffix);
+ break;
+ case 0x7:
+ output_xvfcmp(ctx, "xvfcmp_sle_", suffix);
+ break;
+ case 0x8:
+ output_xvfcmp(ctx, "xvfcmp_cun_", suffix);
+ break;
+ case 0x9:
+ output_xvfcmp(ctx, "xvfcmp_sun_", suffix);
+ break;
+ case 0xA:
+ output_xvfcmp(ctx, "xvfcmp_cult_", suffix);
+ break;
+ case 0xB:
+ output_xvfcmp(ctx, "xvfcmp_sult_", suffix);
+ break;
+ case 0xC:
+ output_xvfcmp(ctx, "xvfcmp_cueq_", suffix);
+ break;
+ case 0xD:
+ output_xvfcmp(ctx, "xvfcmp_sueq_", suffix);
+ break;
+ case 0xE:
+ output_xvfcmp(ctx, "xvfcmp_cule_", suffix);
+ break;
+ case 0xF:
+ output_xvfcmp(ctx, "xvfcmp_sule_", suffix);
+ break;
+ case 0x10:
+ output_xvfcmp(ctx, "xvfcmp_cne_", suffix);
+ break;
+ case 0x11:
+ output_xvfcmp(ctx, "xvfcmp_sne_", suffix);
+ break;
+ case 0x14:
+ output_xvfcmp(ctx, "xvfcmp_cor_", suffix);
+ break;
+ case 0x15:
+ output_xvfcmp(ctx, "xvfcmp_sor_", suffix);
+ break;
+ case 0x18:
+ output_xvfcmp(ctx, "xvfcmp_cune_", suffix);
+ break;
+ case 0x19:
+ output_xvfcmp(ctx, "xvfcmp_sune_", suffix);
+ break;
+ default:
+ ret = false;
+ }
+ return ret;
+}
+
+#define LASX_FCMP_INSN(suffix) \
+static bool trans_xvfcmp_cond_##suffix(DisasContext *ctx, \
+ arg_vvv_fcond * a) \
+{ \
+ return output_xxx_fcond(ctx, a, #suffix); \
+}
+
+LASX_FCMP_INSN(s)
+LASX_FCMP_INSN(d)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 19958c054c..4970a4b39a 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -3001,7 +3001,7 @@ static uint64_t vfcmp_common(CPULoongArchState *env,
}
#define VFCMP(NAME, BIT, E, FN) \
-void HELPER(NAME)(CPULoongArchState *env, \
+void HELPER(NAME)(CPULoongArchState *env, uint32_t oprsz, \
uint32_t vd, uint32_t vj, uint32_t vk, uint32_t flags) \
{ \
int i; \
@@ -3011,7 +3011,7 @@ void HELPER(NAME)(CPULoongArchState *env, \
VReg *Vk = &(env->fpr[vk].vreg); \
\
vec_clear_cause(env); \
- for (i = 0; i < LSX_LEN/BIT ; i++) { \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
FloatRelation cmp; \
cmp = FN(Vj->E(i), Vk->E(i), &env->fp_status); \
t.E(i) = vfcmp_common(env, cmp, flags); \
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index c1cd02d6a1..6efb9733a3 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -705,6 +705,9 @@ TRANS(xvslti_hu, LASX, do_vslti_u, 32, MO_16)
TRANS(xvslti_wu, LASX, do_vslti_u, 32, MO_32)
TRANS(xvslti_du, LASX, do_vslti_u, 32, MO_64)
+TRANS(xvfcmp_cond_s, LASX, do_vfcmp_cond_s, 32)
+TRANS(xvfcmp_cond_d, LASX, do_vfcmp_cond_d, 32)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index f757db7a76..a5d6cc834d 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -3921,13 +3921,14 @@ TRANS(vslti_hu, LSX, do_vslti_u, 16, MO_16)
TRANS(vslti_wu, LSX, do_vslti_u, 16, MO_32)
TRANS(vslti_du, LSX, do_vslti_u, 16, MO_64)
-static bool trans_vfcmp_cond_s(DisasContext *ctx, arg_vvv_fcond *a)
+static bool do_vfcmp_cond_s(DisasContext *ctx, arg_vvv_fcond *a, uint32_t sz)
{
uint32_t flags;
- void (*fn)(TCGv_env, TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i32);
+ void (*fn)(TCGv_env, TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i32);
TCGv_i32 vd = tcg_constant_i32(a->vd);
TCGv_i32 vj = tcg_constant_i32(a->vj);
TCGv_i32 vk = tcg_constant_i32(a->vk);
+ TCGv_i32 oprsz = tcg_constant_i32(sz);
if (!avail_LSX(ctx)) {
return false;
@@ -3937,18 +3938,19 @@ static bool trans_vfcmp_cond_s(DisasContext *ctx, arg_vvv_fcond *a)
fn = (a->fcond & 1 ? gen_helper_vfcmp_s_s : gen_helper_vfcmp_c_s);
flags = get_fcmp_flags(a->fcond >> 1);
- fn(cpu_env, vd, vj, vk, tcg_constant_i32(flags));
+ fn(cpu_env, oprsz, vd, vj, vk, tcg_constant_i32(flags));
return true;
}
-static bool trans_vfcmp_cond_d(DisasContext *ctx, arg_vvv_fcond *a)
+static bool do_vfcmp_cond_d(DisasContext *ctx, arg_vvv_fcond *a, uint32_t sz)
{
uint32_t flags;
- void (*fn)(TCGv_env, TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i32);
+ void (*fn)(TCGv_env, TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i32, TCGv_i32);
TCGv_i32 vd = tcg_constant_i32(a->vd);
TCGv_i32 vj = tcg_constant_i32(a->vj);
TCGv_i32 vk = tcg_constant_i32(a->vk);
+ TCGv_i32 oprsz = tcg_constant_i32(sz);
if (!avail_LSX(ctx)) {
return false;
@@ -3958,11 +3960,14 @@ static bool trans_vfcmp_cond_d(DisasContext *ctx, arg_vvv_fcond *a)
fn = (a->fcond & 1 ? gen_helper_vfcmp_s_d : gen_helper_vfcmp_c_d);
flags = get_fcmp_flags(a->fcond >> 1);
- fn(cpu_env, vd, vj, vk, tcg_constant_i32(flags));
+ fn(cpu_env, oprsz, vd, vj, vk, tcg_constant_i32(flags));
return true;
}
+TRANS(vfcmp_cond_s, LSX, do_vfcmp_cond_s, 16)
+TRANS(vfcmp_cond_d, LSX, do_vfcmp_cond_d, 16)
+
static bool trans_vbitsel_v(DisasContext *ctx, arg_vvvv *a)
{
if (!avail_LSX(ctx)) {
--
2.39.1
* Re: [PATCH v4 41/48] target/loongarch: Implement xvfcmp
2023-08-30 8:48 ` [PATCH v4 41/48] target/loongarch: Implement xvfcmp Song Gao
@ 2023-08-31 0:30 ` Richard Henderson
0 siblings, 0 replies; 86+ messages in thread
From: Richard Henderson @ 2023-08-31 0:30 UTC (permalink / raw)
To: Song Gao, qemu-devel
On 8/30/23 01:48, Song Gao wrote:
> This patch includes:
> - XVFCMP.cond.{S/D}.
>
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
> target/loongarch/helper.h | 8 +-
> target/loongarch/insns.decode | 3 +
> target/loongarch/disas.c | 94 ++++++++++++++++++++
> target/loongarch/vec_helper.c | 4 +-
> target/loongarch/insn_trans/trans_lasx.c.inc | 3 +
> target/loongarch/insn_trans/trans_lsx.c.inc | 17 ++--
> 6 files changed, 117 insertions(+), 12 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* [PATCH v4 42/48] target/loongarch: Implement xvbitsel xvset
2023-08-30 8:48 [PATCH v4 00/48] Add LoongArch LASX instructions Song Gao
` (40 preceding siblings ...)
2023-08-30 8:48 ` [PATCH v4 41/48] target/loongarch: Implement xvfcmp Song Gao
@ 2023-08-30 8:48 ` Song Gao
2023-08-31 0:32 ` Richard Henderson
2023-08-30 8:48 ` [PATCH v4 43/48] target/loongarch: Implement xvinsgr2vr xvpickve2gr Song Gao
` (5 subsequent siblings)
47 siblings, 1 reply; 86+ messages in thread
From: Song Gao @ 2023-08-30 8:48 UTC (permalink / raw)
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVBITSEL.V;
- XVBITSELI.B;
- XVSET{EQZ/NEZ}.V;
- XVSETANYEQZ.{B/H/W/D};
- XVSETALLNEZ.{B/H/W/D}.
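The 256-bit set instructions reduce across four doublewords instead of
two, which the SETANYEQZ/SETALLNEZ hunks below express by folding in
D(2)/D(3) when oprsz is 32. A minimal model of the byte-sized "any
lane is zero" test (plain C; the classic SWAR trick that do_match2
generalizes, illustrative only):

    #include <stdbool.h>
    #include <stdint.h>

    /* True if any byte of n is zero: (n - 0x01..01) & ~n & 0x80..80. */
    static bool any_byte_zero(uint64_t n)
    {
        uint64_t ones = 0x0101010101010101ull;
        uint64_t signs = 0x8080808080808080ull;
        return ((n - ones) & ~n & signs) != 0;
    }

    /* xvsetanyeqz.b-style reduction over a 256-bit register. */
    static bool any_eqz_256(const uint64_t d[4])
    {
        return any_byte_zero(d[0]) || any_byte_zero(d[1]) ||
               any_byte_zero(d[2]) || any_byte_zero(d[3]);
    }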
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/helper.h | 16 +++----
target/loongarch/insns.decode | 15 +++++++
target/loongarch/disas.c | 19 ++++++++
target/loongarch/vec_helper.c | 40 ++++++++++-------
target/loongarch/insn_trans/trans_lasx.c.inc | 46 ++++++++++++++++++++
target/loongarch/insn_trans/trans_lsx.c.inc | 44 +++++++++----------
6 files changed, 134 insertions(+), 46 deletions(-)
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index b54ce68077..85233586e3 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -659,14 +659,14 @@ DEF_HELPER_6(vfcmp_s_d, void, env, i32, i32, i32, i32, i32)
DEF_HELPER_FLAGS_4(vbitseli_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
-DEF_HELPER_3(vsetanyeqz_b, void, env, i32, i32)
-DEF_HELPER_3(vsetanyeqz_h, void, env, i32, i32)
-DEF_HELPER_3(vsetanyeqz_w, void, env, i32, i32)
-DEF_HELPER_3(vsetanyeqz_d, void, env, i32, i32)
-DEF_HELPER_3(vsetallnez_b, void, env, i32, i32)
-DEF_HELPER_3(vsetallnez_h, void, env, i32, i32)
-DEF_HELPER_3(vsetallnez_w, void, env, i32, i32)
-DEF_HELPER_3(vsetallnez_d, void, env, i32, i32)
+DEF_HELPER_4(vsetanyeqz_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsetanyeqz_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsetanyeqz_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsetanyeqz_d, void, env, i32, i32, i32)
+DEF_HELPER_4(vsetallnez_b, void, env, i32, i32, i32)
+DEF_HELPER_4(vsetallnez_h, void, env, i32, i32, i32)
+DEF_HELPER_4(vsetallnez_w, void, env, i32, i32, i32)
+DEF_HELPER_4(vsetallnez_d, void, env, i32, i32, i32)
DEF_HELPER_FLAGS_4(vpackev_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(vpackev_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 0d46bd5e5e..ad6751fdfb 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1961,6 +1961,21 @@ xvslti_du 0111 01101000 10011 ..... ..... ..... @vv_ui5
xvfcmp_cond_s 0000 11001001 ..... ..... ..... ..... @vvv_fcond
xvfcmp_cond_d 0000 11001010 ..... ..... ..... ..... @vvv_fcond
+xvbitsel_v 0000 11010010 ..... ..... ..... ..... @vvvv
+
+xvbitseli_b 0111 01111100 01 ........ ..... ..... @vv_ui8
+
+xvseteqz_v 0111 01101001 11001 00110 ..... 00 ... @cv
+xvsetnez_v 0111 01101001 11001 00111 ..... 00 ... @cv
+xvsetanyeqz_b 0111 01101001 11001 01000 ..... 00 ... @cv
+xvsetanyeqz_h 0111 01101001 11001 01001 ..... 00 ... @cv
+xvsetanyeqz_w 0111 01101001 11001 01010 ..... 00 ... @cv
+xvsetanyeqz_d 0111 01101001 11001 01011 ..... 00 ... @cv
+xvsetallnez_b 0111 01101001 11001 01100 ..... 00 ... @cv
+xvsetallnez_h 0111 01101001 11001 01101 ..... 00 ... @cv
+xvsetallnez_w 0111 01101001 11001 01110 ..... 00 ... @cv
+xvsetallnez_d 0111 01101001 11001 01111 ..... 00 ... @cv
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 607774375c..3a06b5cb80 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1703,6 +1703,11 @@ static bool trans_##insn(DisasContext *ctx, arg_##type * a) \
return true; \
}
+static void output_cv_x(DisasContext *ctx, arg_cv *a, const char *mnemonic)
+{
+ output(ctx, mnemonic, "fcc%d, x%d", a->cd, a->vj);
+}
+
static void output_v_i_x(DisasContext *ctx, arg_v_i *a, const char *mnemonic)
{
output(ctx, mnemonic, "x%d, 0x%x", a->vd, a->imm);
@@ -2479,6 +2484,20 @@ static bool trans_xvfcmp_cond_##suffix(DisasContext *ctx, \
LASX_FCMP_INSN(s)
LASX_FCMP_INSN(d)
+INSN_LASX(xvbitsel_v, vvvv)
+INSN_LASX(xvbitseli_b, vv_i)
+
+INSN_LASX(xvseteqz_v, cv)
+INSN_LASX(xvsetnez_v, cv)
+INSN_LASX(xvsetanyeqz_b, cv)
+INSN_LASX(xvsetanyeqz_h, cv)
+INSN_LASX(xvsetanyeqz_w, cv)
+INSN_LASX(xvsetanyeqz_d, cv)
+INSN_LASX(xvsetallnez_b, cv)
+INSN_LASX(xvsetallnez_h, cv)
+INSN_LASX(xvsetallnez_w, cv)
+INSN_LASX(xvsetallnez_d, cv)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 4970a4b39a..1a13342c86 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -3025,13 +3025,13 @@ VFCMP(vfcmp_s_s, 32, UW, float32_compare)
VFCMP(vfcmp_c_d, 64, UD, float64_compare_quiet)
VFCMP(vfcmp_s_d, 64, UD, float64_compare)
-void HELPER(vbitseli_b)(void *vd, void *vj, uint64_t imm, uint32_t v)
+void HELPER(vbitseli_b)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
int i;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
- for (i = 0; i < 16; i++) {
+ for (i = 0; i < simd_oprsz(desc); i++) {
Vd->B(i) = (~Vd->B(i) & Vj->B(i)) | (Vd->B(i) & imm);
}
}
@@ -3039,7 +3039,7 @@ void HELPER(vbitseli_b)(void *vd, void *vj, uint64_t imm, uint32_t v)
/* Copy from target/arm/tcg/sve_helper.c */
static inline bool do_match2(uint64_t n, uint64_t m0, uint64_t m1, int esz)
{
- uint64_t bits = 8 << esz;
+ int bits = 8 << esz;
uint64_t ones = dup_const(esz, 1);
uint64_t signs = ones << (bits - 1);
uint64_t cmp0, cmp1;
@@ -3052,24 +3052,34 @@ static inline bool do_match2(uint64_t n, uint64_t m0, uint64_t m1, int esz)
return (cmp0 | cmp1) & signs;
}
-#define SETANYEQZ(NAME, MO) \
-void HELPER(NAME)(CPULoongArchState *env, uint32_t cd, uint32_t vj) \
-{ \
- VReg *Vj = &(env->fpr[vj].vreg); \
- \
- env->cf[cd & 0x7] = do_match2(0, Vj->D(0), Vj->D(1), MO); \
+#define SETANYEQZ(NAME, MO) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t oprsz, uint32_t cd, uint32_t vj) \
+{ \
+ VReg *Vj = &(env->fpr[vj].vreg); \
+ \
+ env->cf[cd & 0x7] = do_match2(0, Vj->D(0), Vj->D(1), MO); \
+ if (oprsz == 32) { \
+ env->cf[cd & 0x7] = env->cf[cd & 0x7] || \
+ do_match2(0, Vj->D(2), Vj->D(3), MO); \
+ } \
}
SETANYEQZ(vsetanyeqz_b, MO_8)
SETANYEQZ(vsetanyeqz_h, MO_16)
SETANYEQZ(vsetanyeqz_w, MO_32)
SETANYEQZ(vsetanyeqz_d, MO_64)
-#define SETALLNEZ(NAME, MO) \
-void HELPER(NAME)(CPULoongArchState *env, uint32_t cd, uint32_t vj) \
-{ \
- VReg *Vj = &(env->fpr[vj].vreg); \
- \
- env->cf[cd & 0x7]= !do_match2(0, Vj->D(0), Vj->D(1), MO); \
+#define SETALLNEZ(NAME, MO) \
+void HELPER(NAME)(CPULoongArchState *env, \
+ uint32_t oprsz, uint32_t cd, uint32_t vj) \
+{ \
+ VReg *Vj = &(env->fpr[vj].vreg); \
+ \
+ env->cf[cd & 0x7] = !do_match2(0, Vj->D(0), Vj->D(1), MO); \
+ if (oprsz == 32) { \
+ env->cf[cd & 0x7] = env->cf[cd & 0x7] && \
+ !do_match2(0, Vj->D(2), Vj->D(3), MO); \
+ } \
}
SETALLNEZ(vsetallnez_b, MO_8)
SETALLNEZ(vsetallnez_h, MO_16)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 6efb9733a3..190fe3eecb 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -708,6 +708,52 @@ TRANS(xvslti_du, LASX, do_vslti_u, 32, MO_64)
TRANS(xvfcmp_cond_s, LASX, do_vfcmp_cond_s, 32)
TRANS(xvfcmp_cond_d, LASX, do_vfcmp_cond_d, 32)
+TRANS(xvbitsel_v, LASX, do_vbitsel_v, 32)
+TRANS(xvbitseli_b, LASX, do_vbitseli_b, 32)
+
+#define XVSET(NAME, COND) \
+static bool trans_## NAME(DisasContext *ctx, arg_cv *a) \
+{ \
+ TCGv_i64 t1, t2, d[4]; \
+ \
+ if (!avail_LASX(ctx)) { \
+ return false; \
+ } \
+ \
+ CHECK_VEC; \
+ \
+ d[0] = tcg_temp_new_i64(); \
+ d[1] = tcg_temp_new_i64(); \
+ d[2] = tcg_temp_new_i64(); \
+ d[3] = tcg_temp_new_i64(); \
+ t1 = tcg_temp_new_i64(); \
+ t2 = tcg_temp_new_i64(); \
+ \
+ get_vreg64(d[0], a->vj, 0); \
+ get_vreg64(d[1], a->vj, 1); \
+ get_vreg64(d[2], a->vj, 2); \
+ get_vreg64(d[3], a->vj, 3); \
+ \
+ tcg_gen_or_i64(t1, d[0], d[1]); \
+ tcg_gen_or_i64(t2, d[2], d[3]); \
+ tcg_gen_or_i64(t1, t2, t1); \
+ tcg_gen_setcondi_i64(COND, t1, t1, 0); \
+ tcg_gen_st8_tl(t1, cpu_env, offsetof(CPULoongArchState, cf[a->cd & 0x7])); \
+ \
+ return true; \
+}
+
+XVSET(xvseteqz_v, TCG_COND_EQ)
+XVSET(xvsetnez_v, TCG_COND_NE)
+
+TRANS(xvsetanyeqz_b, LASX, gen_cv, 32, gen_helper_vsetanyeqz_b)
+TRANS(xvsetanyeqz_h, LASX, gen_cv, 32, gen_helper_vsetanyeqz_h)
+TRANS(xvsetanyeqz_w, LASX, gen_cv, 32, gen_helper_vsetanyeqz_w)
+TRANS(xvsetanyeqz_d, LASX, gen_cv, 32, gen_helper_vsetanyeqz_d)
+TRANS(xvsetallnez_b, LASX, gen_cv, 32, gen_helper_vsetallnez_b)
+TRANS(xvsetallnez_h, LASX, gen_cv, 32, gen_helper_vsetallnez_h)
+TRANS(xvsetallnez_w, LASX, gen_cv, 32, gen_helper_vsetallnez_w)
+TRANS(xvsetallnez_d, LASX, gen_cv, 32, gen_helper_vsetallnez_d)
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index a5d6cc834d..2928e878cf 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -91,14 +91,16 @@ static bool gen_vv_i(DisasContext *ctx, arg_vv_i *a, int oprsz,
return true;
}
-static bool gen_cv(DisasContext *ctx, arg_cv *a,
- void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32))
+static bool gen_cv(DisasContext *ctx, arg_cv *a, uint32_t sz,
+ void (*func)(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32))
{
TCGv_i32 vj = tcg_constant_i32(a->vj);
TCGv_i32 cd = tcg_constant_i32(a->cd);
+ TCGv_i32 oprsz = tcg_constant_i32(sz);
CHECK_VEC;
- func(cpu_env, cd, vj);
+
+ func(cpu_env, oprsz, cd, vj);
return true;
}
@@ -3968,26 +3970,24 @@ static bool do_vfcmp_cond_d(DisasContext *ctx, arg_vvv_fcond *a, uint32_t sz)
TRANS(vfcmp_cond_s, LSX, do_vfcmp_cond_s, 16)
TRANS(vfcmp_cond_d, LSX, do_vfcmp_cond_d, 16)
-static bool trans_vbitsel_v(DisasContext *ctx, arg_vvvv *a)
+static bool do_vbitsel_v(DisasContext *ctx, arg_vvvv *a, uint32_t oprsz)
{
- if (!avail_LSX(ctx)) {
- return false;
- }
-
CHECK_VEC;
tcg_gen_gvec_bitsel(MO_64, vec_full_offset(a->vd), vec_full_offset(a->va),
vec_full_offset(a->vk), vec_full_offset(a->vj),
- 16, ctx->vl/8);
+ oprsz, ctx->vl / 8);
return true;
}
+TRANS(vbitsel_v, LSX, do_vbitsel_v, 16)
+
static void gen_vbitseli(unsigned vece, TCGv_vec a, TCGv_vec b, int64_t imm)
{
tcg_gen_bitsel_vec(vece, a, a, tcg_constant_vec_matching(a, vece, imm), b);
}
-static bool trans_vbitseli_b(DisasContext *ctx, arg_vv_i *a)
+static bool do_vbitseli_b(DisasContext *ctx, arg_vv_i *a, uint32_t oprsz)
{
static const GVecGen2i op = {
.fniv = gen_vbitseli,
@@ -3996,17 +3996,15 @@ static bool trans_vbitseli_b(DisasContext *ctx, arg_vv_i *a)
.load_dest = true
};
- if (!avail_LSX(ctx)) {
- return false;
- }
-
CHECK_VEC;
tcg_gen_gvec_2i(vec_full_offset(a->vd), vec_full_offset(a->vj),
- 16, ctx->vl/8, a->imm, &op);
+ oprsz, ctx->vl / 8, a->imm, &op);
return true;
}
+TRANS(vbitseli_b, LSX, do_vbitseli_b, 16)
+
#define VSET(NAME, COND) \
static bool trans_## NAME (DisasContext *ctx, arg_cv *a) \
{ \
@@ -4034,14 +4032,14 @@ static bool trans_## NAME (DisasContext *ctx, arg_cv *a) \
VSET(vseteqz_v, TCG_COND_EQ)
VSET(vsetnez_v, TCG_COND_NE)
-TRANS(vsetanyeqz_b, LSX, gen_cv, gen_helper_vsetanyeqz_b)
-TRANS(vsetanyeqz_h, LSX, gen_cv, gen_helper_vsetanyeqz_h)
-TRANS(vsetanyeqz_w, LSX, gen_cv, gen_helper_vsetanyeqz_w)
-TRANS(vsetanyeqz_d, LSX, gen_cv, gen_helper_vsetanyeqz_d)
-TRANS(vsetallnez_b, LSX, gen_cv, gen_helper_vsetallnez_b)
-TRANS(vsetallnez_h, LSX, gen_cv, gen_helper_vsetallnez_h)
-TRANS(vsetallnez_w, LSX, gen_cv, gen_helper_vsetallnez_w)
-TRANS(vsetallnez_d, LSX, gen_cv, gen_helper_vsetallnez_d)
+TRANS(vsetanyeqz_b, LSX, gen_cv, 16, gen_helper_vsetanyeqz_b)
+TRANS(vsetanyeqz_h, LSX, gen_cv, 16, gen_helper_vsetanyeqz_h)
+TRANS(vsetanyeqz_w, LSX, gen_cv, 16, gen_helper_vsetanyeqz_w)
+TRANS(vsetanyeqz_d, LSX, gen_cv, 16, gen_helper_vsetanyeqz_d)
+TRANS(vsetallnez_b, LSX, gen_cv, 16, gen_helper_vsetallnez_b)
+TRANS(vsetallnez_h, LSX, gen_cv, 16, gen_helper_vsetallnez_h)
+TRANS(vsetallnez_w, LSX, gen_cv, 16, gen_helper_vsetallnez_w)
+TRANS(vsetallnez_d, LSX, gen_cv, 16, gen_helper_vsetallnez_d)
static bool trans_vinsgr2vr_b(DisasContext *ctx, arg_vr_i *a)
{
--
2.39.1
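For reference, the widened SETANYEQZ/SETALLNEZ helpers above answer one question: does any element (or, for SETALLNEZ, no element) of the full vector register equal zero? do_match2 merely computes this branch-free on 64-bit chunks. A standalone C model of the byte case, illustrative only (the names are invented; this is not QEMU code):

#include <stdbool.h>
#include <stdint.h>

/*
 * Reference model: does any byte of an oprsz-byte register image equal
 * zero?  vsetanyeqz_b sets cf to this result; vsetallnez_b sets cf to
 * its negation.  oprsz is 16 for LSX and 32 for LASX.
 */
static bool any_byte_zero(const uint8_t *reg, unsigned oprsz)
{
    for (unsigned i = 0; i < oprsz; i++) {
        if (reg[i] == 0) {
            return true;
        }
    }
    return false;
}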
* Re: [PATCH v4 42/48] target/loongarch: Implement xvbitsel xvset
From: Richard Henderson @ 2023-08-31 0:32 UTC
To: Song Gao, qemu-devel
On 8/30/23 01:48, Song Gao wrote:
> This patch includes:
> - XVBITSEL.V;
> - XVBITSELI.B;
> - XVSET{EQZ/NEZ}.V;
> - XVSETANYEQZ.{B/H/W/D};
> - XVSETALLNEZ.{B/H/W/D}.
>
> Signed-off-by: Song Gao <gaosong@loongson.cn>
> ---
> target/loongarch/helper.h | 16 +++----
> target/loongarch/insns.decode | 15 +++++++
> target/loongarch/disas.c | 19 ++++++++
> target/loongarch/vec_helper.c | 40 ++++++++++-------
> target/loongarch/insn_trans/trans_lasx.c.inc | 46 ++++++++++++++++++++
> target/loongarch/insn_trans/trans_lsx.c.inc | 44 +++++++++----------
> 6 files changed, 134 insertions(+), 46 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* [PATCH v4 43/48] target/loongarch: Implement xvinsgr2vr xvpickve2gr
From: Song Gao @ 2023-08-30 8:48 UTC
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVINSGR2VR.{W/D};
- XVPICKVE2GR.{W/D}[U].
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/insns.decode | 7 +++
target/loongarch/disas.c | 18 ++++++++
target/loongarch/insn_trans/trans_lasx.c.inc | 48 ++++++++++++++++++++
3 files changed, 73 insertions(+)
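Before the diff, a sketch of the semantics being wired up. The following standalone C model is illustrative only: the XReg type, the function names, the index masking and the little-endian lane order are assumptions, not QEMU code. It shows what XVINSGR2VR.W and XVPICKVE2GR.W[U] compute on a 256-bit register:

#include <stdint.h>
#include <stdio.h>

typedef union {
    int32_t  w[8];   /* eight signed 32-bit lanes */
    uint32_t uw[8];  /* the same lanes, unsigned */
} XReg;

/* XVINSGR2VR.W: overwrite lane ui3 with the low 32 bits of GPR rj. */
static void xvinsgr2vr_w(XReg *xd, uint64_t rj, unsigned ui3)
{
    xd->w[ui3 & 0x7] = (int32_t)(uint32_t)rj;
}

/* XVPICKVE2GR.W: sign-extend lane ui3 into a 64-bit GPR. */
static uint64_t xvpickve2gr_w(const XReg *xj, unsigned ui3)
{
    return (uint64_t)(int64_t)xj->w[ui3 & 0x7];
}

/* XVPICKVE2GR.WU: zero-extend lane ui3 into a 64-bit GPR. */
static uint64_t xvpickve2gr_wu(const XReg *xj, unsigned ui3)
{
    return (uint64_t)xj->uw[ui3 & 0x7];
}

int main(void)
{
    XReg x = { .w = { 0, -1, 2, 3, 4, 5, 6, 7 } };
    xvinsgr2vr_w(&x, 0xdeadbeef, 4);
    printf("%#llx\n", (unsigned long long)xvpickve2gr_w(&x, 1));  /* 0xffffffffffffffff */
    printf("%#llx\n", (unsigned long long)xvpickve2gr_wu(&x, 1)); /* 0xffffffff */
    printf("%#llx\n", (unsigned long long)xvpickve2gr_wu(&x, 4)); /* 0xdeadbeef */
    return 0;
}

The signed/unsigned pair exists because a 32-bit lane must be either sign- or zero-extended when it fills a 64-bit GPR.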
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index ad6751fdfb..bb3bb447ae 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1976,6 +1976,13 @@ xvsetallnez_h 0111 01101001 11001 01101 ..... 00 ... @cv
xvsetallnez_w 0111 01101001 11001 01110 ..... 00 ... @cv
xvsetallnez_d 0111 01101001 11001 01111 ..... 00 ... @cv
+xvinsgr2vr_w 0111 01101110 10111 10 ... ..... ..... @vr_ui3
+xvinsgr2vr_d 0111 01101110 10111 110 .. ..... ..... @vr_ui2
+xvpickve2gr_w 0111 01101110 11111 10 ... ..... ..... @rv_ui3
+xvpickve2gr_d 0111 01101110 11111 110 .. ..... ..... @rv_ui2
+xvpickve2gr_wu 0111 01101111 00111 10 ... ..... ..... @rv_ui3
+xvpickve2gr_du 0111 01101111 00111 110 .. ..... ..... @rv_ui2
+
xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 3a06b5cb80..0995d9b794 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1738,6 +1738,17 @@ static void output_vr_x(DisasContext *ctx, arg_vr *a, const char *mnemonic)
output(ctx, mnemonic, "x%d, r%d", a->vd, a->rj);
}
+static void output_vr_i_x(DisasContext *ctx, arg_vr_i *a, const char *mnemonic)
+{
+ output(ctx, mnemonic, "x%d, r%d, 0x%x", a->vd, a->rj, a->imm);
+}
+
+static void output_rv_i_x(DisasContext *ctx, arg_rv_i *a, const char *mnemonic)
+{
+ output(ctx, mnemonic, "r%d, x%d, 0x%x", a->rd, a->vj, a->imm);
+}
+
+
INSN_LASX(xvadd_b, vvv)
INSN_LASX(xvadd_h, vvv)
INSN_LASX(xvadd_w, vvv)
@@ -2498,6 +2509,13 @@ INSN_LASX(xvsetallnez_h, cv)
INSN_LASX(xvsetallnez_w, cv)
INSN_LASX(xvsetallnez_d, cv)
+INSN_LASX(xvinsgr2vr_w, vr_i)
+INSN_LASX(xvinsgr2vr_d, vr_i)
+INSN_LASX(xvpickve2gr_w, rv_i)
+INSN_LASX(xvpickve2gr_d, rv_i)
+INSN_LASX(xvpickve2gr_wu, rv_i)
+INSN_LASX(xvpickve2gr_du, rv_i)
+
INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 190fe3eecb..541e2b1728 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -754,6 +754,54 @@ TRANS(xvsetallnez_h, LASX, gen_cv, 32, gen_helper_vsetallnez_h)
TRANS(xvsetallnez_w, LASX, gen_cv, 32, gen_helper_vsetallnez_w)
TRANS(xvsetallnez_d, LASX, gen_cv, 32, gen_helper_vsetallnez_d)
+static bool trans_xvinsgr2vr_w(DisasContext *ctx, arg_vr_i *a)
+{
+ if (!avail_LASX(ctx)) {
+ return false;
+ }
+ return trans_vinsgr2vr_w(ctx, a);
+}
+
+static bool trans_xvinsgr2vr_d(DisasContext *ctx, arg_vr_i *a)
+{
+ if (!avail_LASX(ctx)) {
+ return false;
+ }
+ return trans_vinsgr2vr_d(ctx, a);
+}
+
+static bool trans_xvpickve2gr_w(DisasContext *ctx, arg_rv_i *a)
+{
+ if (!avail_LASX(ctx)) {
+ return false;
+ }
+ return trans_vpickve2gr_w(ctx, a);
+}
+
+static bool trans_xvpickve2gr_d(DisasContext *ctx, arg_rv_i *a)
+{
+ if (!avail_LASX(ctx)) {
+ return false;
+ }
+ return trans_vpickve2gr_d(ctx, a);
+}
+
+static bool trans_xvpickve2gr_wu(DisasContext *ctx, arg_rv_i *a)
+{
+ if (!avail_LASX(ctx)) {
+ return false;
+ }
+ return trans_vpickve2gr_wu(ctx, a);
+}
+
+static bool trans_xvpickve2gr_du(DisasContext *ctx, arg_rv_i *a)
+{
+ if (!avail_LASX(ctx)) {
+ return false;
+ }
+ return trans_vpickve2gr_du(ctx, a);
+}
+
TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
--
2.39.1
* [PATCH v4 44/48] target/loongarch: Implement xvreplve xvinsve0 xvpickve xvb{sll/srl}v
From: Song Gao @ 2023-08-30 8:48 UTC
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVREPLVE.{B/H/W/D};
- XVREPL128VEI.{B/H/W/D};
- XVREPLVE0.{B/H/W/D/Q};
- XVINSVE0.{W/D};
- XVPICKVE.{W/D};
- XVBSLL.V, XVBSRL.V.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/helper.h | 5 +
target/loongarch/insns.decode | 25 ++++
target/loongarch/disas.c | 28 +++++
target/loongarch/vec_helper.c | 28 +++++
target/loongarch/insn_trans/trans_lasx.c.inc | 118 +++++++++++++++++++
target/loongarch/insn_trans/trans_lsx.c.inc | 111 +++++++++--------
6 files changed, 269 insertions(+), 46 deletions(-)
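Two of the semantics in this patch are easy to misread, so here is a standalone C sketch, illustrative only (the type and function names are invented, not QEMU code): XVREPLVE0.W broadcasts lane 0 across the whole 256-bit register, while XVPICKVE.W moves one selected lane to lane 0 and clears everything else, matching the XVPICKVE helper in the diff below.

#include <stdint.h>

typedef struct { uint32_t w[8]; } XReg;  /* 256 bits as 32-bit lanes */

/* XVREPLVE0.W: replicate lane 0 of xj into every lane of xd. */
static void xvreplve0_w(XReg *xd, const XReg *xj)
{
    for (int i = 0; i < 8; i++) {
        xd->w[i] = xj->w[0];
    }
}

/* XVPICKVE.W: lane (ui3 & 7) of xj goes to lane 0 of xd; the other
 * lanes are zeroed, just as the helper below zeroes Vd->E(1..). */
static void xvpickve_w(XReg *xd, const XReg *xj, unsigned ui3)
{
    xd->w[0] = xj->w[ui3 & 0x7];
    for (int i = 1; i < 8; i++) {
        xd->w[i] = 0;
    }
}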
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index 85233586e3..fb489dda2d 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -668,6 +668,11 @@ DEF_HELPER_4(vsetallnez_h, void, env, i32, i32, i32)
DEF_HELPER_4(vsetallnez_w, void, env, i32, i32, i32)
DEF_HELPER_4(vsetallnez_d, void, env, i32, i32, i32)
+DEF_HELPER_FLAGS_4(xvinsve0_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvinsve0_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvpickve_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(xvpickve_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+
DEF_HELPER_FLAGS_4(vpackev_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(vpackev_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(vpackev_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index bb3bb447ae..74383ba3bc 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -1987,3 +1987,28 @@ xvreplgr2vr_b 0111 01101001 11110 00000 ..... ..... @vr
xvreplgr2vr_h 0111 01101001 11110 00001 ..... ..... @vr
xvreplgr2vr_w 0111 01101001 11110 00010 ..... ..... @vr
xvreplgr2vr_d 0111 01101001 11110 00011 ..... ..... @vr
+
+xvreplve_b 0111 01010010 00100 ..... ..... ..... @vvr
+xvreplve_h 0111 01010010 00101 ..... ..... ..... @vvr
+xvreplve_w 0111 01010010 00110 ..... ..... ..... @vvr
+xvreplve_d 0111 01010010 00111 ..... ..... ..... @vvr
+
+xvrepl128vei_b 0111 01101111 01111 0 .... ..... ..... @vv_ui4
+xvrepl128vei_h 0111 01101111 01111 10 ... ..... ..... @vv_ui3
+xvrepl128vei_w 0111 01101111 01111 110 .. ..... ..... @vv_ui2
+xvrepl128vei_d 0111 01101111 01111 1110 . ..... ..... @vv_ui1
+
+xvreplve0_b 0111 01110000 01110 00000 ..... ..... @vv
+xvreplve0_h 0111 01110000 01111 00000 ..... ..... @vv
+xvreplve0_w 0111 01110000 01111 10000 ..... ..... @vv
+xvreplve0_d 0111 01110000 01111 11000 ..... ..... @vv
+xvreplve0_q 0111 01110000 01111 11100 ..... ..... @vv
+
+xvinsve0_w 0111 01101111 11111 10 ... ..... ..... @vv_ui3
+xvinsve0_d 0111 01101111 11111 110 .. ..... ..... @vv_ui2
+
+xvpickve_w 0111 01110000 00111 10 ... ..... ..... @vv_ui3
+xvpickve_d 0111 01110000 00111 110 .. ..... ..... @vv_ui2
+
+xvbsll_v 0111 01101000 11100 ..... ..... ..... @vv_ui5
+xvbsrl_v 0111 01101000 11101 ..... ..... ..... @vv_ui5
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 0995d9b794..ac7dd3021d 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1748,6 +1748,10 @@ static void output_rv_i_x(DisasContext *ctx, arg_rv_i *a, const char *mnemonic)
output(ctx, mnemonic, "r%d, x%d, 0x%x", a->rd, a->vj, a->imm);
}
+static void output_vvr_x(DisasContext *ctx, arg_vvr *a, const char *mnemonic)
+{
+ output(ctx, mnemonic, "x%d, x%d, r%d", a->vd, a->vj, a->rk);
+}
INSN_LASX(xvadd_b, vvv)
INSN_LASX(xvadd_h, vvv)
@@ -2520,3 +2524,27 @@ INSN_LASX(xvreplgr2vr_b, vr)
INSN_LASX(xvreplgr2vr_h, vr)
INSN_LASX(xvreplgr2vr_w, vr)
INSN_LASX(xvreplgr2vr_d, vr)
+
+INSN_LASX(xvreplve_b, vvr)
+INSN_LASX(xvreplve_h, vvr)
+INSN_LASX(xvreplve_w, vvr)
+INSN_LASX(xvreplve_d, vvr)
+INSN_LASX(xvrepl128vei_b, vv_i)
+INSN_LASX(xvrepl128vei_h, vv_i)
+INSN_LASX(xvrepl128vei_w, vv_i)
+INSN_LASX(xvrepl128vei_d, vv_i)
+
+INSN_LASX(xvreplve0_b, vv)
+INSN_LASX(xvreplve0_h, vv)
+INSN_LASX(xvreplve0_w, vv)
+INSN_LASX(xvreplve0_d, vv)
+INSN_LASX(xvreplve0_q, vv)
+
+INSN_LASX(xvinsve0_w, vv_i)
+INSN_LASX(xvinsve0_d, vv_i)
+
+INSN_LASX(xvpickve_w, vv_i)
+INSN_LASX(xvpickve_d, vv_i)
+
+INSN_LASX(xvbsll_v, vv_i)
+INSN_LASX(xvbsrl_v, vv_i)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 1a13342c86..8da95f20a9 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -3086,6 +3086,34 @@ SETALLNEZ(vsetallnez_h, MO_16)
SETALLNEZ(vsetallnez_w, MO_32)
SETALLNEZ(vsetallnez_d, MO_64)
+#define XVINSVE0(NAME, E, MASK) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ Vd->E(imm & MASK) = Vj->E(0); \
+}
+
+XVINSVE0(xvinsve0_w, W, 0x7)
+XVINSVE0(xvinsve0_d, D, 0x3)
+
+#define XVPICKVE(NAME, E, BIT, MASK) \
+void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
+{ \
+ int i; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ Vd->E(0) = Vj->E(imm & MASK); \
+ for (i = 1; i < oprsz / (BIT / 8); i++) { \
+ Vd->E(i) = 0; \
+ } \
+}
+
+XVPICKVE(xvpickve_w, W, 32, 0x7)
+XVPICKVE(xvpickve_d, D, 64, 0x3)
+
#define VPACKEV(NAME, BIT, E) \
void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 541e2b1728..5fed2d2b91 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -806,3 +806,121 @@ TRANS(xvreplgr2vr_b, LASX, gvec_dup, 32, MO_8)
TRANS(xvreplgr2vr_h, LASX, gvec_dup, 32, MO_16)
TRANS(xvreplgr2vr_w, LASX, gvec_dup, 32, MO_32)
TRANS(xvreplgr2vr_d, LASX, gvec_dup, 32, MO_64)
+
+TRANS(xvreplve_b, LASX, gen_vreplve, 32, MO_8, 8, tcg_gen_ld8u_i64)
+TRANS(xvreplve_h, LASX, gen_vreplve, 32, MO_16, 16, tcg_gen_ld16u_i64)
+TRANS(xvreplve_w, LASX, gen_vreplve, 32, MO_32, 32, tcg_gen_ld32u_i64)
+TRANS(xvreplve_d, LASX, gen_vreplve, 32, MO_64, 64, tcg_gen_ld_i64)
+
+static bool trans_xvrepl128vei_b(DisasContext *ctx, arg_vv_i *a)
+{
+ if (!avail_LASX(ctx)) {
+ return false;
+ }
+
+ CHECK_VEC;
+
+ tcg_gen_gvec_dup_mem(MO_8,
+ offsetof(CPULoongArchState, fpr[a->vd].vreg.B(0)),
+ offsetof(CPULoongArchState,
+ fpr[a->vj].vreg.B((a->imm))),
+ 16, 16);
+ tcg_gen_gvec_dup_mem(MO_8,
+ offsetof(CPULoongArchState, fpr[a->vd].vreg.B(16)),
+ offsetof(CPULoongArchState,
+ fpr[a->vj].vreg.B((a->imm + 16))),
+ 16, 16);
+ return true;
+}
+
+static bool trans_xvrepl128vei_h(DisasContext *ctx, arg_vv_i *a)
+{
+ if (!avail_LASX(ctx)) {
+ return false;
+ }
+
+ CHECK_VEC;
+
+ tcg_gen_gvec_dup_mem(MO_16,
+ offsetof(CPULoongArchState, fpr[a->vd].vreg.H(0)),
+ offsetof(CPULoongArchState,
+ fpr[a->vj].vreg.H((a->imm))),
+ 16, 16);
+ tcg_gen_gvec_dup_mem(MO_16,
+ offsetof(CPULoongArchState, fpr[a->vd].vreg.H(8)),
+ offsetof(CPULoongArchState,
+ fpr[a->vj].vreg.H((a->imm + 8))),
+ 16, 16);
+ return true;
+}
+
+static bool trans_xvrepl128vei_w(DisasContext *ctx, arg_vv_i *a)
+{
+ if (!avail_LASX(ctx)) {
+ return false;
+ }
+
+ CHECK_VEC;
+
+ tcg_gen_gvec_dup_mem(MO_32,
+ offsetof(CPULoongArchState, fpr[a->vd].vreg.W(0)),
+ offsetof(CPULoongArchState,
+ fpr[a->vj].vreg.W((a->imm))),
+ 16, 16);
+ tcg_gen_gvec_dup_mem(MO_32,
+ offsetof(CPULoongArchState, fpr[a->vd].vreg.W(4)),
+ offsetof(CPULoongArchState,
+ fpr[a->vj].vreg.W((a->imm + 4))),
+ 16, 16);
+ return true;
+}
+
+static bool trans_xvrepl128vei_d(DisasContext *ctx, arg_vv_i *a)
+{
+ if (!avail_LASX(ctx)) {
+ return false;
+ }
+
+ CHECK_VEC;
+
+ tcg_gen_gvec_dup_mem(MO_64,
+ offsetof(CPULoongArchState, fpr[a->vd].vreg.D(0)),
+ offsetof(CPULoongArchState,
+ fpr[a->vj].vreg.D((a->imm))),
+ 16, 16);
+ tcg_gen_gvec_dup_mem(MO_64,
+ offsetof(CPULoongArchState, fpr[a->vd].vreg.D(2)),
+ offsetof(CPULoongArchState,
+ fpr[a->vj].vreg.D((a->imm + 2))),
+ 16, 16);
+ return true;
+}
+
+#define XVREPLVE0(NAME, MOP) \
+static bool trans_## NAME(DisasContext *ctx, arg_vv * a) \
+{ \
+ if (!avail_LASX(ctx)) { \
+ return false; \
+ } \
+ \
+ CHECK_VEC; \
+ \
+ tcg_gen_gvec_dup_mem(MOP, vec_full_offset(a->vd), vec_full_offset(a->vj), \
+ 32, 32); \
+ return true; \
+}
+
+XVREPLVE0(xvreplve0_b, MO_8)
+XVREPLVE0(xvreplve0_h, MO_16)
+XVREPLVE0(xvreplve0_w, MO_32)
+XVREPLVE0(xvreplve0_d, MO_64)
+XVREPLVE0(xvreplve0_q, MO_128)
+
+TRANS(xvinsve0_w, LASX, gen_vv_i, 32, gen_helper_xvinsve0_w)
+TRANS(xvinsve0_d, LASX, gen_vv_i, 32, gen_helper_xvinsve0_d)
+
+TRANS(xvpickve_w, LASX, gen_vv_i, 32, gen_helper_xvpickve_w)
+TRANS(xvpickve_d, LASX, gen_vv_i, 32, gen_helper_xvpickve_d)
+
+TRANS(xvbsll_v, LASX, do_vbsll_v, 32)
+TRANS(xvbsrl_v, LASX, do_vbsrl_v, 32)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 2928e878cf..4abb03485a 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -4283,7 +4283,8 @@ static bool trans_vreplvei_d(DisasContext *ctx, arg_vv_i *a)
return true;
}
-static bool gen_vreplve(DisasContext *ctx, arg_vvr *a, int vece, int bit,
+static bool gen_vreplve(DisasContext *ctx, arg_vvr *a,
+ uint32_t oprsz, int vece, int bit,
void (*func)(TCGv_i64, TCGv_ptr, tcg_target_long))
{
TCGv_i64 t0 = tcg_temp_new_i64();
@@ -4296,62 +4297,73 @@ static bool gen_vreplve(DisasContext *ctx, arg_vvr *a, int vece, int bit,
CHECK_VEC;
- tcg_gen_andi_i64(t0, gpr_src(ctx, a->rk, EXT_NONE), (LSX_LEN/bit) -1);
+ tcg_gen_andi_i64(t0, gpr_src(ctx, a->rk, EXT_NONE), (LSX_LEN / bit) - 1);
tcg_gen_shli_i64(t0, t0, vece);
if (HOST_BIG_ENDIAN) {
- tcg_gen_xori_i64(t0, t0, vece << ((LSX_LEN/bit) -1));
+ tcg_gen_xori_i64(t0, t0, vece << ((LSX_LEN / bit) - 1));
}
tcg_gen_trunc_i64_ptr(t1, t0);
tcg_gen_add_ptr(t1, t1, cpu_env);
func(t2, t1, vec_full_offset(a->vj));
- tcg_gen_gvec_dup_i64(vece, vec_full_offset(a->vd), 16, ctx->vl/8, t2);
+ tcg_gen_gvec_dup_i64(vece, vec_full_offset(a->vd), 16, 16, t2);
+ if (oprsz == 32) {
+ func(t2, t1, offsetof(CPULoongArchState, fpr[a->vj].vreg.Q(1)));
+ tcg_gen_gvec_dup_i64(vece,
+ offsetof(CPULoongArchState, fpr[a->vd].vreg.Q(1)),
+ 16, 16, t2);
+ }
return true;
}
-TRANS(vreplve_b, LSX, gen_vreplve, MO_8, 8, tcg_gen_ld8u_i64)
-TRANS(vreplve_h, LSX, gen_vreplve, MO_16, 16, tcg_gen_ld16u_i64)
-TRANS(vreplve_w, LSX, gen_vreplve, MO_32, 32, tcg_gen_ld32u_i64)
-TRANS(vreplve_d, LSX, gen_vreplve, MO_64, 64, tcg_gen_ld_i64)
+TRANS(vreplve_b, LSX, gen_vreplve, 16, MO_8, 8, tcg_gen_ld8u_i64)
+TRANS(vreplve_h, LSX, gen_vreplve, 16, MO_16, 16, tcg_gen_ld16u_i64)
+TRANS(vreplve_w, LSX, gen_vreplve, 16, MO_32, 32, tcg_gen_ld32u_i64)
+TRANS(vreplve_d, LSX, gen_vreplve, 16, MO_64, 64, tcg_gen_ld_i64)
-static bool trans_vbsll_v(DisasContext *ctx, arg_vv_i *a)
+static bool do_vbsll_v(DisasContext *ctx, arg_vv_i *a, uint32_t oprsz)
{
+ int i, max;
int ofs;
- TCGv_i64 desthigh, destlow, high, low;
+ TCGv_i64 desthigh[2], destlow[2], high[2], low[2];
if (!avail_LSX(ctx)) {
return false;
}
CHECK_VEC;
+ max = (oprsz == 16) ? 1 : 2;
+
+ for (i = 0; i < max; i++) {
+ desthigh[i] = tcg_temp_new_i64();
+ destlow[i] = tcg_temp_new_i64();
+ high[i] = tcg_temp_new_i64();
+ low[i] = tcg_temp_new_i64();
+
+ get_vreg64(low[i], a->vj, 2 * i);
+
+ ofs = ((a->imm) & 0xf) * 8;
+ if (ofs < 64) {
+ get_vreg64(high[i], a->vj, 2 * i + 1);
+ tcg_gen_extract2_i64(desthigh[i], low[i], high[i], 64 - ofs);
+ tcg_gen_shli_i64(destlow[i], low[i], ofs);
+ } else {
+ tcg_gen_shli_i64(desthigh[i], low[i], ofs - 64);
+ destlow[i] = tcg_constant_i64(0);
+ }
- desthigh = tcg_temp_new_i64();
- destlow = tcg_temp_new_i64();
- high = tcg_temp_new_i64();
- low = tcg_temp_new_i64();
-
- get_vreg64(low, a->vj, 0);
-
- ofs = ((a->imm) & 0xf) * 8;
- if (ofs < 64) {
- get_vreg64(high, a->vj, 1);
- tcg_gen_extract2_i64(desthigh, low, high, 64 - ofs);
- tcg_gen_shli_i64(destlow, low, ofs);
- } else {
- tcg_gen_shli_i64(desthigh, low, ofs - 64);
- destlow = tcg_constant_i64(0);
+ set_vreg64(desthigh[i], a->vd, 2 * i + 1);
+ set_vreg64(destlow[i], a->vd, 2 * i);
}
- set_vreg64(desthigh, a->vd, 1);
- set_vreg64(destlow, a->vd, 0);
-
return true;
}
-static bool trans_vbsrl_v(DisasContext *ctx, arg_vv_i *a)
+static bool do_vbsrl_v(DisasContext *ctx, arg_vv_i *a, uint32_t oprsz)
{
- TCGv_i64 desthigh, destlow, high, low;
+ int i, max;
+ TCGv_i64 desthigh[2], destlow[2], high[2], low[2];
int ofs;
if (!avail_LSX(ctx)) {
@@ -4360,29 +4372,36 @@ static bool trans_vbsrl_v(DisasContext *ctx, arg_vv_i *a)
CHECK_VEC;
- desthigh = tcg_temp_new_i64();
- destlow = tcg_temp_new_i64();
- high = tcg_temp_new_i64();
- low = tcg_temp_new_i64();
+ max = (oprsz == 16) ? 1 : 2;
- get_vreg64(high, a->vj, 1);
+ for (i = 0; i < max; i++) {
+ desthigh[i] = tcg_temp_new_i64();
+ destlow[i] = tcg_temp_new_i64();
+ high[i] = tcg_temp_new_i64();
+ low[i] = tcg_temp_new_i64();
- ofs = ((a->imm) & 0xf) * 8;
- if (ofs < 64) {
- get_vreg64(low, a->vj, 0);
- tcg_gen_extract2_i64(destlow, low, high, ofs);
- tcg_gen_shri_i64(desthigh, high, ofs);
- } else {
- tcg_gen_shri_i64(destlow, high, ofs - 64);
- desthigh = tcg_constant_i64(0);
- }
+ get_vreg64(high[i], a->vj, 2 * i + 1);
+
+ ofs = ((a->imm) & 0xf) * 8;
+ if (ofs < 64) {
+ get_vreg64(low[i], a->vj, 2 * i);
+ tcg_gen_extract2_i64(destlow[i], low[i], high[i], ofs);
+ tcg_gen_shri_i64(desthigh[i], high[i], ofs);
+ } else {
+ tcg_gen_shri_i64(destlow[i], high[i], ofs - 64);
+ desthigh[i] = tcg_constant_i64(0);
+ }
- set_vreg64(desthigh, a->vd, 1);
- set_vreg64(destlow, a->vd, 0);
+ set_vreg64(desthigh[i], a->vd, 2 * i + 1);
+ set_vreg64(destlow[i], a->vd, 2 * i);
+ }
return true;
}
+TRANS(vbsll_v, LSX, do_vbsll_v, 16)
+TRANS(vbsrl_v, LSX, do_vbsrl_v, 16)
+
TRANS(vpackev_b, LSX, gen_vvv, 16, gen_helper_vpackev_b)
TRANS(vpackev_h, LSX, gen_vvv, 16, gen_helper_vpackev_h)
TRANS(vpackev_w, LSX, gen_vvv, 16, gen_helper_vpackev_w)
--
2.39.1
* [PATCH v4 45/48] target/loongarch: Implement xvpack xvpick xvilv{l/h}
From: Song Gao @ 2023-08-30 8:48 UTC
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVPACK{EV/OD}.{B/H/W/D};
- XVPICK{EV/OD}.{B/H/W/D};
- XVILV{L/H}.{B/H/W/D}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/insns.decode | 27 ++++
target/loongarch/disas.c | 27 ++++
target/loongarch/vec_helper.c | 138 +++++++++++--------
target/loongarch/insn_trans/trans_lasx.c.inc | 27 ++++
4 files changed, 159 insertions(+), 60 deletions(-)
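The structural point of this patch is that the 256-bit forms do not treat the register as one flat array: each 128-bit half behaves like an independent LSX operation, which is why the reworked helpers below loop over oprsz / 16 halves and then over LSX_LEN / BIT elements within each half. A standalone C sketch of XVPICKEV.B, illustrative only (names invented, not QEMU code):

#include <stdint.h>

typedef struct { uint8_t b[32]; } XReg;

/* XVPICKEV.B: gather the even-indexed bytes of xk (low eight) and of
 * xj (high eight), independently within each 128-bit half. */
static void xvpickev_b(XReg *xd, const XReg *xj, const XReg *xk)
{
    XReg t = { { 0 } };
    for (int half = 0; half < 2; half++) {
        for (int j = 0; j < 8; j++) {
            t.b[16 * half + j]     = xk->b[16 * half + 2 * j];
            t.b[16 * half + 8 + j] = xj->b[16 * half + 2 * j];
        }
    }
    *xd = t;  /* buffer first: xd may alias xj or xk */
}

Within one half this is exactly the LSX vpickev_b; the outer loop is the only LASX addition.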
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 74383ba3bc..a325b861c1 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -2012,3 +2012,30 @@ xvpickve_d 0111 01110000 00111 110 .. ..... ..... @vv_ui2
xvbsll_v 0111 01101000 11100 ..... ..... ..... @vv_ui5
xvbsrl_v 0111 01101000 11101 ..... ..... ..... @vv_ui5
+
+xvpackev_b 0111 01010001 01100 ..... ..... ..... @vvv
+xvpackev_h 0111 01010001 01101 ..... ..... ..... @vvv
+xvpackev_w 0111 01010001 01110 ..... ..... ..... @vvv
+xvpackev_d 0111 01010001 01111 ..... ..... ..... @vvv
+xvpackod_b 0111 01010001 10000 ..... ..... ..... @vvv
+xvpackod_h 0111 01010001 10001 ..... ..... ..... @vvv
+xvpackod_w 0111 01010001 10010 ..... ..... ..... @vvv
+xvpackod_d 0111 01010001 10011 ..... ..... ..... @vvv
+
+xvpickev_b 0111 01010001 11100 ..... ..... ..... @vvv
+xvpickev_h 0111 01010001 11101 ..... ..... ..... @vvv
+xvpickev_w 0111 01010001 11110 ..... ..... ..... @vvv
+xvpickev_d 0111 01010001 11111 ..... ..... ..... @vvv
+xvpickod_b 0111 01010010 00000 ..... ..... ..... @vvv
+xvpickod_h 0111 01010010 00001 ..... ..... ..... @vvv
+xvpickod_w 0111 01010010 00010 ..... ..... ..... @vvv
+xvpickod_d 0111 01010010 00011 ..... ..... ..... @vvv
+
+xvilvl_b 0111 01010001 10100 ..... ..... ..... @vvv
+xvilvl_h 0111 01010001 10101 ..... ..... ..... @vvv
+xvilvl_w 0111 01010001 10110 ..... ..... ..... @vvv
+xvilvl_d 0111 01010001 10111 ..... ..... ..... @vvv
+xvilvh_b 0111 01010001 11000 ..... ..... ..... @vvv
+xvilvh_h 0111 01010001 11001 ..... ..... ..... @vvv
+xvilvh_w 0111 01010001 11010 ..... ..... ..... @vvv
+xvilvh_d 0111 01010001 11011 ..... ..... ..... @vvv
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index ac7dd3021d..9b6a07bbb0 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2548,3 +2548,30 @@ INSN_LASX(xvpickve_d, vv_i)
INSN_LASX(xvbsll_v, vv_i)
INSN_LASX(xvbsrl_v, vv_i)
+
+INSN_LASX(xvpackev_b, vvv)
+INSN_LASX(xvpackev_h, vvv)
+INSN_LASX(xvpackev_w, vvv)
+INSN_LASX(xvpackev_d, vvv)
+INSN_LASX(xvpackod_b, vvv)
+INSN_LASX(xvpackod_h, vvv)
+INSN_LASX(xvpackod_w, vvv)
+INSN_LASX(xvpackod_d, vvv)
+
+INSN_LASX(xvpickev_b, vvv)
+INSN_LASX(xvpickev_h, vvv)
+INSN_LASX(xvpickev_w, vvv)
+INSN_LASX(xvpickev_d, vvv)
+INSN_LASX(xvpickod_b, vvv)
+INSN_LASX(xvpickod_h, vvv)
+INSN_LASX(xvpickod_w, vvv)
+INSN_LASX(xvpickod_d, vvv)
+
+INSN_LASX(xvilvl_b, vvv)
+INSN_LASX(xvilvl_h, vvv)
+INSN_LASX(xvilvl_w, vvv)
+INSN_LASX(xvilvl_d, vvv)
+INSN_LASX(xvilvh_b, vvv)
+INSN_LASX(xvilvh_h, vvv)
+INSN_LASX(xvilvh_w, vvv)
+INSN_LASX(xvilvh_d, vvv)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 8da95f20a9..34be19891a 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -3118,12 +3118,13 @@ XVPICKVE(xvpickve_d, D, 64, 0x3)
void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
int i; \
- VReg temp; \
+ VReg temp = {}; \
VReg *Vd = (VReg *)vd; \
VReg *Vj = (VReg *)vj; \
VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
\
- for (i = 0; i < LSX_LEN/BIT; i++) { \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
temp.E(2 * i + 1) = Vj->E(2 * i); \
temp.E(2 *i) = Vk->E(2 * i); \
} \
@@ -3139,12 +3140,13 @@ VPACKEV(vpackev_d, 128, D)
void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
int i; \
- VReg temp; \
+ VReg temp = {}; \
VReg *Vd = (VReg *)vd; \
VReg *Vj = (VReg *)vj; \
VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
\
- for (i = 0; i < LSX_LEN/BIT; i++) { \
+ for (i = 0; i < oprsz / (BIT / 8); i++) { \
temp.E(2 * i + 1) = Vj->E(2 * i + 1); \
temp.E(2 * i) = Vk->E(2 * i + 1); \
} \
@@ -3156,20 +3158,24 @@ VPACKOD(vpackod_h, 32, H)
VPACKOD(vpackod_w, 64, W)
VPACKOD(vpackod_d, 128, D)
-#define VPICKEV(NAME, BIT, E) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- VReg *Vk = (VReg *)vk; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- temp.E(i + LSX_LEN/BIT) = Vj->E(2 * i); \
- temp.E(i) = Vk->E(2 * i); \
- } \
- *Vd = temp; \
+#define VPICKEV(NAME, BIT, E) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg temp = {}; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ temp.E(j + ofs * (2 * i + 1)) = Vj->E(2 * (j + ofs * i)); \
+ temp.E(j + ofs * 2 * i) = Vk->E(2 * (j + ofs * i)); \
+ } \
+ } \
+ *Vd = temp; \
}
VPICKEV(vpickev_b, 16, B)
@@ -3177,20 +3183,24 @@ VPICKEV(vpickev_h, 32, H)
VPICKEV(vpickev_w, 64, W)
VPICKEV(vpickev_d, 128, D)
-#define VPICKOD(NAME, BIT, E) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- VReg *Vk = (VReg *)vk; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- temp.E(i + LSX_LEN/BIT) = Vj->E(2 * i + 1); \
- temp.E(i) = Vk->E(2 * i + 1); \
- } \
- *Vd = temp; \
+#define VPICKOD(NAME, BIT, E) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg temp = {}; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ temp.E(j + ofs * (2 * i + 1)) = Vj->E(2 * (j + ofs * i) + 1); \
+ temp.E(j + ofs * 2 * i) = Vk->E(2 * (j + ofs * i) + 1); \
+ } \
+ } \
+ *Vd = temp; \
}
VPICKOD(vpickod_b, 16, B)
@@ -3198,20 +3208,24 @@ VPICKOD(vpickod_h, 32, H)
VPICKOD(vpickod_w, 64, W)
VPICKOD(vpickod_d, 128, D)
-#define VILVL(NAME, BIT, E) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- VReg *Vk = (VReg *)vk; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- temp.E(2 * i + 1) = Vj->E(i); \
- temp.E(2 * i) = Vk->E(i); \
- } \
- *Vd = temp; \
+#define VILVL(NAME, BIT, E) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg temp = {}; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ temp.E(2 * (j + ofs * i) + 1) = Vj->E(j + ofs * 2 * i); \
+ temp.E(2 * (j + ofs * i)) = Vk->E(j + ofs * 2 * i); \
+ } \
+ } \
+ *Vd = temp; \
}
VILVL(vilvl_b, 16, B)
@@ -3219,20 +3233,24 @@ VILVL(vilvl_h, 32, H)
VILVL(vilvl_w, 64, W)
VILVL(vilvl_d, 128, D)
-#define VILVH(NAME, BIT, E) \
-void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
-{ \
- int i; \
- VReg temp; \
- VReg *Vd = (VReg *)vd; \
- VReg *Vj = (VReg *)vj; \
- VReg *Vk = (VReg *)vk; \
- \
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- temp.E(2 * i + 1) = Vj->E(i + LSX_LEN/BIT); \
- temp.E(2 * i) = Vk->E(i + LSX_LEN/BIT); \
- } \
- *Vd = temp; \
+#define VILVH(NAME, BIT, E) \
+void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
+{ \
+ int i, j, ofs; \
+ VReg temp = {}; \
+ VReg *Vd = (VReg *)vd; \
+ VReg *Vj = (VReg *)vj; \
+ VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
+ \
+ ofs = LSX_LEN / BIT; \
+ for (i = 0; i < oprsz / 16; i++) { \
+ for (j = 0; j < ofs; j++) { \
+ temp.E(2 * (j + ofs * i) + 1) = Vj->E(j + ofs * (2 * i + 1)); \
+ temp.E(2 * (j + ofs * i)) = Vk->E(j + ofs * (2 * i + 1)); \
+ } \
+ } \
+ *Vd = temp; \
}
VILVH(vilvh_b, 16, B)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index 5fed2d2b91..aa374f3a00 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -924,3 +924,30 @@ TRANS(xvpickve_d, LASX, gen_vv_i, 32, gen_helper_xvpickve_d)
TRANS(xvbsll_v, LASX, do_vbsll_v, 32)
TRANS(xvbsrl_v, LASX, do_vbsrl_v, 32)
+
+TRANS(xvpackev_b, LASX, gen_vvv, 32, gen_helper_vpackev_b)
+TRANS(xvpackev_h, LASX, gen_vvv, 32, gen_helper_vpackev_h)
+TRANS(xvpackev_w, LASX, gen_vvv, 32, gen_helper_vpackev_w)
+TRANS(xvpackev_d, LASX, gen_vvv, 32, gen_helper_vpackev_d)
+TRANS(xvpackod_b, LASX, gen_vvv, 32, gen_helper_vpackod_b)
+TRANS(xvpackod_h, LASX, gen_vvv, 32, gen_helper_vpackod_h)
+TRANS(xvpackod_w, LASX, gen_vvv, 32, gen_helper_vpackod_w)
+TRANS(xvpackod_d, LASX, gen_vvv, 32, gen_helper_vpackod_d)
+
+TRANS(xvpickev_b, LASX, gen_vvv, 32, gen_helper_vpickev_b)
+TRANS(xvpickev_h, LASX, gen_vvv, 32, gen_helper_vpickev_h)
+TRANS(xvpickev_w, LASX, gen_vvv, 32, gen_helper_vpickev_w)
+TRANS(xvpickev_d, LASX, gen_vvv, 32, gen_helper_vpickev_d)
+TRANS(xvpickod_b, LASX, gen_vvv, 32, gen_helper_vpickod_b)
+TRANS(xvpickod_h, LASX, gen_vvv, 32, gen_helper_vpickod_h)
+TRANS(xvpickod_w, LASX, gen_vvv, 32, gen_helper_vpickod_w)
+TRANS(xvpickod_d, LASX, gen_vvv, 32, gen_helper_vpickod_d)
+
+TRANS(xvilvl_b, LASX, gen_vvv, 32, gen_helper_vilvl_b)
+TRANS(xvilvl_h, LASX, gen_vvv, 32, gen_helper_vilvl_h)
+TRANS(xvilvl_w, LASX, gen_vvv, 32, gen_helper_vilvl_w)
+TRANS(xvilvl_d, LASX, gen_vvv, 32, gen_helper_vilvl_d)
+TRANS(xvilvh_b, LASX, gen_vvv, 32, gen_helper_vilvh_b)
+TRANS(xvilvh_h, LASX, gen_vvv, 32, gen_helper_vilvh_h)
+TRANS(xvilvh_w, LASX, gen_vvv, 32, gen_helper_vilvh_w)
+TRANS(xvilvh_d, LASX, gen_vvv, 32, gen_helper_vilvh_d)
--
2.39.1
* [PATCH v4 46/48] target/loongarch: Implement xvshuf xvperm{i} xvshuf4i xvextrins
From: Song Gao @ 2023-08-30 8:49 UTC
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVSHUF.{B/H/W/D};
- XVPERM.W;
- XVSHUF4i.{B/H/W/D};
- XVPERMI.{W/D/Q};
- XVEXTRINS.{B/H/W/D}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/helper.h | 3 +
target/loongarch/vec.h | 2 +
target/loongarch/insns.decode | 21 ++++
target/loongarch/disas.c | 21 ++++
target/loongarch/vec_helper.c | 112 +++++++++++++++----
target/loongarch/insn_trans/trans_lasx.c.inc | 21 ++++
6 files changed, 161 insertions(+), 19 deletions(-)
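Most of these follow the per-half pattern of the previous patch; the odd one out is XVPERMI.Q, which shuffles whole 128-bit halves between the destination and the source. A standalone C sketch of its selection rule, illustrative only (names invented, not QEMU code), matching the vpermi_q helper below: selector values 0 and 1 pick a half of xj, 2 and 3 pick a half of the old xd.

#include <stdint.h>
#include <string.h>

typedef struct { uint64_t q[2][2]; } XReg;  /* q[half][64-bit word] */

static void pick_half(uint64_t dst[2], const XReg *xd, const XReg *xj,
                      unsigned sel)
{
    /* sel 0/1 pick a half of xj, sel 2/3 pick a half of xd. */
    const uint64_t *src = (sel & 2) ? xd->q[sel & 1] : xj->q[sel & 1];
    memcpy(dst, src, 2 * sizeof(uint64_t));
}

/* XVPERMI.Q: imm[1:0] selects the new low half, imm[5:4] the new high
 * half, from {xj.lo, xj.hi, xd.lo, xd.hi}. */
static void xvpermi_q(XReg *xd, const XReg *xj, unsigned imm)
{
    XReg t;
    pick_half(t.q[0], xd, xj, imm & 0x3);
    pick_half(t.q[1], xd, xj, (imm >> 4) & 0x3);
    *xd = t;  /* buffer first: both halves of the old xd may be read */
}

For example, imm 0x31 yields { xj.hi, xd.hi }, a common cross-half extract.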
diff --git a/target/loongarch/helper.h b/target/loongarch/helper.h
index fb489dda2d..b3b64a0215 100644
--- a/target/loongarch/helper.h
+++ b/target/loongarch/helper.h
@@ -709,7 +709,10 @@ DEF_HELPER_FLAGS_4(vshuf4i_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_4(vshuf4i_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_4(vshuf4i_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vperm_w, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
DEF_HELPER_FLAGS_4(vpermi_w, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vpermi_d, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
+DEF_HELPER_FLAGS_4(vpermi_q, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_4(vextrins_b, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
DEF_HELPER_FLAGS_4(vextrins_h, TCG_CALL_NO_RWG, void, ptr, ptr, i64, i32)
diff --git a/target/loongarch/vec.h b/target/loongarch/vec.h
index bc74effb7c..61e5b69c1e 100644
--- a/target/loongarch/vec.h
+++ b/target/loongarch/vec.h
@@ -93,4 +93,6 @@
#define VSLE(a, b) (a <= b ? -1 : 0)
#define VSLT(a, b) (a < b ? -1 : 0)
+#define SHF_POS(i, imm) (((i) & 0xfc) + (((imm) >> (2 * ((i) & 0x03))) & 0x03))
+
#endif /* LOONGARCH_VEC_H */
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index a325b861c1..64b67ee9ac 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -2039,3 +2039,24 @@ xvilvh_b 0111 01010001 11000 ..... ..... ..... @vvv
xvilvh_h 0111 01010001 11001 ..... ..... ..... @vvv
xvilvh_w 0111 01010001 11010 ..... ..... ..... @vvv
xvilvh_d 0111 01010001 11011 ..... ..... ..... @vvv
+
+xvshuf_b 0000 11010110 ..... ..... ..... ..... @vvvv
+xvshuf_h 0111 01010111 10101 ..... ..... ..... @vvv
+xvshuf_w 0111 01010111 10110 ..... ..... ..... @vvv
+xvshuf_d 0111 01010111 10111 ..... ..... ..... @vvv
+
+xvperm_w 0111 01010111 11010 ..... ..... ..... @vvv
+
+xvshuf4i_b 0111 01111001 00 ........ ..... ..... @vv_ui8
+xvshuf4i_h 0111 01111001 01 ........ ..... ..... @vv_ui8
+xvshuf4i_w 0111 01111001 10 ........ ..... ..... @vv_ui8
+xvshuf4i_d 0111 01111001 11 ........ ..... ..... @vv_ui8
+
+xvpermi_w 0111 01111110 01 ........ ..... ..... @vv_ui8
+xvpermi_d 0111 01111110 10 ........ ..... ..... @vv_ui8
+xvpermi_q 0111 01111110 11 ........ ..... ..... @vv_ui8
+
+xvextrins_d 0111 01111000 00 ........ ..... ..... @vv_ui8
+xvextrins_w 0111 01111000 01 ........ ..... ..... @vv_ui8
+xvextrins_h 0111 01111000 10 ........ ..... ..... @vv_ui8
+xvextrins_b 0111 01111000 11 ........ ..... ..... @vv_ui8
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index 9b6a07bbb0..a518c59772 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -2575,3 +2575,24 @@ INSN_LASX(xvilvh_b, vvv)
INSN_LASX(xvilvh_h, vvv)
INSN_LASX(xvilvh_w, vvv)
INSN_LASX(xvilvh_d, vvv)
+
+INSN_LASX(xvshuf_b, vvvv)
+INSN_LASX(xvshuf_h, vvv)
+INSN_LASX(xvshuf_w, vvv)
+INSN_LASX(xvshuf_d, vvv)
+
+INSN_LASX(xvperm_w, vvv)
+
+INSN_LASX(xvshuf4i_b, vv_i)
+INSN_LASX(xvshuf4i_h, vv_i)
+INSN_LASX(xvshuf4i_w, vv_i)
+INSN_LASX(xvshuf4i_d, vv_i)
+
+INSN_LASX(xvpermi_w, vv_i)
+INSN_LASX(xvpermi_d, vv_i)
+INSN_LASX(xvpermi_q, vv_i)
+
+INSN_LASX(xvextrins_d, vv_i)
+INSN_LASX(xvextrins_w, vv_i)
+INSN_LASX(xvextrins_h, vv_i)
+INSN_LASX(xvextrins_b, vv_i)
diff --git a/target/loongarch/vec_helper.c b/target/loongarch/vec_helper.c
index 34be19891a..97058ac2b3 100644
--- a/target/loongarch/vec_helper.c
+++ b/target/loongarch/vec_helper.c
@@ -3261,17 +3261,24 @@ VILVH(vilvh_d, 128, D)
void HELPER(vshuf_b)(void *vd, void *vj, void *vk, void *va, uint32_t desc)
{
int i, m;
- VReg temp;
+ VReg temp = {};
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
VReg *Vk = (VReg *)vk;
VReg *Va = (VReg *)va;
+ int oprsz = simd_oprsz(desc);
- m = LSX_LEN/8;
- for (i = 0; i < m ; i++) {
+ m = LSX_LEN / 8;
+ for (i = 0; i < m; i++) {
uint64_t k = (uint8_t)Va->B(i) % (2 * m);
temp.B(i) = k < m ? Vk->B(k) : Vj->B(k - m);
}
+ if (oprsz == 32) {
+ for (i = m; i < 2 * m; i++) {
+ uint64_t j = (uint8_t)Va->B(i) % (2 * m);
+ temp.B(i) = j < m ? Vk->B(j + m) : Vj->B(j);
+ }
+ }
*Vd = temp;
}
@@ -3279,16 +3286,23 @@ void HELPER(vshuf_b)(void *vd, void *vj, void *vk, void *va, uint32_t desc)
void HELPER(NAME)(void *vd, void *vj, void *vk, uint32_t desc) \
{ \
int i, m; \
- VReg temp; \
+ VReg temp = {}; \
VReg *Vd = (VReg *)vd; \
VReg *Vj = (VReg *)vj; \
VReg *Vk = (VReg *)vk; \
+ int oprsz = simd_oprsz(desc); \
\
- m = LSX_LEN/BIT; \
+ m = LSX_LEN / BIT; \
for (i = 0; i < m; i++) { \
- uint64_t k = ((uint8_t) Vd->E(i)) % (2 * m); \
+ uint64_t k = (uint8_t)Vd->E(i) % (2 * m); \
temp.E(i) = k < m ? Vk->E(k) : Vj->E(k - m); \
} \
+ if (oprsz == 32) { \
+ for (i = m; i < 2 * m; i++) { \
+ uint64_t j = (uint8_t)Vd->E(i) % (2 * m); \
+ temp.E(i) = j < m ? Vk->E(j + m) : Vj->E(j); \
+ } \
+ } \
*Vd = temp; \
}
@@ -3299,14 +3313,20 @@ VSHUF(vshuf_d, 64, D)
#define VSHUF4I(NAME, BIT, E) \
void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
{ \
- int i; \
- VReg temp; \
+ int i, max; \
+ VReg temp = {}; \
VReg *Vd = (VReg *)vd; \
VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
\
- for (i = 0; i < LSX_LEN/BIT; i++) { \
- temp.E(i) = Vj->E(((i) & 0xfc) + (((imm) >> \
- (2 * ((i) & 0x03))) & 0x03)); \
+ max = LSX_LEN / BIT; \
+ for (i = 0; i < max; i++) { \
+ temp.E(i) = Vj->E(SHF_POS(i, imm)); \
+ } \
+ if (oprsz == 32) { \
+ for (i = max; i < 2 * max; i++) { \
+ temp.E(i) = Vj->E(SHF_POS(i - max, imm) + max); \
+ } \
} \
*Vd = temp; \
}
@@ -3317,38 +3337,92 @@ VSHUF4I(vshuf4i_w, 32, W)
void HELPER(vshuf4i_d)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
+ int i;
+ VReg temp = {};
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
+ int oprsz = simd_oprsz(desc);
- VReg temp;
- temp.D(0) = (imm & 2 ? Vj : Vd)->D(imm & 1);
- temp.D(1) = (imm & 8 ? Vj : Vd)->D((imm >> 2) & 1);
+ for (i = 0; i < oprsz / 16; i++) {
+ temp.D(2 * i) = (imm & 2 ? Vj : Vd)->D((imm & 1) + 2 * i);
+ temp.D(2 * i + 1) = (imm & 8 ? Vj : Vd)->D(((imm >> 2) & 1) + 2 * i);
+ }
+ *Vd = temp;
+}
+
+void HELPER(vperm_w)(void *vd, void *vj, void *vk, uint32_t desc)
+{
+ int i, m;
+ VReg temp = {};
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
+ VReg *Vk = (VReg *)vk;
+
+ m = LASX_LEN / 32;
+ for (i = 0; i < m; i++) {
+ uint64_t k = (uint8_t)Vk->W(i) % 8;
+ temp.W(i) = Vj->W(k);
+ }
*Vd = temp;
}
void HELPER(vpermi_w)(void *vd, void *vj, uint64_t imm, uint32_t desc)
+{
+ int i;
+ VReg temp = {};
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
+ int oprsz = simd_oprsz(desc);
+
+ for (i = 0; i < oprsz / 16; i++) {
+ temp.W(4 * i) = Vj->W((imm & 0x3) + 4 * i);
+ temp.W(4 * i + 1) = Vj->W(((imm >> 2) & 0x3) + 4 * i);
+ temp.W(4 * i + 2) = Vd->W(((imm >> 4) & 0x3) + 4 * i);
+ temp.W(4 * i + 3) = Vd->W(((imm >> 6) & 0x3) + 4 * i);
+ }
+ *Vd = temp;
+}
+
+void HELPER(vpermi_d)(void *vd, void *vj, uint64_t imm, uint32_t desc)
+{
+ VReg temp = {};
+ VReg *Vd = (VReg *)vd;
+ VReg *Vj = (VReg *)vj;
+
+ temp.D(0) = Vj->D(imm & 0x3);
+ temp.D(1) = Vj->D((imm >> 2) & 0x3);
+ temp.D(2) = Vj->D((imm >> 4) & 0x3);
+ temp.D(3) = Vj->D((imm >> 6) & 0x3);
+ *Vd = temp;
+}
+
+void HELPER(vpermi_q)(void *vd, void *vj, uint64_t imm, uint32_t desc)
{
VReg temp;
VReg *Vd = (VReg *)vd;
VReg *Vj = (VReg *)vj;
- temp.W(0) = Vj->W(imm & 0x3);
- temp.W(1) = Vj->W((imm >> 2) & 0x3);
- temp.W(2) = Vd->W((imm >> 4) & 0x3);
- temp.W(3) = Vd->W((imm >> 6) & 0x3);
+ temp.Q(0) = (imm & 0x3) > 1 ? Vd->Q((imm & 0x3) - 2) : Vj->Q(imm & 0x3);
+ temp.Q(1) = ((imm >> 4) & 0x3) > 1 ? Vd->Q(((imm >> 4) & 0x3) - 2) :
+ Vj->Q((imm >> 4) & 0x3);
*Vd = temp;
}
#define VEXTRINS(NAME, BIT, E, MASK) \
void HELPER(NAME)(void *vd, void *vj, uint64_t imm, uint32_t desc) \
{ \
- int ins, extr; \
+ int ins, extr, max; \
VReg *Vd = (VReg *)vd; \
VReg *Vj = (VReg *)vj; \
+ int oprsz = simd_oprsz(desc); \
\
+ max = LSX_LEN / BIT; \
ins = (imm >> 4) & MASK; \
extr = imm & MASK; \
Vd->E(ins) = Vj->E(extr); \
+ if (oprsz == 32) { \
+ Vd->E(ins + max) = Vj->E(extr + max); \
+ } \
}
VEXTRINS(vextrins_b, 8, B, 0xf)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index aa374f3a00..ebbbc5a6bb 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -951,3 +951,24 @@ TRANS(xvilvh_b, LASX, gen_vvv, 32, gen_helper_vilvh_b)
TRANS(xvilvh_h, LASX, gen_vvv, 32, gen_helper_vilvh_h)
TRANS(xvilvh_w, LASX, gen_vvv, 32, gen_helper_vilvh_w)
TRANS(xvilvh_d, LASX, gen_vvv, 32, gen_helper_vilvh_d)
+
+TRANS(xvshuf_b, LASX, gen_vvvv, 32, gen_helper_vshuf_b)
+TRANS(xvshuf_h, LASX, gen_vvv, 32, gen_helper_vshuf_h)
+TRANS(xvshuf_w, LASX, gen_vvv, 32, gen_helper_vshuf_w)
+TRANS(xvshuf_d, LASX, gen_vvv, 32, gen_helper_vshuf_d)
+
+TRANS(xvperm_w, LASX, gen_vvv, 32, gen_helper_vperm_w)
+
+TRANS(xvshuf4i_b, LASX, gen_vv_i, 32, gen_helper_vshuf4i_b)
+TRANS(xvshuf4i_h, LASX, gen_vv_i, 32, gen_helper_vshuf4i_h)
+TRANS(xvshuf4i_w, LASX, gen_vv_i, 32, gen_helper_vshuf4i_w)
+TRANS(xvshuf4i_d, LASX, gen_vv_i, 32, gen_helper_vshuf4i_d)
+
+TRANS(xvpermi_w, LASX, gen_vv_i, 32, gen_helper_vpermi_w)
+TRANS(xvpermi_d, LASX, gen_vv_i, 32, gen_helper_vpermi_d)
+TRANS(xvpermi_q, LASX, gen_vv_i, 32, gen_helper_vpermi_q)
+
+TRANS(xvextrins_b, LASX, gen_vv_i, 32, gen_helper_vextrins_b)
+TRANS(xvextrins_h, LASX, gen_vv_i, 32, gen_helper_vextrins_h)
+TRANS(xvextrins_w, LASX, gen_vv_i, 32, gen_helper_vextrins_w)
+TRANS(xvextrins_d, LASX, gen_vv_i, 32, gen_helper_vextrins_d)
--
2.39.1
* [PATCH v4 47/48] target/loongarch: Implement xvld xvst
From: Song Gao @ 2023-08-30 8:49 UTC
To: qemu-devel; +Cc: richard.henderson
This patch includes:
- XVLD[X], XVST[X];
- XVLDREPL.{B/H/W/D};
- XVSTELM.{B/H/W/D}.
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/insns.decode | 18 +++++
target/loongarch/disas.c | 24 ++++++
target/loongarch/insn_trans/trans_lasx.c.inc | 80 ++++++++++++++++++++
target/loongarch/insn_trans/trans_lsx.c.inc | 54 ++++++-------
4 files changed, 149 insertions(+), 27 deletions(-)
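The translation below splits each 256-bit access into four 8-byte target-endian loads or stores at offsets 0, 8, 16 and 24. A host-side C model of XVLD, illustrative only (names invented; it assumes a little-endian host so the memcpy matches LoongArch's little-endian memory order):

#include <stdint.h>
#include <string.h>

typedef struct { uint64_t d[4]; } XReg;

/* XVLD xd, rj, si12: one (possibly unaligned) 256-bit load, modelled
 * here as four 64-bit chunks, mirroring gen_xvld below. */
static void xvld_model(XReg *xd, const uint8_t *mem,
                       uint64_t rj, int64_t si12)
{
    for (int i = 0; i < 4; i++) {
        memcpy(&xd->d[i], mem + rj + si12 + 8 * i, sizeof(uint64_t));
    }
}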
diff --git a/target/loongarch/insns.decode b/target/loongarch/insns.decode
index 64b67ee9ac..64b308f9fb 100644
--- a/target/loongarch/insns.decode
+++ b/target/loongarch/insns.decode
@@ -550,6 +550,10 @@ dbcl 0000 00000010 10101 ............... @i15
@vr_i8i2 .... ........ imm2:2 ........ rj:5 vd:5 &vr_ii imm=%i8s2
@vr_i8i3 .... ....... imm2:3 ........ rj:5 vd:5 &vr_ii imm=%i8s1
@vr_i8i4 .... ...... imm2:4 imm:s8 rj:5 vd:5 &vr_ii
+@vr_i8i2x .... ........ imm2:2 ........ rj:5 vd:5 &vr_ii imm=%i8s3
+@vr_i8i3x .... ....... imm2:3 ........ rj:5 vd:5 &vr_ii imm=%i8s2
+@vr_i8i4x .... ...... imm2:4 ........ rj:5 vd:5 &vr_ii imm=%i8s1
+@vr_i8i5x .... ..... imm2:5 imm:s8 rj:5 vd:5 &vr_ii
@vrr .... ........ ..... rk:5 rj:5 vd:5 &vrr
@v_i13 .... ........ .. imm:13 vd:5 &v_i
@@ -2060,3 +2064,17 @@ xvextrins_d 0111 01111000 00 ........ ..... ..... @vv_ui8
xvextrins_w 0111 01111000 01 ........ ..... ..... @vv_ui8
xvextrins_h 0111 01111000 10 ........ ..... ..... @vv_ui8
xvextrins_b 0111 01111000 11 ........ ..... ..... @vv_ui8
+
+xvld 0010 110010 ............ ..... ..... @vr_i12
+xvst 0010 110011 ............ ..... ..... @vr_i12
+xvldx 0011 10000100 10000 ..... ..... ..... @vrr
+xvstx 0011 10000100 11000 ..... ..... ..... @vrr
+
+xvldrepl_d 0011 00100001 0 ......... ..... ..... @vr_i9
+xvldrepl_w 0011 00100010 .......... ..... ..... @vr_i10
+xvldrepl_h 0011 0010010 ........... ..... ..... @vr_i11
+xvldrepl_b 0011 001010 ............ ..... ..... @vr_i12
+xvstelm_d 0011 00110001 .. ........ ..... ..... @vr_i8i2x
+xvstelm_w 0011 0011001 ... ........ ..... ..... @vr_i8i3x
+xvstelm_h 0011 001101 .... ........ ..... ..... @vr_i8i4x
+xvstelm_b 0011 00111 ..... ........ ..... ..... @vr_i8i5x
diff --git a/target/loongarch/disas.c b/target/loongarch/disas.c
index a518c59772..e5fb362d7f 100644
--- a/target/loongarch/disas.c
+++ b/target/loongarch/disas.c
@@ -1753,6 +1753,16 @@ static void output_vvr_x(DisasContext *ctx, arg_vvr *a, const char *mnemonic)
output(ctx, mnemonic, "x%d, x%d, r%d", a->vd, a->vj, a->rk);
}
+static void output_vrr_x(DisasContext *ctx, arg_vrr *a, const char *mnemonic)
+{
+ output(ctx, mnemonic, "x%d, r%d, r%d", a->vd, a->rj, a->rk);
+}
+
+static void output_vr_ii_x(DisasContext *ctx, arg_vr_ii *a, const char *mnemonic)
+{
+ output(ctx, mnemonic, "x%d, r%d, 0x%x, 0x%x", a->vd, a->rj, a->imm, a->imm2);
+}
+
INSN_LASX(xvadd_b, vvv)
INSN_LASX(xvadd_h, vvv)
INSN_LASX(xvadd_w, vvv)
@@ -2596,3 +2606,17 @@ INSN_LASX(xvextrins_d, vv_i)
INSN_LASX(xvextrins_w, vv_i)
INSN_LASX(xvextrins_h, vv_i)
INSN_LASX(xvextrins_b, vv_i)
+
+INSN_LASX(xvld, vr_i)
+INSN_LASX(xvst, vr_i)
+INSN_LASX(xvldx, vrr)
+INSN_LASX(xvstx, vrr)
+
+INSN_LASX(xvldrepl_d, vr_i)
+INSN_LASX(xvldrepl_w, vr_i)
+INSN_LASX(xvldrepl_h, vr_i)
+INSN_LASX(xvldrepl_b, vr_i)
+INSN_LASX(xvstelm_d, vr_ii)
+INSN_LASX(xvstelm_w, vr_ii)
+INSN_LASX(xvstelm_h, vr_ii)
+INSN_LASX(xvstelm_b, vr_ii)
diff --git a/target/loongarch/insn_trans/trans_lasx.c.inc b/target/loongarch/insn_trans/trans_lasx.c.inc
index ebbbc5a6bb..b44e9e6d77 100644
--- a/target/loongarch/insn_trans/trans_lasx.c.inc
+++ b/target/loongarch/insn_trans/trans_lasx.c.inc
@@ -972,3 +972,83 @@ TRANS(xvextrins_b, LASX, gen_vv_i, 32, gen_helper_vextrins_b)
TRANS(xvextrins_h, LASX, gen_vv_i, 32, gen_helper_vextrins_h)
TRANS(xvextrins_w, LASX, gen_vv_i, 32, gen_helper_vextrins_w)
TRANS(xvextrins_d, LASX, gen_vv_i, 32, gen_helper_vextrins_d)
+
+static bool gen_lasx_memory(DisasContext *ctx, arg_vr_i *a,
+ void (*func)(DisasContext *, int, TCGv))
+{
+ TCGv addr = gpr_src(ctx, a->rj, EXT_NONE);
+ TCGv temp = NULL;
+
+ CHECK_VEC;
+
+ if (a->imm) {
+ temp = tcg_temp_new();
+ tcg_gen_addi_tl(temp, addr, a->imm);
+ addr = temp;
+ }
+
+ func(ctx, a->vd, addr);
+ return true;
+}
+
+static void gen_xvld(DisasContext *ctx, int vreg, TCGv addr)
+{
+ int i;
+ TCGv temp = tcg_temp_new();
+ TCGv dest = tcg_temp_new();
+
+ tcg_gen_qemu_ld_i64(dest, addr, ctx->mem_idx, MO_TEUQ);
+ set_vreg64(dest, vreg, 0);
+
+ for (i = 1; i < 4; i++) {
+ tcg_gen_addi_tl(temp, addr, 8 * i);
+ tcg_gen_qemu_ld_i64(dest, temp, ctx->mem_idx, MO_TEUQ);
+ set_vreg64(dest, vreg, i);
+ }
+}
+
+static void gen_xvst(DisasContext *ctx, int vreg, TCGv addr)
+{
+ int i;
+ TCGv temp = tcg_temp_new();
+ TCGv dest = tcg_temp_new();
+
+ get_vreg64(dest, vreg, 0);
+ tcg_gen_qemu_st_i64(dest, addr, ctx->mem_idx, MO_TEUQ);
+
+ for (i = 1; i < 4; i++) {
+ tcg_gen_addi_tl(temp, addr, 8 * i);
+ get_vreg64(dest, vreg, i);
+ tcg_gen_qemu_st_i64(dest, temp, ctx->mem_idx, MO_TEUQ);
+ }
+}
+
+TRANS(xvld, LASX, gen_lasx_memory, gen_xvld)
+TRANS(xvst, LASX, gen_lasx_memory, gen_xvst)
+
+static bool gen_lasx_memoryx(DisasContext *ctx, arg_vrr *a,
+ void (*func)(DisasContext *, int, TCGv))
+{
+ TCGv src1 = gpr_src(ctx, a->rj, EXT_NONE);
+ TCGv src2 = gpr_src(ctx, a->rk, EXT_NONE);
+ TCGv addr = tcg_temp_new();
+
+ CHECK_VEC;
+
+ tcg_gen_add_tl(addr, src1, src2);
+ func(ctx, a->vd, addr);
+
+ return true;
+}
+
+TRANS(xvldx, LASX, gen_lasx_memoryx, gen_xvld)
+TRANS(xvstx, LASX, gen_lasx_memoryx, gen_xvst)
+
+TRANS(xvldrepl_b, LASX, do_vldrepl, 32, MO_8)
+TRANS(xvldrepl_h, LASX, do_vldrepl, 32, MO_16)
+TRANS(xvldrepl_w, LASX, do_vldrepl, 32, MO_32)
+TRANS(xvldrepl_d, LASX, do_vldrepl, 32, MO_64)
+VSTELM(xvstelm_b, MO_8, B)
+VSTELM(xvstelm_h, MO_16, H)
+VSTELM(xvstelm_w, MO_32, W)
+VSTELM(xvstelm_d, MO_64, D)
diff --git a/target/loongarch/insn_trans/trans_lsx.c.inc b/target/loongarch/insn_trans/trans_lsx.c.inc
index 4abb03485a..86f333981c 100644
--- a/target/loongarch/insn_trans/trans_lsx.c.inc
+++ b/target/loongarch/insn_trans/trans_lsx.c.inc
@@ -4553,33 +4553,33 @@ static bool trans_vstx(DisasContext *ctx, arg_vrr *a)
return true;
}
-#define VLDREPL(NAME, MO) \
-static bool trans_## NAME (DisasContext *ctx, arg_vr_i *a) \
-{ \
- TCGv addr; \
- TCGv_i64 val; \
- \
- if (!avail_LSX(ctx)) { \
- return false; \
- } \
- \
- CHECK_VEC; \
- \
- addr = gpr_src(ctx, a->rj, EXT_NONE); \
- val = tcg_temp_new_i64(); \
- \
- addr = make_address_i(ctx, addr, a->imm); \
- \
- tcg_gen_qemu_ld_i64(val, addr, ctx->mem_idx, MO); \
- tcg_gen_gvec_dup_i64(MO, vec_full_offset(a->vd), 16, ctx->vl/8, val); \
- \
- return true; \
-}
-
-VLDREPL(vldrepl_b, MO_8)
-VLDREPL(vldrepl_h, MO_16)
-VLDREPL(vldrepl_w, MO_32)
-VLDREPL(vldrepl_d, MO_64)
+static bool do_vldrepl(DisasContext *ctx, arg_vr_i *a,
+ uint32_t oprsz, MemOp mop)
+{
+ TCGv addr, temp;
+ TCGv_i64 val;
+
+ CHECK_VEC;
+
+ addr = gpr_src(ctx, a->rj, EXT_NONE);
+ val = tcg_temp_new_i64();
+
+ if (a->imm) {
+ temp = tcg_temp_new();
+ tcg_gen_addi_tl(temp, addr, a->imm);
+ addr = temp;
+ }
+
+ tcg_gen_qemu_ld_i64(val, addr, ctx->mem_idx, mop);
+ tcg_gen_gvec_dup_i64(mop, vec_full_offset(a->vd), oprsz, ctx->vl / 8, val);
+
+ return true;
+}
+
+TRANS(vldrepl_b, LSX, do_vldrepl, 16, MO_8)
+TRANS(vldrepl_h, LSX, do_vldrepl, 16, MO_16)
+TRANS(vldrepl_w, LSX, do_vldrepl, 16, MO_32)
+TRANS(vldrepl_d, LSX, do_vldrepl, 16, MO_64)
#define VSTELM(NAME, MO, E) \
static bool trans_## NAME (DisasContext *ctx, arg_vr_ii *a) \
--
2.39.1
* [PATCH v4 48/48] target/loongarch: CPUCFG support LASX
From: Song Gao @ 2023-08-30 8:49 UTC
To: qemu-devel; +Cc: richard.henderson
Signed-off-by: Song Gao <gaosong@loongson.cn>
---
target/loongarch/cpu.c | 1 +
1 file changed, 1 insertion(+)
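With the feature bit advertised, a guest can discover LASX the same way it discovers LSX. A minimal probe, offered as a sketch (compile for loongarch64; the bit positions, LSX at bit 6 and LASX at bit 7 of configuration word 2, follow my reading of the LoongArch ISA manual and the CPUCFG2 fields set below):

#include <stdint.h>
#include <stdio.h>

/* Read one CPUCFG configuration word via the cpucfg instruction. */
static inline uint32_t cpucfg(uint32_t word)
{
    uint32_t val;
    __asm__("cpucfg %0, %1" : "=r"(val) : "r"(word));
    return val;
}

int main(void)
{
    uint32_t cfg2 = cpucfg(2);
    printf("LSX:  %u\n", (cfg2 >> 6) & 1);
    printf("LASX: %u\n", (cfg2 >> 7) & 1);
    return 0;
}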
diff --git a/target/loongarch/cpu.c b/target/loongarch/cpu.c
index 4deae22104..e03f71222a 100644
--- a/target/loongarch/cpu.c
+++ b/target/loongarch/cpu.c
@@ -392,6 +392,7 @@ static void loongarch_la464_initfn(Object *obj)
data = FIELD_DP32(data, CPUCFG2, FP_DP, 1);
data = FIELD_DP32(data, CPUCFG2, FP_VER, 1);
data = FIELD_DP32(data, CPUCFG2, LSX, 1),
+ data = FIELD_DP32(data, CPUCFG2, LASX, 1);
data = FIELD_DP32(data, CPUCFG2, LLFTP, 1);
data = FIELD_DP32(data, CPUCFG2, LLFTP_VER, 1);
data = FIELD_DP32(data, CPUCFG2, LSPW, 1);
--
2.39.1