* [PATCH v2 01/14] tcg/op-gvec: Fix iteration step in 32-bit operation
2024-08-30 6:15 [PATCH v2 00/14] tcg/riscv: Add support for vector LIU Zhiwei
@ 2024-08-30 6:15 ` LIU Zhiwei
2024-08-31 23:59 ` Richard Henderson
2024-08-30 6:15 ` [PATCH v2 02/14] util: Add RISC-V vector extension probe in cpuinfo LIU Zhiwei
` (12 subsequent siblings)
13 siblings, 1 reply; 33+ messages in thread
From: LIU Zhiwei @ 2024-08-30 6:15 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
The loop in the 32-bit case of the vector compare operation
was incorrectly incrementing by 8 bytes per iteration instead
of 4 bytes. This caused the function to process only half of
the intended elements.
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Fixes: 9622c697d1 ("tcg: Add gvec compare with immediate and scalar operand")
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
tcg/tcg-op-gvec.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 0308732d9b..78ee1ced80 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -3939,7 +3939,7 @@ void tcg_gen_gvec_cmps(TCGCond cond, unsigned vece, uint32_t dofs,
uint32_t i;
tcg_gen_extrl_i64_i32(t1, c);
- for (i = 0; i < oprsz; i += 8) {
+ for (i = 0; i < oprsz; i += 4) {
tcg_gen_ld_i32(t0, tcg_env, aofs + i);
tcg_gen_negsetcond_i32(cond, t0, t0, t1);
tcg_gen_st_i32(t0, tcg_env, dofs + i);
--
2.43.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v2 01/14] tcg/op-gvec: Fix iteration step in 32-bit operation
2024-08-30 6:15 ` [PATCH v2 01/14] tcg/op-gvec: Fix iteration step in 32-bit operation LIU Zhiwei
@ 2024-08-31 23:59 ` Richard Henderson
0 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2024-08-31 23:59 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 8/30/24 16:15, LIU Zhiwei wrote:
> From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
>
> The loop in the 32-bit case of the vector compare operation
> was incorrectly incrementing by 8 bytes per iteration instead
> of 4 bytes. This caused the function to process only half of
> the intended elements.
>
> Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
> Fixes: 9622c697d1 ("tcg: Add gvec compare with immediate and scalar operand")
> Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
> ---
> tcg/tcg-op-gvec.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
>
> diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
> index 0308732d9b..78ee1ced80 100644
> --- a/tcg/tcg-op-gvec.c
> +++ b/tcg/tcg-op-gvec.c
> @@ -3939,7 +3939,7 @@ void tcg_gen_gvec_cmps(TCGCond cond, unsigned vece, uint32_t dofs,
> uint32_t i;
>
> tcg_gen_extrl_i64_i32(t1, c);
> - for (i = 0; i < oprsz; i += 8) {
> + for (i = 0; i < oprsz; i += 4) {
> tcg_gen_ld_i32(t0, tcg_env, aofs + i);
> tcg_gen_negsetcond_i32(cond, t0, t0, t1);
> tcg_gen_st_i32(t0, tcg_env, dofs + i);
* [PATCH v2 02/14] util: Add RISC-V vector extension probe in cpuinfo
2024-08-30 6:15 [PATCH v2 00/14] tcg/riscv: Add support for vector LIU Zhiwei
2024-08-30 6:15 ` [PATCH v2 01/14] tcg/op-gvec: Fix iteration step in 32-bit operation LIU Zhiwei
@ 2024-08-30 6:15 ` LIU Zhiwei
2024-09-02 0:12 ` Richard Henderson
2024-08-30 6:15 ` [PATCH v2 03/14] tcg/riscv: Add basic support for vector LIU Zhiwei
` (11 subsequent siblings)
13 siblings, 1 reply; 33+ messages in thread
From: LIU Zhiwei @ 2024-08-30 6:15 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Add support for probing RISC-V vector extension availability in
the backend. This information will be used when deciding whether
to use vector instructions in code generation.
Since the compiler does not yet define RISCV_HWPROBE_EXT_ZVE64X,
we use RISCV_HWPROBE_IMA_V instead.
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
host/include/riscv/host/cpuinfo.h | 2 ++
util/cpuinfo-riscv.c | 26 ++++++++++++++++++++++++--
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/host/include/riscv/host/cpuinfo.h b/host/include/riscv/host/cpuinfo.h
index 2b00660e36..48ad6cc3f8 100644
--- a/host/include/riscv/host/cpuinfo.h
+++ b/host/include/riscv/host/cpuinfo.h
@@ -10,9 +10,11 @@
#define CPUINFO_ZBA (1u << 1)
#define CPUINFO_ZBB (1u << 2)
#define CPUINFO_ZICOND (1u << 3)
+#define CPUINFO_ZVE64X (1u << 4)
/* Initialized with a constructor. */
extern unsigned cpuinfo;
+extern unsigned riscv_vlen;
/*
* We cannot rely on constructor ordering, so other constructors must
diff --git a/util/cpuinfo-riscv.c b/util/cpuinfo-riscv.c
index 497ce12680..780ba59856 100644
--- a/util/cpuinfo-riscv.c
+++ b/util/cpuinfo-riscv.c
@@ -12,6 +12,7 @@
#endif
unsigned cpuinfo;
+unsigned riscv_vlen;
static volatile sig_atomic_t got_sigill;
static void sigill_handler(int signo, siginfo_t *si, void *data)
@@ -33,7 +34,7 @@ static void sigill_handler(int signo, siginfo_t *si, void *data)
/* Called both as constructor and (possibly) via other constructors. */
unsigned __attribute__((constructor)) cpuinfo_init(void)
{
- unsigned left = CPUINFO_ZBA | CPUINFO_ZBB | CPUINFO_ZICOND;
+ unsigned left = CPUINFO_ZBA | CPUINFO_ZBB | CPUINFO_ZICOND | CPUINFO_ZVE64X;
unsigned info = cpuinfo;
if (info) {
@@ -49,6 +50,9 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
#endif
#if defined(__riscv_arch_test) && defined(__riscv_zicond)
info |= CPUINFO_ZICOND;
+#endif
+#if defined(__riscv_arch_test) && defined(__riscv_zve64x)
+ info |= CPUINFO_ZVE64X;
#endif
left &= ~info;
@@ -64,7 +68,8 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
&& pair.key >= 0) {
info |= pair.value & RISCV_HWPROBE_EXT_ZBA ? CPUINFO_ZBA : 0;
info |= pair.value & RISCV_HWPROBE_EXT_ZBB ? CPUINFO_ZBB : 0;
- left &= ~(CPUINFO_ZBA | CPUINFO_ZBB);
+ info |= pair.value & RISCV_HWPROBE_IMA_V ? CPUINFO_ZVE64X : 0;
+ left &= ~(CPUINFO_ZBA | CPUINFO_ZBB | CPUINFO_ZVE64X);
#ifdef RISCV_HWPROBE_EXT_ZICOND
info |= pair.value & RISCV_HWPROBE_EXT_ZICOND ? CPUINFO_ZICOND : 0;
left &= ~CPUINFO_ZICOND;
@@ -112,6 +117,23 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
assert(left == 0);
}
+ if (info & CPUINFO_ZVE64X) {
+ /*
+ * Get vlen for Vector.
+ * VLMAX = LMUL * VLEN / SEW.
+ * The "vsetvli rd, x0, e64" means "LMUL = 1, SEW = 64, rd = VLMAX",
+ * so "vlen = VLMAX * 64".
+ */
+ unsigned long vlmax = 0;
+ asm("vsetvli %0, x0, e64" : "=r"(vlmax));
+ if (vlmax) {
+ riscv_vlen = vlmax * 64;
+ assert(riscv_vlen >= 64 && !(riscv_vlen & (riscv_vlen - 1)));
+ } else {
+ info &= ~CPUINFO_ZVE64X;
+ }
+ }
+
info |= CPUINFO_ALWAYS;
cpuinfo = info;
return info;
--
2.43.0
* Re: [PATCH v2 02/14] util: Add RISC-V vector extension probe in cpuinfo
2024-08-30 6:15 ` [PATCH v2 02/14] util: Add RISC-V vector extension probe in cpuinfo LIU Zhiwei
@ 2024-09-02 0:12 ` Richard Henderson
0 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2024-09-02 0:12 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 8/30/24 16:15, LIU Zhiwei wrote:
> +extern unsigned riscv_vlen;
Do you really want to store vlen and not vlenb?
It seems that would simplify some of your computation in the tcg backend.
> @@ -49,6 +50,9 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
> #endif
> #if defined(__riscv_arch_test) && defined(__riscv_zicond)
> info |= CPUINFO_ZICOND;
> +#endif
> +#if defined(__riscv_arch_test) && defined(__riscv_zve64x)
> + info |= CPUINFO_ZVE64X;
> #endif
> left &= ~info;
>
> @@ -64,7 +68,8 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
> && pair.key >= 0) {
> info |= pair.value & RISCV_HWPROBE_EXT_ZBA ? CPUINFO_ZBA : 0;
> info |= pair.value & RISCV_HWPROBE_EXT_ZBB ? CPUINFO_ZBB : 0;
> - left &= ~(CPUINFO_ZBA | CPUINFO_ZBB);
> + info |= pair.value & RISCV_HWPROBE_IMA_V ? CPUINFO_ZVE64X : 0;
> + left &= ~(CPUINFO_ZBA | CPUINFO_ZBB | CPUINFO_ZVE64X);
> #ifdef RISCV_HWPROBE_EXT_ZICOND
> info |= pair.value & RISCV_HWPROBE_EXT_ZICOND ? CPUINFO_ZICOND : 0;
> left &= ~CPUINFO_ZICOND;
> @@ -112,6 +117,23 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
> assert(left == 0);
> }
>
> + if (info & CPUINFO_ZVE64X) {
> + /*
> + * Get vlen for Vector.
> + * VLMAX = LMUL * VLEN / SEW.
> + * The "vsetvli rd, x0, e64" means "LMUL = 1, SEW = 64, rd = VLMAX",
> + * so "vlen = VLMAX * 64".
> + */
> + unsigned long vlmax = 0;
> + asm("vsetvli %0, x0, e64" : "=r"(vlmax));
This doesn't compile, surely.
s/x0/zero/ is surely the minimum change, but this still won't work unless V is enabled at
compile-time.
You need to use the .insn form, like we do for the Zba probe above:
.insn i 0x57, 7, %0, zero, 3 << 3
I have verified that RISCV_HWPROBE_IMA_V went into the linux kernel at the same time as
vector support, so this probing is complete and sufficient.
r~
* [PATCH v2 03/14] tcg/riscv: Add basic support for vector
2024-08-30 6:15 [PATCH v2 00/14] tcg/riscv: Add support for vector LIU Zhiwei
2024-08-30 6:15 ` [PATCH v2 01/14] tcg/op-gvec: Fix iteration step in 32-bit operation LIU Zhiwei
2024-08-30 6:15 ` [PATCH v2 02/14] util: Add RISC-V vector extension probe in cpuinfo LIU Zhiwei
@ 2024-08-30 6:15 ` LIU Zhiwei
2024-09-02 0:28 ` Richard Henderson
2024-08-30 6:15 ` [PATCH v2 04/14] tcg/riscv: Add riscv vset{i}vli support LIU Zhiwei
` (10 subsequent siblings)
13 siblings, 1 reply; 33+ messages in thread
From: LIU Zhiwei @ 2024-08-30 6:15 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, Swung0x48,
TANG Tiancheng
From: Swung0x48 <swung0x48@outlook.com>
The RISC-V vector instruction set utilizes the LMUL field to group
multiple registers, enabling variable-length vector registers. This
implementation uses only the first register number of each group while
reserving the other register numbers within the group.
In TCG, each VEC_IR can have 3 types (TCG_TYPE_V64/128/256), and the
host runtime needs to adjust LMUL based on the type to use different
register groups.
This presents challenges for TCG's register allocation. Currently, we
avoid modifying the register allocation part of TCG and only expose the
minimum number of vector registers.
For example, when the host vlen is 64 bits and type is TCG_TYPE_V256, with
LMUL equal to 4, we use 4 vector registers as one register group. We can
use a maximum of 8 register groups, but the V0 register number is reserved
as a mask register, so we can effectively use at most 7 register groups.
Moreover, even when the type is smaller than TCG_TYPE_V256, we are
still limited to 7 registers, because TCG cannot yet constrain register
allocation dynamically by type. Likewise, when the host vlen is 128
bits and the type is TCG_TYPE_V256, we can use at most 15 registers.
There is not much pressure on vector register allocation in TCG now, so
using 7 registers is feasible and will not have a major impact on code
generation.
This patch:
1. Reserves vector register 0 for use as a mask register.
2. When using register groups, reserves the additional registers within
each group.
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
tcg/riscv/tcg-target-con-str.h | 1 +
tcg/riscv/tcg-target.c.inc | 131 +++++++++++++++++++++++++--------
tcg/riscv/tcg-target.h | 78 +++++++++++---------
tcg/riscv/tcg-target.opc.h | 12 +++
4 files changed, 157 insertions(+), 65 deletions(-)
create mode 100644 tcg/riscv/tcg-target.opc.h
diff --git a/tcg/riscv/tcg-target-con-str.h b/tcg/riscv/tcg-target-con-str.h
index d5c419dff1..21c4a0a0e0 100644
--- a/tcg/riscv/tcg-target-con-str.h
+++ b/tcg/riscv/tcg-target-con-str.h
@@ -9,6 +9,7 @@
* REGS(letter, register_mask)
*/
REGS('r', ALL_GENERAL_REGS)
+REGS('v', GET_VREG_SET(riscv_vlen))
/*
* Define constraint letters for constants:
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index d334857226..5ef1538aed 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -32,38 +32,14 @@
#ifdef CONFIG_DEBUG_TCG
static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
- "zero",
- "ra",
- "sp",
- "gp",
- "tp",
- "t0",
- "t1",
- "t2",
- "s0",
- "s1",
- "a0",
- "a1",
- "a2",
- "a3",
- "a4",
- "a5",
- "a6",
- "a7",
- "s2",
- "s3",
- "s4",
- "s5",
- "s6",
- "s7",
- "s8",
- "s9",
- "s10",
- "s11",
- "t3",
- "t4",
- "t5",
- "t6"
+ "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
+ "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5",
+ "a6", "a7", "s2", "s3", "s4", "s5", "s6", "s7",
+ "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6",
+ "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7",
+ "v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15",
+ "v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23",
+ "v24", "v25", "v26", "v27", "v28", "v29", "v30", "v31",
};
#endif
@@ -100,6 +76,16 @@ static const int tcg_target_reg_alloc_order[] = {
TCG_REG_A5,
TCG_REG_A6,
TCG_REG_A7,
+
+ /* Vector registers and TCG_REG_V0 reserved for mask. */
+ TCG_REG_V1, TCG_REG_V2, TCG_REG_V3, TCG_REG_V4,
+ TCG_REG_V5, TCG_REG_V6, TCG_REG_V7, TCG_REG_V8,
+ TCG_REG_V9, TCG_REG_V10, TCG_REG_V11, TCG_REG_V12,
+ TCG_REG_V13, TCG_REG_V14, TCG_REG_V15, TCG_REG_V16,
+ TCG_REG_V17, TCG_REG_V18, TCG_REG_V19, TCG_REG_V20,
+ TCG_REG_V21, TCG_REG_V22, TCG_REG_V23, TCG_REG_V24,
+ TCG_REG_V25, TCG_REG_V26, TCG_REG_V27, TCG_REG_V28,
+ TCG_REG_V29, TCG_REG_V30, TCG_REG_V31,
};
static const int tcg_target_call_iarg_regs[] = {
@@ -127,6 +113,12 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
#define TCG_CT_CONST_J12 0x1000
#define ALL_GENERAL_REGS MAKE_64BIT_MASK(0, 32)
+#define ALL_VECTOR_REGS MAKE_64BIT_MASK(33, 31)
+#define ALL_DVECTOR_REG_GROUPS 0x5555555400000000
+#define ALL_QVECTOR_REG_GROUPS 0x1111111000000000
+#define GET_VREG_SET(vlen) (vlen == 64 ? ALL_QVECTOR_REG_GROUPS : \
+ (vlen == 128 ? ALL_DVECTOR_REG_GROUPS : \
+ ALL_VECTOR_REGS))
#define sextreg sextract64
@@ -475,6 +467,43 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
}
}
+/*
+ * RISC-V vector instruction emitters
+ */
+
+/* Vector registers uses the same 5 lower bits as GPR registers. */
+static void tcg_out_opc_reg_vec(TCGContext *s, RISCVInsn opc,
+ TCGReg d, TCGReg s1, TCGReg s2, bool vm)
+{
+ tcg_out32(s, encode_r(opc, d, s1, s2) | (vm << 25));
+}
+
+static void tcg_out_opc_reg_vec_i(TCGContext *s, RISCVInsn opc,
+ TCGReg rd, TCGArg imm, TCGReg vs2, bool vm)
+{
+ tcg_out32(s, encode_r(opc, rd, (imm & 0x1f), vs2) | (vm << 25));
+}
+
+/* vm=0 (vm = false) means vector masking ENABLED. */
+#define tcg_out_opc_vv(s, opc, vd, vs2, vs1, vm) \
+ tcg_out_opc_reg_vec(s, opc, vd, vs1, vs2, vm);
+
+/*
+ * In RISC-V, vs2 is the first operand, while rs1/imm is the
+ * second operand.
+ */
+#define tcg_out_opc_vx(s, opc, vd, vs2, rs1, vm) \
+ tcg_out_opc_reg_vec(s, opc, vd, rs1, vs2, vm);
+
+#define tcg_out_opc_vi(s, opc, vd, vs2, imm, vm) \
+ tcg_out_opc_reg_vec_i(s, opc, vd, imm, vs2, vm);
+
+/*
+ * Only unit-stride addressing implemented; may extend in future.
+ */
+#define tcg_out_opc_ldst_vec(s, opc, vs3_vd, rs1, vm) \
+ tcg_out_opc_reg_vec(s, opc, vs3_vd, rs1, 0, vm);
+
/*
* TCG intrinsics
*/
@@ -1881,6 +1910,36 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
}
}
+static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
+ unsigned vecl, unsigned vece,
+ const TCGArg args[TCG_MAX_OP_ARGS],
+ const int const_args[TCG_MAX_OP_ARGS])
+{
+ switch (opc) {
+ case INDEX_op_mov_vec: /* Always emitted via tcg_out_mov. */
+ case INDEX_op_dup_vec: /* Always emitted via tcg_out_dup_vec. */
+ default:
+ g_assert_not_reached();
+ }
+}
+
+void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
+ TCGArg a0, ...)
+{
+ switch (opc) {
+ default:
+ g_assert_not_reached();
+ }
+}
+
+int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
+{
+ switch (opc) {
+ default:
+ return 0;
+ }
+}
+
static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
{
switch (op) {
@@ -2101,6 +2160,13 @@ static void tcg_target_init(TCGContext *s)
tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffff;
tcg_target_available_regs[TCG_TYPE_I64] = 0xffffffff;
+ if (cpuinfo & CPUINFO_ZVE64X) {
+ TCGRegSet vector_regs = GET_VREG_SET(riscv_vlen);
+ tcg_target_available_regs[TCG_TYPE_V64] = vector_regs;
+ tcg_target_available_regs[TCG_TYPE_V128] = vector_regs;
+ tcg_target_available_regs[TCG_TYPE_V256] = vector_regs;
+ }
+
tcg_target_call_clobber_regs = -1u;
tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S0);
tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S1);
@@ -2123,6 +2189,7 @@ static void tcg_target_init(TCGContext *s)
tcg_regset_set_reg(s->reserved_regs, TCG_REG_SP);
tcg_regset_set_reg(s->reserved_regs, TCG_REG_GP);
tcg_regset_set_reg(s->reserved_regs, TCG_REG_TP);
+ tcg_regset_set_reg(s->reserved_regs, TCG_REG_V0);
}
typedef struct {
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 1a347eaf6e..12a7a37aaa 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -28,42 +28,28 @@
#include "host/cpuinfo.h"
#define TCG_TARGET_INSN_UNIT_SIZE 4
-#define TCG_TARGET_NB_REGS 32
+#define TCG_TARGET_NB_REGS 64
#define MAX_CODE_GEN_BUFFER_SIZE ((size_t)-1)
typedef enum {
- TCG_REG_ZERO,
- TCG_REG_RA,
- TCG_REG_SP,
- TCG_REG_GP,
- TCG_REG_TP,
- TCG_REG_T0,
- TCG_REG_T1,
- TCG_REG_T2,
- TCG_REG_S0,
- TCG_REG_S1,
- TCG_REG_A0,
- TCG_REG_A1,
- TCG_REG_A2,
- TCG_REG_A3,
- TCG_REG_A4,
- TCG_REG_A5,
- TCG_REG_A6,
- TCG_REG_A7,
- TCG_REG_S2,
- TCG_REG_S3,
- TCG_REG_S4,
- TCG_REG_S5,
- TCG_REG_S6,
- TCG_REG_S7,
- TCG_REG_S8,
- TCG_REG_S9,
- TCG_REG_S10,
- TCG_REG_S11,
- TCG_REG_T3,
- TCG_REG_T4,
- TCG_REG_T5,
- TCG_REG_T6,
+ TCG_REG_ZERO, TCG_REG_RA, TCG_REG_SP, TCG_REG_GP,
+ TCG_REG_TP, TCG_REG_T0, TCG_REG_T1, TCG_REG_T2,
+ TCG_REG_S0, TCG_REG_S1, TCG_REG_A0, TCG_REG_A1,
+ TCG_REG_A2, TCG_REG_A3, TCG_REG_A4, TCG_REG_A5,
+ TCG_REG_A6, TCG_REG_A7, TCG_REG_S2, TCG_REG_S3,
+ TCG_REG_S4, TCG_REG_S5, TCG_REG_S6, TCG_REG_S7,
+ TCG_REG_S8, TCG_REG_S9, TCG_REG_S10, TCG_REG_S11,
+ TCG_REG_T3, TCG_REG_T4, TCG_REG_T5, TCG_REG_T6,
+
+ /* RISC-V V Extension registers */
+ TCG_REG_V0, TCG_REG_V1, TCG_REG_V2, TCG_REG_V3,
+ TCG_REG_V4, TCG_REG_V5, TCG_REG_V6, TCG_REG_V7,
+ TCG_REG_V8, TCG_REG_V9, TCG_REG_V10, TCG_REG_V11,
+ TCG_REG_V12, TCG_REG_V13, TCG_REG_V14, TCG_REG_V15,
+ TCG_REG_V16, TCG_REG_V17, TCG_REG_V18, TCG_REG_V19,
+ TCG_REG_V20, TCG_REG_V21, TCG_REG_V22, TCG_REG_V23,
+ TCG_REG_V24, TCG_REG_V25, TCG_REG_V26, TCG_REG_V27,
+ TCG_REG_V28, TCG_REG_V29, TCG_REG_V30, TCG_REG_V31,
/* aliases */
TCG_AREG0 = TCG_REG_S0,
@@ -156,6 +142,32 @@ typedef enum {
#define TCG_TARGET_HAS_tst 0
+/* vector instructions */
+#define TCG_TARGET_HAS_v64 0
+#define TCG_TARGET_HAS_v128 0
+#define TCG_TARGET_HAS_v256 0
+#define TCG_TARGET_HAS_andc_vec 0
+#define TCG_TARGET_HAS_orc_vec 0
+#define TCG_TARGET_HAS_nand_vec 0
+#define TCG_TARGET_HAS_nor_vec 0
+#define TCG_TARGET_HAS_eqv_vec 0
+#define TCG_TARGET_HAS_not_vec 0
+#define TCG_TARGET_HAS_neg_vec 0
+#define TCG_TARGET_HAS_abs_vec 0
+#define TCG_TARGET_HAS_roti_vec 0
+#define TCG_TARGET_HAS_rots_vec 0
+#define TCG_TARGET_HAS_rotv_vec 0
+#define TCG_TARGET_HAS_shi_vec 0
+#define TCG_TARGET_HAS_shs_vec 0
+#define TCG_TARGET_HAS_shv_vec 0
+#define TCG_TARGET_HAS_mul_vec 0
+#define TCG_TARGET_HAS_sat_vec 0
+#define TCG_TARGET_HAS_minmax_vec 0
+#define TCG_TARGET_HAS_bitsel_vec 0
+#define TCG_TARGET_HAS_cmpsel_vec 0
+
+#define TCG_TARGET_HAS_tst_vec 0
+
#define TCG_TARGET_DEFAULT_MO (0)
#define TCG_TARGET_NEED_LDST_LABELS
diff --git a/tcg/riscv/tcg-target.opc.h b/tcg/riscv/tcg-target.opc.h
new file mode 100644
index 0000000000..b80b39e1e5
--- /dev/null
+++ b/tcg/riscv/tcg-target.opc.h
@@ -0,0 +1,12 @@
+/*
+ * Copyright (c) C-SKY Microsystems Co., Ltd.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version.
+ *
+ * See the COPYING file in the top-level directory for details.
+ *
+ * Target-specific opcodes for host vector expansion. These will be
+ * emitted by tcg_expand_vec_op. For those familiar with GCC internals,
+ * consider these to be UNSPEC with names.
+ */
--
2.43.0
* Re: [PATCH v2 03/14] tcg/riscv: Add basic support for vector
2024-08-30 6:15 ` [PATCH v2 03/14] tcg/riscv: Add basic support for vector LIU Zhiwei
@ 2024-09-02 0:28 ` Richard Henderson
0 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2024-09-02 0:28 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, Swung0x48, TANG Tiancheng
On 8/30/24 16:15, LIU Zhiwei wrote:
> From: Swung0x48 <swung0x48@outlook.com>
>
> The RISC-V vector instruction set utilizes the LMUL field to group
> multiple registers, enabling variable-length vector registers. This
> implementation uses only the first register number of each group while
> reserving the other register numbers within the group.
>
> In TCG, each VEC_IR can have 3 types (TCG_TYPE_V64/128/256), and the
> host runtime needs to adjust LMUL based on the type to use different
> register groups.
>
> This presents challenges for TCG's register allocation. Currently, we
> avoid modifying the register allocation part of TCG and only expose the
> minimum number of vector registers.
>
> For example, when the host vlen is 64 bits and type is TCG_TYPE_V256, with
> LMUL equal to 4, we use 4 vector registers as one register group. We can
> use a maximum of 8 register groups, but the V0 register number is reserved
> as a mask register, so we can effectively use at most 7 register groups.
> Moreover, even when the type is smaller than TCG_TYPE_V256, we are
> still limited to 7 registers, because TCG cannot yet constrain register
> allocation dynamically by type. Likewise, when the host vlen is 128
> bits and the type is TCG_TYPE_V256, we can use at most 15 registers.
>
> There is not much pressure on vector register allocation in TCG now, so
> using 7 registers is feasible and will not have a major impact on code
> generation.
>
> This patch:
> 1. Reserves vector register 0 for use as a mask register.
> 2. When using register groups, reserves the additional registers within
> each group.
>
> Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
> Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
> ---
> tcg/riscv/tcg-target-con-str.h | 1 +
> tcg/riscv/tcg-target.c.inc | 131 +++++++++++++++++++++++++--------
> tcg/riscv/tcg-target.h | 78 +++++++++++---------
> tcg/riscv/tcg-target.opc.h | 12 +++
> 4 files changed, 157 insertions(+), 65 deletions(-)
> create mode 100644 tcg/riscv/tcg-target.opc.h
>
> diff --git a/tcg/riscv/tcg-target-con-str.h b/tcg/riscv/tcg-target-con-str.h
> index d5c419dff1..21c4a0a0e0 100644
> --- a/tcg/riscv/tcg-target-con-str.h
> +++ b/tcg/riscv/tcg-target-con-str.h
> @@ -9,6 +9,7 @@
> * REGS(letter, register_mask)
> */
> REGS('r', ALL_GENERAL_REGS)
> +REGS('v', GET_VREG_SET(riscv_vlen))
Perhaps too complicated. Make this MAKE_64BIT_MASK(32, 32); everything else will be
handled by tcg_target_available_regs[] and reserved_regs.
> @@ -127,6 +113,12 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
> #define TCG_CT_CONST_J12 0x1000
>
> #define ALL_GENERAL_REGS MAKE_64BIT_MASK(0, 32)
> +#define ALL_VECTOR_REGS MAKE_64BIT_MASK(33, 31)
> +#define ALL_DVECTOR_REG_GROUPS 0x5555555400000000
> +#define ALL_QVECTOR_REG_GROUPS 0x1111111000000000
V0 is still a vector register, even if it is reserved.
I think you should include it in these masks.
> +#define GET_VREG_SET(vlen) (vlen == 64 ? ALL_QVECTOR_REG_GROUPS : \
> + (vlen == 128 ? ALL_DVECTOR_REG_GROUPS : \
> + ALL_VECTOR_REGS))
I think you will not need this macro.
> +/*
> + * RISC-V vector instruction emitters
> + */
> +
> +/* Vector registers uses the same 5 lower bits as GPR registers. */
> +static void tcg_out_opc_reg_vec(TCGContext *s, RISCVInsn opc,
> + TCGReg d, TCGReg s1, TCGReg s2, bool vm)
> +{
> + tcg_out32(s, encode_r(opc, d, s1, s2) | (vm << 25));
> +}
> +
> +static void tcg_out_opc_reg_vec_i(TCGContext *s, RISCVInsn opc,
> + TCGReg rd, TCGArg imm, TCGReg vs2, bool vm)
> +{
> + tcg_out32(s, encode_r(opc, rd, (imm & 0x1f), vs2) | (vm << 25));
I think you want to create new encode_* functions, not abuse the integer ones.
> +}
> +
> +/* vm=0 (vm = false) means vector masking ENABLED. */
> +#define tcg_out_opc_vv(s, opc, vd, vs2, vs1, vm) \
> + tcg_out_opc_reg_vec(s, opc, vd, vs1, vs2, vm);
> +
> +/*
> + * In RISC-V, vs2 is the first operand, while rs1/imm is the
> + * second operand.
> + */
> +#define tcg_out_opc_vx(s, opc, vd, vs2, rs1, vm) \
> + tcg_out_opc_reg_vec(s, opc, vd, rs1, vs2, vm);
> +
> +#define tcg_out_opc_vi(s, opc, vd, vs2, imm, vm) \
> + tcg_out_opc_reg_vec_i(s, opc, vd, imm, vs2, vm);
> +
> +/*
> + * Only unit-stride addressing implemented; may extend in future.
> + */
> +#define tcg_out_opc_ldst_vec(s, opc, vs3_vd, rs1, vm) \
> + tcg_out_opc_reg_vec(s, opc, vs3_vd, rs1, 0, vm);
I don't understand the need for any of these #defines.
Why are we not simply creating functions of the correct name?
> @@ -2101,6 +2160,13 @@ static void tcg_target_init(TCGContext *s)
> tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffff;
> tcg_target_available_regs[TCG_TYPE_I64] = 0xffffffff;
>
> + if (cpuinfo & CPUINFO_ZVE64X) {
> + TCGRegSet vector_regs = GET_VREG_SET(riscv_vlen);
This ought to be the only usage of GET_VREG_SET, so I think you should inline that code
with a switch on riscv_vlen/vlenb.
r~
* [PATCH v2 04/14] tcg/riscv: Add riscv vset{i}vli support
2024-08-30 6:15 [PATCH v2 00/14] tcg/riscv: Add support for vector LIU Zhiwei
` (2 preceding siblings ...)
2024-08-30 6:15 ` [PATCH v2 03/14] tcg/riscv: Add basic support for vector LIU Zhiwei
@ 2024-08-30 6:15 ` LIU Zhiwei
2024-09-02 1:06 ` Richard Henderson
2024-08-30 6:15 ` [PATCH v2 05/14] tcg/riscv: Implement vector load/store LIU Zhiwei
` (9 subsequent siblings)
13 siblings, 1 reply; 33+ messages in thread
From: LIU Zhiwei @ 2024-08-30 6:15 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
In RISC-V, vector operations require initial configuration using
the vset{i}vl{i} instruction.
This instruction:
1. Sets the vector length (vl) in elements
2. Configures the vtype register, which includes:
SEW (Single Element Width)
LMUL (vector register group multiplier)
Other vector operation parameters
This configuration is crucial for defining subsequent vector
operation behavior. To optimize performance, the configuration
process is managed dynamically:
1. Reconfiguration using vset{i}vl{i} is necessary when SEW
or vector register group width changes.
2. The vset instruction can be omitted when configuration
remains unchanged.
This optimization is only effective within a single TB.
Each TB requires reconfiguration at its start, as the current
state cannot be obtained from hardware.
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: Weiwei Li <liwei1518@gmail.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
tcg/riscv/tcg-target.c.inc | 104 +++++++++++++++++++++++++++++++++++++
1 file changed, 104 insertions(+)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 5ef1538aed..49d01b8775 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -119,6 +119,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
#define GET_VREG_SET(vlen) (vlen == 64 ? ALL_QVECTOR_REG_GROUPS : \
(vlen == 128 ? ALL_DVECTOR_REG_GROUPS : \
ALL_VECTOR_REGS))
+#define riscv_vlenb (riscv_vlen / 8)
#define sextreg sextract64
@@ -168,6 +169,18 @@ static bool tcg_target_const_match(int64_t val, int ct,
* RISC-V Base ISA opcodes (IM)
*/
+#define V_OPIVV (0x0 << 12)
+#define V_OPFVV (0x1 << 12)
+#define V_OPMVV (0x2 << 12)
+#define V_OPIVI (0x3 << 12)
+#define V_OPIVX (0x4 << 12)
+#define V_OPFVF (0x5 << 12)
+#define V_OPMVX (0x6 << 12)
+#define V_OPCFG (0x7 << 12)
+
+#define V_SUMOP (0x0 << 20)
+#define V_LUMOP (0x0 << 20)
+
typedef enum {
OPC_ADD = 0x33,
OPC_ADDI = 0x13,
@@ -263,6 +276,11 @@ typedef enum {
/* Zicond: integer conditional operations */
OPC_CZERO_EQZ = 0x0e005033,
OPC_CZERO_NEZ = 0x0e007033,
+
+ /* V: Vector extension 1.0 */
+ OPC_VSETVLI = 0x57 | V_OPCFG,
+ OPC_VSETIVLI = 0xc0000057 | V_OPCFG,
+ OPC_VSETVL = 0x80000057 | V_OPCFG,
} RISCVInsn;
/*
@@ -355,6 +373,35 @@ static int32_t encode_uj(RISCVInsn opc, TCGReg rd, uint32_t imm)
return opc | (rd & 0x1f) << 7 | encode_ujimm20(imm);
}
+typedef enum {
+ VTA_TU = 0,
+ VTA_TA,
+} RISCVVta;
+
+typedef enum {
+ VMA_MU = 0,
+ VMA_MA,
+} RISCVVma;
+
+typedef enum {
+ VLMUL_M1 = 0, /* LMUL=1 */
+ VLMUL_M2, /* LMUL=2 */
+ VLMUL_M4, /* LMUL=4 */
+ VLMUL_M8, /* LMUL=8 */
+ VLMUL_RESERVED,
+ VLMUL_MF8, /* LMUL=1/8 */
+ VLMUL_MF4, /* LMUL=1/4 */
+ VLMUL_MF2, /* LMUL=1/2 */
+} RISCVVlmul;
+#define LMUL_MAX 8
+
+static int32_t encode_vtypei(RISCVVta vta, RISCVVma vma,
+ unsigned vsew, RISCVVlmul vlmul)
+{
+ return (vma & 0x1) << 7 | (vta & 0x1) << 6 | (vsew & 0x7) << 3 |
+ (vlmul & 0x7);
+}
+
/*
* RISC-V instruction emitters
*/
@@ -484,6 +531,12 @@ static void tcg_out_opc_reg_vec_i(TCGContext *s, RISCVInsn opc,
tcg_out32(s, encode_r(opc, rd, (imm & 0x1f), vs2) | (vm << 25));
}
+static void tcg_out_opc_vec_config(TCGContext *s, RISCVInsn opc,
+ TCGReg rd, uint32_t avl, int32_t vtypei)
+{
+ tcg_out32(s, encode_i(opc, rd, avl, vtypei));
+}
+
/* vm=0 (vm = false) means vector masking ENABLED. */
#define tcg_out_opc_vv(s, opc, vd, vs2, vs1, vm) \
tcg_out_opc_reg_vec(s, opc, vd, vs1, vs2, vm);
@@ -498,12 +551,62 @@ static void tcg_out_opc_reg_vec_i(TCGContext *s, RISCVInsn opc,
#define tcg_out_opc_vi(s, opc, vd, vs2, imm, vm) \
tcg_out_opc_reg_vec_i(s, opc, vd, imm, vs2, vm);
+#define tcg_out_opc_vconfig(s, opc, rd, avl, vtypei) \
+ tcg_out_opc_vec_config(s, opc, rd, avl, vtypei);
+
/*
* Only unit-stride addressing implemented; may extend in future.
*/
#define tcg_out_opc_ldst_vec(s, opc, vs3_vd, rs1, vm) \
tcg_out_opc_reg_vec(s, opc, vs3_vd, rs1, 0, vm);
+static void tcg_out_vsetvl(TCGContext *s, uint32_t avl, int vtypei)
+{
+ if (avl < 32) {
+ tcg_out_opc_vconfig(s, OPC_VSETIVLI, TCG_REG_ZERO, avl, vtypei);
+ } else {
+ tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, TCG_REG_ZERO, avl);
+ tcg_out_opc_vconfig(s, OPC_VSETVLI, TCG_REG_ZERO, TCG_REG_TMP0, vtypei);
+ }
+}
+
+/*
+ * TODO: If the vtype value is not supported by the implementation,
+ * then the vill bit is set in vtype, the remaining bits in
+ * vtype are set to zero, and the vl register is also set to zero
+ */
+
+static __thread int prev_vtypei;
+
+#define get_vlmax(vsew) (riscv_vlen / (8 << vsew) * (LMUL_MAX))
+#define get_vec_type_bytes(type) (type >= TCG_TYPE_V64 ? \
+ (8 << (type - TCG_TYPE_V64)) : 0)
+#define calc_vlmul(oprsz) (ctzl(oprsz / riscv_vlenb))
+
+static void tcg_target_set_vec_config(TCGContext *s, TCGType type,
+ unsigned vece)
+{
+ unsigned vsew, oprsz, avl;
+ int vtypei;
+ RISCVVlmul vlmul;
+
+ vsew = vece;
+ oprsz = get_vec_type_bytes(type);
+ avl = oprsz / (1 << vece);
+ vlmul = oprsz > riscv_vlenb ?
+ calc_vlmul(oprsz) : VLMUL_M1;
+ vtypei = encode_vtypei(VTA_TA, VMA_MA, vsew, vlmul);
+
+ tcg_debug_assert(avl <= get_vlmax(vsew));
+ tcg_debug_assert(vlmul <= VLMUL_RESERVED);
+ tcg_debug_assert(vsew <= MO_64);
+
+ if (vtypei != prev_vtypei) {
+ prev_vtypei = vtypei;
+ tcg_out_vsetvl(s, avl, vtypei);
+ }
+}
+
/*
* TCG intrinsics
*/
@@ -2152,6 +2255,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
static void tcg_out_tb_start(TCGContext *s)
{
+ prev_vtypei = -1;
/* nothing to do */
}
--
2.43.0
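The vtype computation in the patch above can be sketched host-side. This is a minimal model, not the patch's emitted code: `VLENB` is an assumed host value (VLEN = 128), and the field layout follows the V-spec vtype register (vma bit 7, vta bit 6, vsew bits 5:3, vlmul bits 2:0); names mirror the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for values the patch reads from cpuinfo. */
enum { VLENB = 16 };      /* assume VLEN = 128 bits */
enum { VLMUL_M1 = 0 };    /* LMUL = 1 */

/* vtype layout per the V-spec: vma[7] vta[6] vsew[5:3] vlmul[2:0]. */
static int32_t encode_vtype(int vta, int vma, unsigned vsew, unsigned vlmul)
{
    return (vma & 1) << 7 | (vta & 1) << 6 | (vsew & 7) << 3 | (vlmul & 7);
}

/* For grouped registers oprsz / VLENB is a power of two > 1, so the
 * group multiplier is its log2, as in the patch's calc_vlmul. */
static unsigned calc_vlmul(unsigned oprsz)
{
    return __builtin_ctz(oprsz / VLENB);
}

/* Compute (avl, vtype) for a vector type of 'oprsz' bytes with element
 * size 8 << vsew bits, tail/mask agnostic -- what the patch's
 * tcg_target_set_vec_config computes before deciding whether to emit
 * a vset{i}vli. */
static int32_t config_vtype(unsigned oprsz, unsigned vsew, unsigned *avl)
{
    unsigned vlmul = oprsz > VLENB ? calc_vlmul(oprsz) : VLMUL_M1;
    *avl = oprsz >> vsew;
    return encode_vtype(1, 1, vsew, vlmul);
}
```

With VLEN = 128, a 16-byte (V128) operation at SEW=32 yields avl = 4 and LMUL = 1; a 32-byte (V256) operation at SEW=64 yields avl = 4 and LMUL = 2.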
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v2 04/14] tcg/riscv: Add riscv vset{i}vli support
2024-08-30 6:15 ` [PATCH v2 04/14] tcg/riscv: Add riscv vset{i}vli support LIU Zhiwei
@ 2024-09-02 1:06 ` Richard Henderson
0 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2024-09-02 1:06 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 8/30/24 16:15, LIU Zhiwei wrote:
> From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
>
> In RISC-V, vector operations require initial configuration using
> the vset{i}vl{i} instruction.
>
> This instruction:
> 1. Sets the vector length (vl) in bytes
> 2. Configures the vtype register, which includes:
> SEW (Single Element Width)
> LMUL (vector register group multiplier)
> Other vector operation parameters
>
> This configuration is crucial for defining subsequent vector
> operation behavior. To optimize performance, the configuration
> process is managed dynamically:
> 1. Reconfiguration using vset{i}vl{i} is necessary when SEW
> or vector register group width changes.
> 2. The vset instruction can be omitted when configuration
> remains unchanged.
>
> This optimization is only effective within a single TB.
> Each TB requires reconfiguration at its start, as the current
> state cannot be obtained from hardware.
>
> Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
> Signed-off-by: Weiwei Li <liwei1518@gmail.com>
> Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
> ---
> tcg/riscv/tcg-target.c.inc | 104 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 104 insertions(+)
>
> diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
> index 5ef1538aed..49d01b8775 100644
> --- a/tcg/riscv/tcg-target.c.inc
> +++ b/tcg/riscv/tcg-target.c.inc
> @@ -119,6 +119,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
> #define GET_VREG_SET(vlen) (vlen == 64 ? ALL_QVECTOR_REG_GROUPS : \
> (vlen == 128 ? ALL_DVECTOR_REG_GROUPS : \
> ALL_VECTOR_REGS))
> +#define riscv_vlenb (riscv_vlen / 8)
>
> #define sextreg sextract64
>
> @@ -168,6 +169,18 @@ static bool tcg_target_const_match(int64_t val, int ct,
> * RISC-V Base ISA opcodes (IM)
> */
>
> +#define V_OPIVV (0x0 << 12)
> +#define V_OPFVV (0x1 << 12)
> +#define V_OPMVV (0x2 << 12)
> +#define V_OPIVI (0x3 << 12)
> +#define V_OPIVX (0x4 << 12)
> +#define V_OPFVF (0x5 << 12)
> +#define V_OPMVX (0x6 << 12)
> +#define V_OPCFG (0x7 << 12)
> +
> +#define V_SUMOP (0x0 << 20)
> +#define V_LUMOP (0x0 << 20)
> +
> typedef enum {
> OPC_ADD = 0x33,
> OPC_ADDI = 0x13,
> @@ -263,6 +276,11 @@ typedef enum {
> /* Zicond: integer conditional operations */
> OPC_CZERO_EQZ = 0x0e005033,
> OPC_CZERO_NEZ = 0x0e007033,
> +
> + /* V: Vector extension 1.0 */
> + OPC_VSETVLI = 0x57 | V_OPCFG,
> + OPC_VSETIVLI = 0xc0000057 | V_OPCFG,
> + OPC_VSETVL = 0x80000057 | V_OPCFG,
> } RISCVInsn;
>
> /*
> @@ -355,6 +373,35 @@ static int32_t encode_uj(RISCVInsn opc, TCGReg rd, uint32_t imm)
> return opc | (rd & 0x1f) << 7 | encode_ujimm20(imm);
> }
>
> +typedef enum {
> + VTA_TU = 0,
> + VTA_TA,
> +} RISCVVta;
> +
> +typedef enum {
> + VMA_MU = 0,
> + VMA_MA,
> +} RISCVVma;
Do these really need enumerators, or would 'bool' be sufficient?
> +static int32_t encode_vtypei(RISCVVta vta, RISCVVma vma,
> + unsigned vsew, RISCVVlmul vlmul)
> +{
> + return (vma & 0x1) << 7 | (vta & 0x1) << 6 | (vsew & 0x7) << 3 |
> + (vlmul & 0x7);
> +}
s/vtypei/vtype/g? vtype is only immediate in specific contexts, and you'll match the
manual better if you talk about vtype the CSR rather than the vset*vli argument.
Assert values in range rather than masking.
Use MemOp vsew, since you're using MO_64, etc.
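A sketch of the encoder along those lines -- bool for ta/ma, MemOp for vsew, asserts instead of masking. The enum values here are illustrative stand-ins for QEMU's types, not the final patch code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { MO_8, MO_16, MO_32, MO_64 } MemOp;           /* as in QEMU */
typedef enum { VLMUL_M1 = 0, VLMUL_M2, VLMUL_M4, VLMUL_M8 } RISCVVlmul;

/* Range-checked vtype encoder: assert values are in range rather than
 * silently masking them; field layout unchanged. */
static int32_t encode_vtype(bool vta, bool vma, MemOp vsew, RISCVVlmul vlmul)
{
    assert(vsew <= MO_64);
    assert(vlmul <= VLMUL_M8);
    return vma << 7 | vta << 6 | vsew << 3 | vlmul;
}
```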
> @@ -498,12 +551,62 @@ static void tcg_out_opc_reg_vec_i(TCGContext *s, RISCVInsn opc,
> #define tcg_out_opc_vi(s, opc, vd, vs2, imm, vm) \
> tcg_out_opc_reg_vec_i(s, opc, vd, imm, vs2, vm);
>
> +#define tcg_out_opc_vconfig(s, opc, rd, avl, vtypei) \
> + tcg_out_opc_vec_config(s, opc, rd, avl, vtypei);
Why the extra define?
> +
> +/*
> + * TODO: If the vtype value is not supported by the implementation,
> + * then the vill bit is set in vtype, the remaining bits in
> + * vtype are set to zero, and the vl register is also set to zero
> + */
Why is this a TODO?
Are you suggesting that we might need to probe *all* of the cases at startup?
> +static __thread int prev_vtypei;
I think we should put this into TCGContext.
We don't currently have any host-specific values there, but there's no reason we can't
have any.
> +#define get_vlmax(vsew) (riscv_vlen / (8 << vsew) * (LMUL_MAX))
Given that we know that LMUL_MAX is 8, doesn't this cancel out?
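It does: with LMUL_MAX fixed at 8, riscv_vlen / (8 << vsew) * 8 reduces to riscv_vlen >> vsew. A quick check of the identity (vlen here is a free parameter):

```c
#include <assert.h>

/* The macro as written in the patch, with LMUL_MAX substituted as 8. */
static unsigned get_vlmax_macro(unsigned vlen, unsigned vsew)
{
    return vlen / (8u << vsew) * 8u;
}

/* The cancelled form: vlen / (8 << vsew) * 8 == vlen >> vsew
 * whenever vlen is a multiple of 8 << vsew, which holds for any
 * legal VLEN (a power of two >= 64) and vsew <= MO_64. */
static unsigned get_vlmax_reduced(unsigned vlen, unsigned vsew)
{
    return vlen >> vsew;
}
```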
> +#define get_vec_type_bytes(type) (type >= TCG_TYPE_V64 ? \
> + (8 << (type - TCG_TYPE_V64)) : 0)
Again, assert not produce nonsense results. And this doesn't need hiding in a macro.
> +#define calc_vlmul(oprsz) (ctzl(oprsz / riscv_vlenb))
I think it's clearer to do this inline, where we can see that oprsz > vlenb.
> +
> +static void tcg_target_set_vec_config(TCGContext *s, TCGType type,
> + unsigned vece)
> +{
> + unsigned vsew, oprsz, avl;
> + int vtypei;
> + RISCVVlmul vlmul;
> +
> + vsew = vece;
You can just name the argument vsew...
> + oprsz = get_vec_type_bytes(type);
> + avl = oprsz / (1 << vece);
> + vlmul = oprsz > riscv_vlenb ?
> + calc_vlmul(oprsz) : VLMUL_M1;
I guess it is always the case that full register operations are preferred over fractional?
> + vtypei = encode_vtypei(VTA_TA, VMA_MA, vsew, vlmul);
> +
> + tcg_debug_assert(avl <= get_vlmax(vsew));
> + tcg_debug_assert(vlmul <= VLMUL_RESERVED);
> + tcg_debug_assert(vsew <= MO_64);
These asserts should be moved higher, above their first uses.
r~
* [PATCH v2 05/14] tcg/riscv: Implement vector load/store
2024-08-30 6:15 [PATCH v2 00/14] tcg/riscv: Add support for vector LIU Zhiwei
` (3 preceding siblings ...)
2024-08-30 6:15 ` [PATCH v2 04/14] tcg/riscv: Add riscv vset{i}vli support LIU Zhiwei
@ 2024-08-30 6:15 ` LIU Zhiwei
2024-09-02 1:31 ` Richard Henderson
2024-08-30 6:15 ` [PATCH v2 06/14] tcg/riscv: Implement vector mov/dup{m/i} LIU Zhiwei
` (8 subsequent siblings)
13 siblings, 1 reply; 33+ messages in thread
From: LIU Zhiwei @ 2024-08-30 6:15 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
tcg/riscv/tcg-target-con-set.h | 2 +
tcg/riscv/tcg-target.c.inc | 169 ++++++++++++++++++++++++++++++++-
2 files changed, 167 insertions(+), 4 deletions(-)
diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
index aac5ceee2b..d73a62b0f2 100644
--- a/tcg/riscv/tcg-target-con-set.h
+++ b/tcg/riscv/tcg-target-con-set.h
@@ -21,3 +21,5 @@ C_O1_I2(r, rZ, rZ)
C_N1_I2(r, r, rM)
C_O1_I4(r, r, rI, rM, rM)
C_O2_I4(r, r, rZ, rZ, rM, rM)
+C_O0_I2(v, r)
+C_O1_I1(v, r)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 49d01b8775..6f8814564a 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -178,8 +178,11 @@ static bool tcg_target_const_match(int64_t val, int ct,
#define V_OPMVX (0x6 << 12)
#define V_OPCFG (0x7 << 12)
-#define V_SUMOP (0x0 << 20)
-#define V_LUMOP (0x0 << 20)
+#define V_UNIT_STRIDE (0x0 << 20)
+#define V_UNIT_STRIDE_WHOLE_REG (0x8 << 20)
+
+/* 0 <= NF <= 7 */
+#define V_NF(x) (x << 29)
typedef enum {
OPC_ADD = 0x33,
@@ -281,6 +284,25 @@ typedef enum {
OPC_VSETVLI = 0x57 | V_OPCFG,
OPC_VSETIVLI = 0xc0000057 | V_OPCFG,
OPC_VSETVL = 0x80000057 | V_OPCFG,
+
+ OPC_VLE8_V = 0x7 | V_UNIT_STRIDE,
+ OPC_VLE16_V = 0x5007 | V_UNIT_STRIDE,
+ OPC_VLE32_V = 0x6007 | V_UNIT_STRIDE,
+ OPC_VLE64_V = 0x7007 | V_UNIT_STRIDE,
+ OPC_VSE8_V = 0x27 | V_UNIT_STRIDE,
+ OPC_VSE16_V = 0x5027 | V_UNIT_STRIDE,
+ OPC_VSE32_V = 0x6027 | V_UNIT_STRIDE,
+ OPC_VSE64_V = 0x7027 | V_UNIT_STRIDE,
+
+ OPC_VL1RE64_V = 0x2007007 | V_UNIT_STRIDE_WHOLE_REG | V_NF(0),
+ OPC_VL2RE64_V = 0x2007007 | V_UNIT_STRIDE_WHOLE_REG | V_NF(1),
+ OPC_VL4RE64_V = 0x2007007 | V_UNIT_STRIDE_WHOLE_REG | V_NF(3),
+ OPC_VL8RE64_V = 0x2007007 | V_UNIT_STRIDE_WHOLE_REG | V_NF(7),
+
+ OPC_VS1R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(0),
+ OPC_VS2R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(1),
+ OPC_VS4R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(3),
+ OPC_VS8R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(7),
} RISCVInsn;
/*
@@ -607,6 +629,19 @@ static void tcg_target_set_vec_config(TCGContext *s, TCGType type,
}
}
+static int riscv_set_vec_config_vl(TCGContext *s, TCGType type)
+{
+ int prev_vsew = prev_vtypei < 0 ? MO_8 : ((prev_vtypei >> 3) & 0x7);
+ tcg_target_set_vec_config(s, type, prev_vsew);
+ return prev_vsew;
+}
+
+static void riscv_set_vec_config_vl_vece(TCGContext *s, TCGType type,
+ unsigned vece)
+{
+ tcg_target_set_vec_config(s, type, vece);
+}
+
/*
* TCG intrinsics
*/
@@ -799,6 +834,17 @@ static void tcg_out_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
case OPC_SD:
tcg_out_opc_store(s, opc, addr, data, imm12);
break;
+ case OPC_VSE8_V:
+ case OPC_VSE16_V:
+ case OPC_VSE32_V:
+ case OPC_VSE64_V:
+ case OPC_VS1R_V:
+ case OPC_VS2R_V:
+ case OPC_VS4R_V:
+ case OPC_VS8R_V:
+ tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, addr, imm12);
+ tcg_out_opc_ldst_vec(s, opc, data, TCG_REG_TMP0, true);
+ break;
case OPC_LB:
case OPC_LBU:
case OPC_LH:
@@ -808,6 +854,17 @@ static void tcg_out_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
case OPC_LD:
tcg_out_opc_imm(s, opc, data, addr, imm12);
break;
+ case OPC_VLE8_V:
+ case OPC_VLE16_V:
+ case OPC_VLE32_V:
+ case OPC_VLE64_V:
+ case OPC_VL1RE64_V:
+ case OPC_VL2RE64_V:
+ case OPC_VL4RE64_V:
+ case OPC_VL8RE64_V:
+ tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, addr, imm12);
+ tcg_out_opc_ldst_vec(s, opc, data, TCG_REG_TMP0, true);
+ break;
default:
g_assert_not_reached();
}
@@ -816,14 +873,101 @@ static void tcg_out_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
TCGReg arg1, intptr_t arg2)
{
- RISCVInsn insn = type == TCG_TYPE_I32 ? OPC_LW : OPC_LD;
+ RISCVInsn insn;
+
+ if (type < TCG_TYPE_V64) {
+ insn = (type == TCG_TYPE_I32) ? OPC_LW : OPC_LD;
+ } else {
+ int nf = get_vec_type_bytes(type) / riscv_vlenb;
+
+ switch (nf) {
+ case 1:
+ insn = OPC_VL1RE64_V;
+ break;
+ case 2:
+ insn = OPC_VL2RE64_V;
+ break;
+ case 4:
+ insn = OPC_VL4RE64_V;
+ break;
+ case 8:
+ insn = OPC_VL8RE64_V;
+ break;
+ default:
+ {
+ int prev_vsew = riscv_set_vec_config_vl(s, type);
+
+ switch (prev_vsew) {
+ case MO_8:
+ insn = OPC_VLE8_V;
+ break;
+ case MO_16:
+ insn = OPC_VLE16_V;
+ break;
+ case MO_32:
+ insn = OPC_VLE32_V;
+ break;
+ case MO_64:
+ insn = OPC_VLE64_V;
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ }
+ break;
+ }
+ }
tcg_out_ldst(s, insn, arg, arg1, arg2);
}
static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
TCGReg arg1, intptr_t arg2)
{
- RISCVInsn insn = type == TCG_TYPE_I32 ? OPC_SW : OPC_SD;
+ RISCVInsn insn;
+
+ if (type < TCG_TYPE_V64) {
+ insn = (type == TCG_TYPE_I32) ? OPC_SW : OPC_SD;
+ tcg_out_ldst(s, insn, arg, arg1, arg2);
+ } else {
+ int nf = get_vec_type_bytes(type) / riscv_vlenb;
+
+ switch (nf) {
+ case 1:
+ insn = OPC_VS1R_V;
+ break;
+ case 2:
+ insn = OPC_VS2R_V;
+ break;
+ case 4:
+ insn = OPC_VS4R_V;
+ break;
+ case 8:
+ insn = OPC_VS8R_V;
+ break;
+ default:
+ {
+ int prev_vsew = riscv_set_vec_config_vl(s, type);
+
+ switch (prev_vsew) {
+ case MO_8:
+ insn = OPC_VSE8_V;
+ break;
+ case MO_16:
+ insn = OPC_VSE16_V;
+ break;
+ case MO_32:
+ insn = OPC_VSE32_V;
+ break;
+ case MO_64:
+ insn = OPC_VSE64_V;
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ }
+ break;
+ }
+ }
tcg_out_ldst(s, insn, arg, arg1, arg2);
}
@@ -2018,7 +2162,20 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
const TCGArg args[TCG_MAX_OP_ARGS],
const int const_args[TCG_MAX_OP_ARGS])
{
+ TCGType type = vecl + TCG_TYPE_V64;
+ TCGArg a0, a1, a2;
+
+ a0 = args[0];
+ a1 = args[1];
+ a2 = args[2];
+
switch (opc) {
+ case INDEX_op_ld_vec:
+ tcg_out_ld(s, type, a0, a1, a2);
+ break;
+ case INDEX_op_st_vec:
+ tcg_out_st(s, type, a0, a1, a2);
+ break;
case INDEX_op_mov_vec: /* Always emitted via tcg_out_mov. */
case INDEX_op_dup_vec: /* Always emitted via tcg_out_dup_vec. */
default:
@@ -2182,6 +2339,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_qemu_st_a64_i64:
return C_O0_I2(rZ, r);
+ case INDEX_op_st_vec:
+ return C_O0_I2(v, r);
+ case INDEX_op_ld_vec:
+ return C_O1_I1(v, r);
default:
g_assert_not_reached();
}
--
2.43.0
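The whole-register vs. unit-stride choice in the patch above comes down to one division. A minimal model of that decision, with `VLENB` an assumed host value (VLEN = 128): nf = type bytes / vlenb selects vl<nf>re64.v / vs<nf>r.v when nf is 1, 2, 4 or 8, and falls back to vle/vse plus a vsetvl otherwise.

```c
#include <assert.h>

enum { VLENB = 16 };   /* assume VLEN = 128 bits */

/* Bytes covered by a TCG vector type; 0 = V64, 1 = V128, 2 = V256. */
static unsigned vec_type_bytes(unsigned type_log2)
{
    return 8u << type_log2;
}

/* Returns nf for a whole-register access (NF instruction field is
 * nf - 1), or 0 when the size is fractional or not a supported group,
 * meaning a unit-stride vle/vse with explicit vsetvl must be used. */
static unsigned whole_reg_nf(unsigned oprsz)
{
    unsigned nf = oprsz / VLENB;
    return (nf == 1 || nf == 2 || nf == 4 || nf == 8) ? nf : 0;
}
```

E.g. on a VLEN=128 host, V64 (8 bytes) is fractional and takes the vle/vse path, while V128 and V256 map to vl1re64.v and vl2re64.v.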
* Re: [PATCH v2 05/14] tcg/riscv: Implement vector load/store
2024-08-30 6:15 ` [PATCH v2 05/14] tcg/riscv: Implement vector load/store LIU Zhiwei
@ 2024-09-02 1:31 ` Richard Henderson
0 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2024-09-02 1:31 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 8/30/24 16:15, LIU Zhiwei wrote:
> @@ -799,6 +834,17 @@ static void tcg_out_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
> case OPC_SD:
> tcg_out_opc_store(s, opc, addr, data, imm12);
> break;
> + case OPC_VSE8_V:
> + case OPC_VSE16_V:
> + case OPC_VSE32_V:
> + case OPC_VSE64_V:
> + case OPC_VS1R_V:
> + case OPC_VS2R_V:
> + case OPC_VS4R_V:
> + case OPC_VS8R_V:
> + tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, addr, imm12);
> + tcg_out_opc_ldst_vec(s, opc, data, TCG_REG_TMP0, true);
> + break;
> case OPC_LB:
> case OPC_LBU:
> case OPC_LH:
I think you shouldn't try to handle vector load/store in this same function.
You'll want something like
if (offset != 0) {
if (offset == sextreg(offset, 12)) {
tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, addr, offset);
} else {
tcg_out_movi(s, TCG_REG_TMP0, offset);
tcg_out_opc_reg(s, OPC_ADD, TCG_REG_TMP0, TCG_REG_TMP0, addr);
}
addr = TCG_REG_TMP0;
}
at the top, instead of the imm12 split currently at the top of tcg_out_ldst.
r~
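The split the review proposes hinges on whether the offset survives sign-extraction from 12 bits. A standalone sketch of just that predicate (sextreg mirrors the backend's sextract64-based macro; the emit calls themselves are left out):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extract the low 'len' bits of val, as the backend's
 * sextreg/sextract64 does. */
static int64_t sextreg(int64_t val, int len)
{
    return (int64_t)((uint64_t)val << (64 - len)) >> (64 - len);
}

/* True when 'offset' fits an ADDI simm12, so a single
 * addi tmp, addr, offset materializes the address; otherwise the
 * offset must go through movi + add as in the suggested snippet. */
static bool offset_fits_imm12(int64_t offset)
{
    return offset == sextreg(offset, 12);
}
```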
* [PATCH v2 06/14] tcg/riscv: Implement vector mov/dup{m/i}
2024-08-30 6:15 [PATCH v2 00/14] tcg/riscv: Add support for vector LIU Zhiwei
` (4 preceding siblings ...)
2024-08-30 6:15 ` [PATCH v2 05/14] tcg/riscv: Implement vector load/store LIU Zhiwei
@ 2024-08-30 6:15 ` LIU Zhiwei
2024-09-02 1:36 ` Richard Henderson
2024-08-30 6:16 ` [PATCH v2 07/14] tcg/riscv: Add support for basic vector opcodes LIU Zhiwei
` (7 subsequent siblings)
13 siblings, 1 reply; 33+ messages in thread
From: LIU Zhiwei @ 2024-08-30 6:15 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
tcg/riscv/tcg-target.c.inc | 54 ++++++++++++++++++++++++++++++++++++++
1 file changed, 54 insertions(+)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 6f8814564a..b6b4bdc269 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -303,6 +303,12 @@ typedef enum {
OPC_VS2R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(1),
OPC_VS4R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(3),
OPC_VS8R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(7),
+
+ OPC_VMV_V_V = 0x5e000057 | V_OPIVV,
+ OPC_VMV_V_I = 0x5e000057 | V_OPIVI,
+ OPC_VMV_V_X = 0x5e000057 | V_OPIVX,
+
+ OPC_VMVNR_V = 0x9e000057 | V_OPIVI,
} RISCVInsn;
/*
@@ -544,6 +550,7 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
static void tcg_out_opc_reg_vec(TCGContext *s, RISCVInsn opc,
TCGReg d, TCGReg s1, TCGReg s2, bool vm)
{
+ tcg_debug_assert(d >= TCG_REG_V0);
tcg_out32(s, encode_r(opc, d, s1, s2) | (vm << 25));
}
@@ -656,6 +663,21 @@ static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
case TCG_TYPE_I64:
tcg_out_opc_imm(s, OPC_ADDI, ret, arg, 0);
break;
+ case TCG_TYPE_V64:
+ case TCG_TYPE_V128:
+ case TCG_TYPE_V256:
+ {
+ int nf = get_vec_type_bytes(type) / riscv_vlenb;
+
+ if (nf != 0) {
+ tcg_debug_assert(is_power_of_2(nf) && nf <= 8);
+ tcg_out_opc_vi(s, OPC_VMVNR_V, ret, arg, nf - 1, true);
+ } else {
+ riscv_set_vec_config_vl(s, type);
+ tcg_out_opc_vv(s, OPC_VMV_V_V, ret, TCG_REG_V0, arg, true);
+ }
+ }
+ break;
default:
g_assert_not_reached();
}
@@ -1042,6 +1064,33 @@ static void tcg_out_addsub2(TCGContext *s,
}
}
+static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
+ TCGReg dst, TCGReg src)
+{
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vx(s, OPC_VMV_V_X, dst, TCG_REG_V0, src, true);
+ return true;
+}
+
+static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
+ TCGReg dst, TCGReg base, intptr_t offset)
+{
+ tcg_out_ld(s, TCG_TYPE_REG, TCG_REG_TMP0, base, offset);
+ return tcg_out_dup_vec(s, type, vece, dst, TCG_REG_TMP0);
+}
+
+static void tcg_out_dupi_vec(TCGContext *s, TCGType type, unsigned vece,
+ TCGReg dst, int64_t arg)
+{
+ if (arg < 16 && arg >= -16) {
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vi(s, OPC_VMV_V_I, dst, TCG_REG_V0, arg, true);
+ return;
+ }
+ tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP0, arg);
+ tcg_out_dup_vec(s, type, vece, dst, TCG_REG_TMP0);
+}
+
static const struct {
RISCVInsn op;
bool swap;
@@ -2170,6 +2219,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
a2 = args[2];
switch (opc) {
+ case INDEX_op_dupm_vec:
+ tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
+ break;
case INDEX_op_ld_vec:
tcg_out_ld(s, type, a0, a1, a2);
break;
@@ -2341,6 +2393,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_st_vec:
return C_O0_I2(v, r);
+ case INDEX_op_dup_vec:
+ case INDEX_op_dupm_vec:
case INDEX_op_ld_vec:
return C_O1_I1(v, r);
default:
--
2.43.0
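The dupi path above splits on whether the constant fits vmv.v.i's 5-bit signed immediate; otherwise it goes through movi + vmv.v.x. The range test in isolation (the patch writes it as `arg < 16 && arg >= -16`, which is the same interval):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* vmv.v.i carries a simm5: [-16, 15]. */
static bool is_simm5(int64_t arg)
{
    return arg >= -16 && arg <= 15;
}
```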
* Re: [PATCH v2 06/14] tcg/riscv: Implement vector mov/dup{m/i}
2024-08-30 6:15 ` [PATCH v2 06/14] tcg/riscv: Implement vector mov/dup{m/i} LIU Zhiwei
@ 2024-09-02 1:36 ` Richard Henderson
0 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2024-09-02 1:36 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 8/30/24 16:15, LIU Zhiwei wrote:
> From: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>
> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
> Reviewed-by: Liu Zhiwei<zhiwei_liu@linux.alibaba.com>
> ---
> tcg/riscv/tcg-target.c.inc | 54 ++++++++++++++++++++++++++++++++++++++
> 1 file changed, 54 insertions(+)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* [PATCH v2 07/14] tcg/riscv: Add support for basic vector opcodes
2024-08-30 6:15 [PATCH v2 00/14] tcg/riscv: Add support for vector LIU Zhiwei
` (5 preceding siblings ...)
2024-08-30 6:15 ` [PATCH v2 06/14] tcg/riscv: Implement vector mov/dup{m/i} LIU Zhiwei
@ 2024-08-30 6:16 ` LIU Zhiwei
2024-09-02 1:39 ` Richard Henderson
2024-08-30 6:16 ` [PATCH v2 08/14] tcg/riscv: Implement vector cmp ops LIU Zhiwei
` (6 subsequent siblings)
13 siblings, 1 reply; 33+ messages in thread
From: LIU Zhiwei @ 2024-08-30 6:16 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
tcg/riscv/tcg-target-con-set.h | 2 ++
tcg/riscv/tcg-target-con-str.h | 1 +
tcg/riscv/tcg-target.c.inc | 54 ++++++++++++++++++++++++++++++++++
tcg/riscv/tcg-target.h | 2 +-
4 files changed, 58 insertions(+), 1 deletion(-)
diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
index d73a62b0f2..7277cb9af8 100644
--- a/tcg/riscv/tcg-target-con-set.h
+++ b/tcg/riscv/tcg-target-con-set.h
@@ -22,4 +22,6 @@ C_N1_I2(r, r, rM)
C_O1_I4(r, r, rI, rM, rM)
C_O2_I4(r, r, rZ, rZ, rM, rM)
C_O0_I2(v, r)
+C_O0_I2(v, vK)
C_O1_I1(v, r)
+C_O1_I1(v, v)
diff --git a/tcg/riscv/tcg-target-con-str.h b/tcg/riscv/tcg-target-con-str.h
index 21c4a0a0e0..a4ae7b49c8 100644
--- a/tcg/riscv/tcg-target-con-str.h
+++ b/tcg/riscv/tcg-target-con-str.h
@@ -17,6 +17,7 @@ REGS('v', GET_VREG_SET(riscv_vlen))
*/
CONST('I', TCG_CT_CONST_S12)
CONST('J', TCG_CT_CONST_J12)
+CONST('K', TCG_CT_CONST_S5)
CONST('N', TCG_CT_CONST_N12)
CONST('M', TCG_CT_CONST_M12)
CONST('Z', TCG_CT_CONST_ZERO)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index b6b4bdc269..fde4e71260 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -111,6 +111,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
#define TCG_CT_CONST_N12 0x400
#define TCG_CT_CONST_M12 0x800
#define TCG_CT_CONST_J12 0x1000
+#define TCG_CT_CONST_S5 0x2000
#define ALL_GENERAL_REGS MAKE_64BIT_MASK(0, 32)
#define ALL_VECTOR_REGS MAKE_64BIT_MASK(33, 31)
@@ -162,6 +163,13 @@ static bool tcg_target_const_match(int64_t val, int ct,
if ((ct & TCG_CT_CONST_J12) && ~val >= -0x800 && ~val <= 0x7ff) {
return 1;
}
+ /*
+ * Sign extended from 5 bits: [-0x10, 0x0f].
+ * Used for vector-immediate.
+ */
+ if ((ct & TCG_CT_CONST_S5) && val >= -0x10 && val <= 0x0f) {
+ return 1;
+ }
return 0;
}
@@ -304,6 +312,13 @@ typedef enum {
OPC_VS4R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(3),
OPC_VS8R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(7),
+ OPC_VADD_VV = 0x57 | V_OPIVV,
+ OPC_VSUB_VV = 0x8000057 | V_OPIVV,
+ OPC_VAND_VV = 0x24000057 | V_OPIVV,
+ OPC_VOR_VV = 0x28000057 | V_OPIVV,
+ OPC_VXOR_VV = 0x2c000057 | V_OPIVV,
+ OPC_VXOR_VI = 0x2c000057 | V_OPIVI,
+
OPC_VMV_V_V = 0x5e000057 | V_OPIVV,
OPC_VMV_V_I = 0x5e000057 | V_OPIVI,
OPC_VMV_V_X = 0x5e000057 | V_OPIVX,
@@ -2228,6 +2243,30 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
case INDEX_op_st_vec:
tcg_out_st(s, type, a0, a1, a2);
break;
+ case INDEX_op_add_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VADD_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_sub_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VSUB_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_and_vec:
+ riscv_set_vec_config_vl(s, type);
+ tcg_out_opc_vv(s, OPC_VAND_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_or_vec:
+ riscv_set_vec_config_vl(s, type);
+ tcg_out_opc_vv(s, OPC_VOR_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_xor_vec:
+ riscv_set_vec_config_vl(s, type);
+ tcg_out_opc_vv(s, OPC_VXOR_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_not_vec:
+ riscv_set_vec_config_vl(s, type);
+ tcg_out_opc_vi(s, OPC_VXOR_VI, a0, a1, -1, true);
+ break;
case INDEX_op_mov_vec: /* Always emitted via tcg_out_mov. */
case INDEX_op_dup_vec: /* Always emitted via tcg_out_dup_vec. */
default:
@@ -2247,6 +2286,13 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
{
switch (opc) {
+ case INDEX_op_add_vec:
+ case INDEX_op_sub_vec:
+ case INDEX_op_and_vec:
+ case INDEX_op_or_vec:
+ case INDEX_op_xor_vec:
+ case INDEX_op_not_vec:
+ return 1;
default:
return 0;
}
@@ -2397,6 +2443,14 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_dupm_vec:
case INDEX_op_ld_vec:
return C_O1_I1(v, r);
+ case INDEX_op_not_vec:
+ return C_O1_I1(v, v);
+ case INDEX_op_add_vec:
+ case INDEX_op_sub_vec:
+ case INDEX_op_and_vec:
+ case INDEX_op_or_vec:
+ case INDEX_op_xor_vec:
+ return C_O1_I2(v, v, v);
default:
g_assert_not_reached();
}
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 12a7a37aaa..acb8dfdf16 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -151,7 +151,7 @@ typedef enum {
#define TCG_TARGET_HAS_nand_vec 0
#define TCG_TARGET_HAS_nor_vec 0
#define TCG_TARGET_HAS_eqv_vec 0
-#define TCG_TARGET_HAS_not_vec 0
+#define TCG_TARGET_HAS_not_vec 1
#define TCG_TARGET_HAS_neg_vec 0
#define TCG_TARGET_HAS_abs_vec 0
#define TCG_TARGET_HAS_roti_vec 0
--
2.43.0
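The not_vec case above emits vxor.vi with immediate -1 rather than a dedicated instruction. That works because x ^ -1 == ~x per element, and the simm5 -1 is sign-extended to all ones at any SEW. A scalar model of the identity (uint32_t stands in for one SEW=32 element):

```c
#include <assert.h>
#include <stdint.h>

/* Per-element effect of vxor.vi vd, vs2, -1: bitwise NOT. */
static uint32_t not_via_xor(uint32_t x)
{
    return x ^ (uint32_t)-1;
}
```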
* Re: [PATCH v2 07/14] tcg/riscv: Add support for basic vector opcodes
2024-08-30 6:16 ` [PATCH v2 07/14] tcg/riscv: Add support for basic vector opcodes LIU Zhiwei
@ 2024-09-02 1:39 ` Richard Henderson
0 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2024-09-02 1:39 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 8/30/24 16:16, LIU Zhiwei wrote:
> From: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>
> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
> Reviewed-by: Liu Zhiwei<zhiwei_liu@linux.alibaba.com>
> ---
> tcg/riscv/tcg-target-con-set.h | 2 ++
> tcg/riscv/tcg-target-con-str.h | 1 +
> tcg/riscv/tcg-target.c.inc | 54 ++++++++++++++++++++++++++++++++++
> tcg/riscv/tcg-target.h | 2 +-
> 4 files changed, 58 insertions(+), 1 deletion(-)
>
> diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
> index d73a62b0f2..7277cb9af8 100644
> --- a/tcg/riscv/tcg-target-con-set.h
> +++ b/tcg/riscv/tcg-target-con-set.h
> @@ -22,4 +22,6 @@ C_N1_I2(r, r, rM)
> C_O1_I4(r, r, rI, rM, rM)
> C_O2_I4(r, r, rZ, rZ, rM, rM)
> C_O0_I2(v, r)
> +C_O0_I2(v, vK)
> C_O1_I1(v, r)
> +C_O1_I1(v, v)
> diff --git a/tcg/riscv/tcg-target-con-str.h b/tcg/riscv/tcg-target-con-str.h
> index 21c4a0a0e0..a4ae7b49c8 100644
> --- a/tcg/riscv/tcg-target-con-str.h
> +++ b/tcg/riscv/tcg-target-con-str.h
> @@ -17,6 +17,7 @@ REGS('v', GET_VREG_SET(riscv_vlen))
> */
> CONST('I', TCG_CT_CONST_S12)
> CONST('J', TCG_CT_CONST_J12)
> +CONST('K', TCG_CT_CONST_S5)
> CONST('N', TCG_CT_CONST_N12)
> CONST('M', TCG_CT_CONST_M12)
> CONST('Z', TCG_CT_CONST_ZERO)
> diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
> index b6b4bdc269..fde4e71260 100644
> --- a/tcg/riscv/tcg-target.c.inc
> +++ b/tcg/riscv/tcg-target.c.inc
> @@ -111,6 +111,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
> #define TCG_CT_CONST_N12 0x400
> #define TCG_CT_CONST_M12 0x800
> #define TCG_CT_CONST_J12 0x1000
> +#define TCG_CT_CONST_S5 0x2000
>
Added, but not used in this patch?
r~
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v2 08/14] tcg/riscv: Implement vector cmp ops
2024-08-30 6:15 [PATCH v2 00/14] tcg/riscv: Add support for vector LIU Zhiwei
` (6 preceding siblings ...)
2024-08-30 6:16 ` [PATCH v2 07/14] tcg/riscv: Add support for basic vector opcodes LIU Zhiwei
@ 2024-08-30 6:16 ` LIU Zhiwei
2024-09-03 6:45 ` Richard Henderson
2024-08-30 6:16 ` [PATCH v2 09/14] tcg/riscv: Implement vector neg ops LIU Zhiwei
` (5 subsequent siblings)
13 siblings, 1 reply; 33+ messages in thread
From: LIU Zhiwei @ 2024-08-30 6:16 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
1. Address immediate value constraints in RISC-V Vector Extension 1.0 for
comparison instructions.
2. Extend comparison results from mask registers to SEW-width elements,
following recommendations in The RISC-V SPEC Volume I (Version 20240411).
This aligns with TCG's cmp_vec behavior by expanding compare results to
full element width: all 1s for true, all 0s for false.
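The immediate constraints mentioned in point 1 are handled by rewriting conditions: vmsle/vmsgt have .vi forms but vmslt/vmsge do not, and x < a is x <= a - 1 (likewise x >= a is x > a - 1), so LT/GE with immediate a accept a in [-15, 16], since a - 1 must fit simm5. A scalar sketch of the rewrite (illustrative only, not the emitted code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* LT with immediate a, emitted as vmsle.vi with a - 1. */
static bool lt_via_le(int32_t x, int32_t a) { return x <= a - 1; }

/* GE with immediate a, emitted as vmsgt.vi with a - 1. */
static bool ge_via_gt(int32_t x, int32_t a) { return x >  a - 1; }
```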
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
tcg/riscv/tcg-target-con-set.h | 6 +-
tcg/riscv/tcg-target.c.inc | 240 +++++++++++++++++++++++++++++++++
tcg/riscv/tcg-target.opc.h | 5 +
3 files changed, 250 insertions(+), 1 deletion(-)
diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
index 7277cb9af8..6c9ad5188b 100644
--- a/tcg/riscv/tcg-target-con-set.h
+++ b/tcg/riscv/tcg-target-con-set.h
@@ -21,7 +21,11 @@ C_O1_I2(r, rZ, rZ)
C_N1_I2(r, r, rM)
C_O1_I4(r, r, rI, rM, rM)
C_O2_I4(r, r, rZ, rZ, rM, rM)
+C_O0_I1(v)
+C_O0_I2(v, v)
C_O0_I2(v, r)
-C_O0_I2(v, vK)
C_O1_I1(v, r)
C_O1_I1(v, v)
+C_O1_I2(v, v, v)
+C_O1_I2(v, v, vi)
+C_O1_I2(v, v, vK)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index fde4e71260..1e8c0fb031 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -312,6 +312,9 @@ typedef enum {
OPC_VS4R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(3),
OPC_VS8R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(7),
+ OPC_VMERGE_VIM = 0x5c000057 | V_OPIVI,
+ OPC_VMERGE_VVM = 0x5c000057 | V_OPIVV,
+
OPC_VADD_VV = 0x57 | V_OPIVV,
OPC_VSUB_VV = 0x8000057 | V_OPIVV,
OPC_VAND_VV = 0x24000057 | V_OPIVV,
@@ -319,6 +322,29 @@ typedef enum {
OPC_VXOR_VV = 0x2c000057 | V_OPIVV,
OPC_VXOR_VI = 0x2c000057 | V_OPIVI,
+ OPC_VMSEQ_VV = 0x60000057 | V_OPIVV,
+ OPC_VMSEQ_VI = 0x60000057 | V_OPIVI,
+ OPC_VMSEQ_VX = 0x60000057 | V_OPIVX,
+ OPC_VMSNE_VV = 0x64000057 | V_OPIVV,
+ OPC_VMSNE_VI = 0x64000057 | V_OPIVI,
+ OPC_VMSNE_VX = 0x64000057 | V_OPIVX,
+
+ OPC_VMSLTU_VV = 0x68000057 | V_OPIVV,
+ OPC_VMSLTU_VX = 0x68000057 | V_OPIVX,
+ OPC_VMSLT_VV = 0x6c000057 | V_OPIVV,
+ OPC_VMSLT_VX = 0x6c000057 | V_OPIVX,
+ OPC_VMSLEU_VV = 0x70000057 | V_OPIVV,
+ OPC_VMSLEU_VX = 0x70000057 | V_OPIVX,
+ OPC_VMSLE_VV = 0x74000057 | V_OPIVV,
+ OPC_VMSLE_VX = 0x74000057 | V_OPIVX,
+
+ OPC_VMSLEU_VI = 0x70000057 | V_OPIVI,
+ OPC_VMSLE_VI = 0x74000057 | V_OPIVI,
+ OPC_VMSGTU_VI = 0x78000057 | V_OPIVI,
+ OPC_VMSGTU_VX = 0x78000057 | V_OPIVX,
+ OPC_VMSGT_VI = 0x7c000057 | V_OPIVI,
+ OPC_VMSGT_VX = 0x7c000057 | V_OPIVX,
+
OPC_VMV_V_V = 0x5e000057 | V_OPIVV,
OPC_VMV_V_I = 0x5e000057 | V_OPIVI,
OPC_VMV_V_X = 0x5e000057 | V_OPIVX,
@@ -595,6 +621,12 @@ static void tcg_out_opc_vec_config(TCGContext *s, RISCVInsn opc,
#define tcg_out_opc_vi(s, opc, vd, vs2, imm, vm) \
tcg_out_opc_reg_vec_i(s, opc, vd, imm, vs2, vm);
+#define tcg_out_opc_vim_mask(s, opc, vd, vs2, imm) \
+ tcg_out_opc_reg_vec_i(s, opc, vd, imm, vs2, false);
+
+#define tcg_out_opc_vvm_mask(s, opc, vd, vs2, vs1) \
+ tcg_out_opc_reg_vec(s, opc, vd, vs1, vs2, false);
+
#define tcg_out_opc_vconfig(s, opc, rd, avl, vtypei) \
tcg_out_opc_vec_config(s, opc, rd, avl, vtypei);
@@ -1139,6 +1171,101 @@ static void tcg_out_brcond(TCGContext *s, TCGCond cond, TCGReg arg1,
tcg_out_opc_branch(s, op, arg1, arg2, 0);
}
+static const struct {
+ RISCVInsn op;
+ bool swap;
+} tcg_cmpcond_to_rvv_vv[] = {
+ [TCG_COND_EQ] = { OPC_VMSEQ_VV, false },
+ [TCG_COND_NE] = { OPC_VMSNE_VV, false },
+ [TCG_COND_LT] = { OPC_VMSLT_VV, false },
+ [TCG_COND_GE] = { OPC_VMSLE_VV, true },
+ [TCG_COND_GT] = { OPC_VMSLT_VV, true },
+ [TCG_COND_LE] = { OPC_VMSLE_VV, false },
+ [TCG_COND_LTU] = { OPC_VMSLTU_VV, false },
+ [TCG_COND_GEU] = { OPC_VMSLEU_VV, true },
+ [TCG_COND_GTU] = { OPC_VMSLTU_VV, true },
+ [TCG_COND_LEU] = { OPC_VMSLEU_VV, false }
+};
+
+static const struct {
+ RISCVInsn op;
+ bool invert;
+} tcg_cmpcond_to_rvv_vx[] = {
+ [TCG_COND_EQ] = { OPC_VMSEQ_VX, false },
+ [TCG_COND_NE] = { OPC_VMSNE_VX, false },
+ [TCG_COND_GT] = { OPC_VMSGT_VX, false },
+ [TCG_COND_LE] = { OPC_VMSLE_VX, false },
+ [TCG_COND_LT] = { OPC_VMSLT_VX, false },
+ [TCG_COND_LTU] = { OPC_VMSLTU_VX, false },
+ [TCG_COND_GTU] = { OPC_VMSGTU_VX, false },
+ [TCG_COND_LEU] = { OPC_VMSLEU_VX, false },
+ [TCG_COND_GE] = { OPC_VMSLT_VX, true },
+ [TCG_COND_GEU] = { OPC_VMSLTU_VX, true },
+};
+
+static const struct {
+ RISCVInsn op;
+ int min;
+ int max;
+ bool adjust;
+} tcg_cmpcond_to_rvv_vi[] = {
+ [TCG_COND_EQ] = { OPC_VMSEQ_VI, -16, 15, false },
+ [TCG_COND_NE] = { OPC_VMSNE_VI, -16, 15, false },
+ [TCG_COND_GT] = { OPC_VMSGT_VI, -16, 15, false },
+ [TCG_COND_LE] = { OPC_VMSLE_VI, -16, 15, false },
+ [TCG_COND_LT] = { OPC_VMSLE_VI, -15, 16, true },
+ [TCG_COND_GE] = { OPC_VMSGT_VI, -15, 16, true },
+ [TCG_COND_LEU] = { OPC_VMSLEU_VI, 0, 15, false },
+ [TCG_COND_GTU] = { OPC_VMSGTU_VI, 0, 15, false },
+ [TCG_COND_LTU] = { OPC_VMSLEU_VI, 1, 16, true },
+ [TCG_COND_GEU] = { OPC_VMSGTU_VI, 1, 16, true },
+};
+
+static void tcg_out_cmp_vec_vv(TCGContext *s, TCGCond cond,
+ TCGReg arg1, TCGReg arg2)
+{
+ RISCVInsn op;
+
+ tcg_debug_assert((unsigned)cond < ARRAY_SIZE(tcg_cmpcond_to_rvv_vv));
+ op = tcg_cmpcond_to_rvv_vv[cond].op;
+ tcg_debug_assert(op != 0);
+
+ tcg_out_opc_vv(s, op, TCG_REG_V0, arg1, arg2, true);
+}
+
+static void tcg_out_cmp_vec_vx(TCGContext *s, TCGCond cond,
+ TCGReg arg1, TCGReg arg2)
+{
+ RISCVInsn op;
+
+ tcg_debug_assert((unsigned)cond < ARRAY_SIZE(tcg_cmpcond_to_rvv_vx));
+ op = tcg_cmpcond_to_rvv_vx[cond].op;
+ tcg_debug_assert(op != 0);
+
+ tcg_out_opc_vx(s, op, TCG_REG_V0, arg1, arg2, true);
+}
+
+static bool tcg_vec_cmp_can_do_vi(TCGCond cond, int64_t arg)
+{
+ signed imm_min, imm_max;
+
+ imm_min = tcg_cmpcond_to_rvv_vi[cond].min;
+ imm_max = tcg_cmpcond_to_rvv_vi[cond].max;
+ return (arg >= imm_min && arg <= imm_max);
+}
+
+static void tcg_out_cmp_vec_vi(TCGContext *s, TCGCond cond, TCGReg arg1,
+ tcg_target_long arg2)
+{
+ RISCVInsn op;
+
+ tcg_debug_assert((unsigned)cond < ARRAY_SIZE(tcg_cmpcond_to_rvv_vi));
+ op = tcg_cmpcond_to_rvv_vi[cond].op;
+ tcg_debug_assert(op != 0);
+
+ tcg_out_opc_vi(s, op, TCG_REG_V0, arg1, arg2, true);
+}
+
#define SETCOND_INV TCG_TARGET_NB_REGS
#define SETCOND_NEZ (SETCOND_INV << 1)
#define SETCOND_FLAGS (SETCOND_INV | SETCOND_NEZ)
@@ -2267,6 +2394,28 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
riscv_set_vec_config_vl(s, type);
tcg_out_opc_vi(s, OPC_VXOR_VI, a0, a1, -1, true);
break;
+ case INDEX_op_rvv_cmp_vx:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_cmp_vec_vx(s, a2, a0, a1);
+ break;
+ case INDEX_op_rvv_cmp_vi:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_debug_assert(tcg_vec_cmp_can_do_vi(a2, a1));
+ tcg_out_cmp_vec_vi(s, a2, a0, a1);
+ break;
+ case INDEX_op_rvv_cmp_vv:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_cmp_vec_vv(s, a2, a0, a1);
+ break;
+ case INDEX_op_rvv_merge_vec:
+ if (const_args[2]) {
+ /* vd[i] == v0.mask[i] ? imm : vs2[i] */
+ tcg_out_opc_vim_mask(s, OPC_VMERGE_VIM, a0, a1, a2);
+ } else {
+ /* vd[i] == v0.mask[i] ? vs1[i] : vs2[i] */
+ tcg_out_opc_vvm_mask(s, OPC_VMERGE_VVM, a0, a1, a2);
+ }
+ break;
case INDEX_op_mov_vec: /* Always emitted via tcg_out_mov. */
case INDEX_op_dup_vec: /* Always emitted via tcg_out_dup_vec. */
default:
@@ -2274,13 +2423,92 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
}
}
+static void expand_vec_cmp_vv(TCGType type, unsigned vece,
+ TCGv_vec v1, TCGv_vec v2, TCGCond cond)
+{
+ if (tcg_cmpcond_to_rvv_vv[cond].swap) {
+ vec_gen_3(INDEX_op_rvv_cmp_vv, type, vece,
+ tcgv_vec_arg(v2), tcgv_vec_arg(v1), cond);
+ } else {
+ vec_gen_3(INDEX_op_rvv_cmp_vv, type, vece,
+ tcgv_vec_arg(v1), tcgv_vec_arg(v2), cond);
+ }
+}
+
+static bool expand_vec_cmp_vi(TCGType type, unsigned vece,
+ TCGv_vec v1, TCGArg a2, TCGCond cond)
+{
+ int64_t arg2 = arg_temp(a2)->val;
+ bool invert = false;
+
+ if (!tcg_vec_cmp_can_do_vi(cond, arg2)) {
+ /* for cmp_vec_vx */
+ vec_gen_3(INDEX_op_rvv_cmp_vx, type, vece,
+ tcgv_vec_arg(v1), tcgv_i64_arg(tcg_constant_i64(arg2)),
+ cond);
+
+ tcg_debug_assert((unsigned)cond < ARRAY_SIZE(tcg_cmpcond_to_rvv_vx));
+ invert = tcg_cmpcond_to_rvv_vx[cond].invert;
+ } else {
+ if (tcg_cmpcond_to_rvv_vi[cond].adjust) {
+ arg2 -= 1;
+ }
+ vec_gen_3(INDEX_op_rvv_cmp_vi, type, vece,
+ tcgv_vec_arg(v1), arg2, cond);
+ }
+ return invert;
+}
+
+static bool expand_vec_cmp_noinv(TCGType type, unsigned vece, TCGv_vec v1,
+ TCGArg a2, TCGCond cond)
+{
+ bool invert = false;
+ TCGTemp *t1 = arg_temp(a2);
+
+ if (t1->kind == TEMP_CONST) {
+ invert = expand_vec_cmp_vi(type, vece, v1, a2, cond);
+ } else {
+ expand_vec_cmp_vv(type, vece, v1, temp_tcgv_vec(t1), cond);
+ }
+ return invert;
+}
+
void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
TCGArg a0, ...)
{
+ va_list va;
+ TCGv_vec v0, v1;
+ TCGArg a2, a3;
+
+ va_start(va, a0);
+ v0 = temp_tcgv_vec(arg_temp(a0));
+ v1 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg)));
+ a2 = va_arg(va, TCGArg);
+
switch (opc) {
+ case INDEX_op_cmp_vec:
+ {
+ a3 = va_arg(va, TCGArg);
+
+ /*
+ * Mask values should be widened into SEW-width elements.
+ * For e.g. when SEW = 8bit, do 0b1 -> 0xff, 0b0 -> 0x00.
+ */
+ if (expand_vec_cmp_noinv(type, vece, v1, a2, a3)) {
+ vec_gen_3(INDEX_op_rvv_merge_vec, type, vece, tcgv_vec_arg(v0),
+ tcgv_vec_arg(tcg_constant_vec(type, vece, -1)),
+ tcgv_i64_arg(tcg_constant_i64(0)));
+ } else {
+ vec_gen_3(INDEX_op_rvv_merge_vec, type, vece, tcgv_vec_arg(v0),
+ tcgv_vec_arg(tcg_constant_vec(type, vece, 0)),
+ tcgv_i64_arg(tcg_constant_i64(-1)));
+ }
+ }
+ break;
default:
g_assert_not_reached();
}
+ va_end(va);
}
int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
@@ -2293,6 +2521,8 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
case INDEX_op_xor_vec:
case INDEX_op_not_vec:
return 1;
+ case INDEX_op_cmp_vec:
+ return -1;
default:
return 0;
}
@@ -2451,6 +2681,16 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_or_vec:
case INDEX_op_xor_vec:
return C_O1_I2(v, v, v);
+ case INDEX_op_rvv_merge_vec:
+ return C_O1_I2(v, v, vK);
+ case INDEX_op_rvv_cmp_vi:
+ return C_O0_I1(v);
+ case INDEX_op_rvv_cmp_vx:
+ return C_O0_I2(v, r);
+ case INDEX_op_rvv_cmp_vv:
+ return C_O0_I2(v, v);
+ case INDEX_op_cmp_vec:
+ return C_O1_I2(v, v, vi);
default:
g_assert_not_reached();
}
diff --git a/tcg/riscv/tcg-target.opc.h b/tcg/riscv/tcg-target.opc.h
index b80b39e1e5..8eb0daf0a7 100644
--- a/tcg/riscv/tcg-target.opc.h
+++ b/tcg/riscv/tcg-target.opc.h
@@ -10,3 +10,8 @@
* emitted by tcg_expand_vec_op. For those familiar with GCC internals,
* consider these to be UNSPEC with names.
*/
+
+DEF(rvv_cmp_vi, 0, 1, 2, IMPLVEC)
+DEF(rvv_cmp_vx, 0, 2, 1, IMPLVEC)
+DEF(rvv_cmp_vv, 0, 2, 1, IMPLVEC)
+DEF(rvv_merge_vec, 1, 2, 0, IMPLVEC)
--
2.43.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v2 08/14] tcg/riscv: Implement vector cmp ops
2024-08-30 6:16 ` [PATCH v2 08/14] tcg/riscv: Implement vector cmp ops LIU Zhiwei
@ 2024-09-03 6:45 ` Richard Henderson
2024-09-03 14:51 ` Richard Henderson
0 siblings, 1 reply; 33+ messages in thread
From: Richard Henderson @ 2024-09-03 6:45 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 8/30/24 16:16, LIU Zhiwei wrote:
> From: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>
> 1. Address immediate value constraints in RISC-V Vector Extension 1.0 for
> comparison instructions.
>
> 2. Extend comparison results from mask registers to SEW-width elements,
> following recommendations in The RISC-V SPEC Volume I (Version 20240411).
>
> This aligns with TCG's cmp_vec behavior by expanding compare results to
> full element width: all 1s for true, all 0s for false.
>
> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
> Reviewed-by: Liu Zhiwei<zhiwei_liu@linux.alibaba.com>
> ---
> tcg/riscv/tcg-target-con-set.h | 6 +-
> tcg/riscv/tcg-target.c.inc | 240 +++++++++++++++++++++++++++++++++
> tcg/riscv/tcg-target.opc.h | 5 +
> 3 files changed, 250 insertions(+), 1 deletion(-)
>
> diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
> index 7277cb9af8..6c9ad5188b 100644
> --- a/tcg/riscv/tcg-target-con-set.h
> +++ b/tcg/riscv/tcg-target-con-set.h
> @@ -21,7 +21,11 @@ C_O1_I2(r, rZ, rZ)
> C_N1_I2(r, r, rM)
> C_O1_I4(r, r, rI, rM, rM)
> C_O2_I4(r, r, rZ, rZ, rM, rM)
> +C_O0_I1(v)
> +C_O0_I2(v, v)
> C_O0_I2(v, r)
> -C_O0_I2(v, vK)
Removing vK, just added in the previous patch.
> +static bool expand_vec_cmp_vi(TCGType type, unsigned vece,
> + TCGv_vec v1, TCGArg a2, TCGCond cond)
> +{
> + int64_t arg2 = arg_temp(a2)->val;
> + bool invert = false;
> +
> + if (!tcg_vec_cmp_can_do_vi(cond, arg2)) {
...
> +static bool expand_vec_cmp_noinv(TCGType type, unsigned vece, TCGv_vec v1,
> + TCGArg a2, TCGCond cond)
> +{
> + bool invert = false;
> + TCGTemp *t1 = arg_temp(a2);
> +
> + if (t1->kind == TEMP_CONST) {
> + invert = expand_vec_cmp_vi(type, vece, v1, a2, cond);
This will not work as you intend, primarily because vector constants are stored in
expanded form. E.g. MO_8 1 is stored as 0x0101010101010101.
This is handled transparently *if* you use tcg_target_const_match instead.
Otherwise one must (sign)extract the low vece bits, and then double-check that the
replication of the low bits matches the complete 'a2' value.
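The check Richard describes can be sketched as a small, self-contained helper. This is an editor's illustration, not QEMU code; the name `extract_replicated` and its signature are hypothetical. It sign-extracts the low `8 << vece` bits of the 64-bit stored constant and verifies that replicating them reproduces the whole value, which is exactly the condition under which a `.vi`/`.vx` form may be used:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not QEMU code): 'val' is a vector constant in
 * dup-expanded form (e.g. MO_8 1 stored as 0x0101010101010101).
 * Recover the signed element for element size log2 'vece' and confirm
 * the replication matches 'val'.  Returns 1 on success.
 */
static int extract_replicated(uint64_t val, unsigned vece, int64_t *elt)
{
    unsigned ebits = 8u << vece;            /* 8, 16, 32 or 64 bits */
    /* Sign-extract the low element. */
    int64_t e = (int64_t)(val << (64 - ebits)) >> (64 - ebits);
    /* Re-replicate the low bits across 64 bits and compare. */
    uint64_t rep = (uint64_t)e & (ebits == 64 ? ~0ull : (1ull << ebits) - 1);
    for (unsigned i = ebits; i < 64; i *= 2) {
        rep |= rep << i;
    }
    if (rep != val) {
        return 0;                           /* not a valid dup of 'vece' */
    }
    *elt = e;
    return 1;
}
```

With this in mind, testing `arg_temp(a2)->val` directly against the `.vi` range is wrong for any `vece < MO_64`, which is the bug being pointed out.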
I agree that we should be prepared for more vector x scalar operations, but that needs to
happen during generic expansion rather than very late in the backend.
I think the first implementation should be simpler:
CONST('C', TCG_CT_CONST_CMP_VI)
tcg_target_const_match()
{
...
if ((ct & TCG_CT_CONST_CMP_VI) &&
val >= tcg_cmpcond_to_rvv_vi[cond].min &&
val <= tcg_cmpcond_to_rvv_vi[cond].max) {
return true;
}
}
case INDEX_op_cmp_vec:
riscv_set_vec_config_vl_vece(s, type, vece);
cond = args[3];
if (c2) {
tcg_out_opc_vi(s, tcg_cmpcond_to_rvv_vi[cond].op, a0, a1,
a2 - tcg_cmpcond_to_rvv_vi[cond].adjust);
} else if (tcg_cmpcond_to_rvv_vv[cond].swap) {
tcg_out_opc_vv(s, tcg_cmpcond_to_rvv_vv[cond].op, a0, a2, a1);
} else {
tcg_out_opc_vv(s, tcg_cmpcond_to_rvv_vv[cond].op, a0, a1, a2);
}
break;
This appears to not require any expansion in tcg_expand_vec_op at all.
r~
* Re: [PATCH v2 08/14] tcg/riscv: Implement vector cmp ops
2024-09-03 6:45 ` Richard Henderson
@ 2024-09-03 14:51 ` Richard Henderson
0 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2024-09-03 14:51 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 9/2/24 23:45, Richard Henderson wrote:
> I think the first implementation should be simpler:
>
> CONST('C', TCG_CT_CONST_CMP_VI)
>
> tcg_target_const_match()
> {
> ...
> if ((ct & TCG_CT_CONST_CMP_VI) &&
> val >= tcg_cmpcond_to_rvv_vi[cond].min &&
> val <= tcg_cmpcond_to_rvv_vi[cond].max) {
> return true;
> }
> }
>
> case INDEX_op_cmp_vec:
> riscv_set_vec_config_vl_vece(s, type, vece);
> cond = args[3];
> if (c2) {
> tcg_out_opc_vi(s, tcg_cmpcond_to_rvv_vi[cond].op, a0, a1,
> a2 - tcg_cmpcond_to_rvv_vi[cond].adjust);
> } else if (tcg_cmpcond_to_rvv_vv[cond].swap) {
> tcg_out_opc_vv(s, tcg_cmpcond_to_rvv_vv[cond].op, a0, a2, a1);
> } else {
> tcg_out_opc_vv(s, tcg_cmpcond_to_rvv_vv[cond].op, a0, a1, a2);
> }
> break;
>
> This appears to not require any expansion in tcg_expand_vec_op at all.
I knew I should have slept on that answer.
Of course you need expansion, because riscv cmp_vv produces a mask.
However, I think we should simply model this as INDEX_op_cmpsel_vec:
case INDEX_op_cmpsel_vec:
riscv_set_vec_config_vl_vece(s, type, vece);
a3 = args[3];
a4 = args[4];
cond = args[5];
/* Use only vmerge_vim if possible, by inverting the test. */
if (const_args[4] && !const_args[3]) {
cond = tcg_cond_inv(cond);
a3 = a4;
a4 = args[3];
const_args[3] = true;
const_args[4] = false;
}
/* Perform the comparison into V0 mask. */
if (const_args[2]) {
tcg_out_opc_vi(s, tcg_cmpcond_to_rvv_vi[cond].op,
TCG_REG_V0, a1,
a2 - tcg_cmpcond_to_rvv_vi[cond].adjust);
} else if (tcg_cmpcond_to_rvv_vv[cond].swap) {
tcg_out_opc_vv(s, tcg_cmpcond_to_rvv_vv[cond].op,
TCG_REG_V0, a2, a1);
} else {
tcg_out_opc_vv(s, tcg_cmpcond_to_rvv_vv[cond].op,
TCG_REG_V0, a1, a2);
}
if (const_args[3]) {
if (const_args[4]) {
tcg_out_opc_vi(s, OPC_VMV_V_I, a0, TCG_REG_V0, a4, true);
a4 = a0;
}
tcg_out_opc_vim_mask(s, OPC_VMERGE_VIM, a0, a3, a4);
} else {
tcg_out_opc_vvm_mask(s, OPC_VMERGE_VVM, a0, a3, a4);
}
break;
Then INDEX_op_cmp_vec should be expanded to
INDEX_op_cmpsel_vec a0, a1, a2, -1, 0, a3
r~
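The equivalence behind the suggested expansion can be modeled per element. This is an editor's scalar sketch under stated assumptions, not QEMU or RVV code: `cmpsel_elt` stands in for vmerge's per-element select on the V0 mask bit, and `cmp_elt` shows that cmp_vec is exactly cmpsel_vec with constant true/false values -1 and 0:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scalar model (hypothetical names, not QEMU code).  RVV's vmsXX.vv
 * produces one mask bit per element; vmerge.v?m then selects a full
 * SEW-width value per element based on that bit.
 */
static int64_t cmpsel_elt(int mask_bit, int64_t tval, int64_t fval)
{
    return mask_bit ? tval : fval;      /* vmerge semantics, per element */
}

/* cmp_vec element == cmpsel_vec element with tval = -1, fval = 0. */
static int64_t cmp_lt_elt(int64_t a, int64_t b)
{
    int mask_bit = (a < b);             /* e.g. vmslt.vv mask bit */
    return cmpsel_elt(mask_bit, -1, 0); /* all-ones / all-zeros result */
}
```

This is why modeling the backend op as cmpsel_vec subsumes cmp_vec: the latter falls out by fixing the selected constants.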
* [PATCH v2 09/14] tcg/riscv: Implement vector neg ops
2024-08-30 6:15 [PATCH v2 00/14] tcg/riscv: Add support for vector LIU Zhiwei
` (7 preceding siblings ...)
2024-08-30 6:16 ` [PATCH v2 08/14] tcg/riscv: Implement vector cmp ops LIU Zhiwei
@ 2024-08-30 6:16 ` LIU Zhiwei
2024-09-03 14:52 ` Richard Henderson
2024-08-30 6:16 ` [PATCH v2 10/14] tcg/riscv: Implement vector sat/mul ops LIU Zhiwei
` (4 subsequent siblings)
13 siblings, 1 reply; 33+ messages in thread
From: LIU Zhiwei @ 2024-08-30 6:16 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
tcg/riscv/tcg-target.c.inc | 8 ++++++++
tcg/riscv/tcg-target.h | 2 +-
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 1e8c0fb031..4fc82481a7 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -322,6 +322,7 @@ typedef enum {
OPC_VXOR_VV = 0x2c000057 | V_OPIVV,
OPC_VXOR_VI = 0x2c000057 | V_OPIVI,
+ OPC_VRSUB_VI = 0xc000057 | V_OPIVI,
OPC_VMSEQ_VV = 0x60000057 | V_OPIVV,
OPC_VMSEQ_VI = 0x60000057 | V_OPIVI,
OPC_VMSEQ_VX = 0x60000057 | V_OPIVX,
@@ -2394,6 +2395,10 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
riscv_set_vec_config_vl(s, type);
tcg_out_opc_vi(s, OPC_VXOR_VI, a0, a1, -1, true);
break;
+ case INDEX_op_neg_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vi(s, OPC_VRSUB_VI, a0, a1, 0, true);
+ break;
case INDEX_op_rvv_cmp_vx:
riscv_set_vec_config_vl_vece(s, type, vece);
tcg_out_cmp_vec_vx(s, a2, a0, a1);
@@ -2408,6 +2413,7 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
tcg_out_cmp_vec_vv(s, a2, a0, a1);
break;
case INDEX_op_rvv_merge_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
if (const_args[2]) {
/* vd[i] == v0.mask[i] ? imm : vs2[i] */
tcg_out_opc_vim_mask(s, OPC_VMERGE_VIM, a0, a1, a2);
@@ -2520,6 +2526,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
case INDEX_op_or_vec:
case INDEX_op_xor_vec:
case INDEX_op_not_vec:
+ case INDEX_op_neg_vec:
return 1;
case INDEX_op_cmp_vec:
return -1;
@@ -2673,6 +2680,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_dupm_vec:
case INDEX_op_ld_vec:
return C_O1_I1(v, r);
+ case INDEX_op_neg_vec:
case INDEX_op_not_vec:
return C_O1_I1(v, v);
case INDEX_op_add_vec:
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index acb8dfdf16..401696d639 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -152,7 +152,7 @@ typedef enum {
#define TCG_TARGET_HAS_nor_vec 0
#define TCG_TARGET_HAS_eqv_vec 0
#define TCG_TARGET_HAS_not_vec 1
-#define TCG_TARGET_HAS_neg_vec 0
+#define TCG_TARGET_HAS_neg_vec 1
#define TCG_TARGET_HAS_abs_vec 0
#define TCG_TARGET_HAS_roti_vec 0
#define TCG_TARGET_HAS_rots_vec 0
--
2.43.0
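The neg mapping in the patch above rests on a one-line identity: vrsub.vi computes vd[i] = imm - vs2[i], so an immediate of 0 yields two's-complement negation. A minimal scalar model (editor's sketch, hypothetical function name, not QEMU code):

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of vrsub.vi: reverse subtract, immediate minus element.
 * With imm = 0 this is exactly negation, which is why the patch maps
 * INDEX_op_neg_vec to OPC_VRSUB_VI with immediate 0. */
static int32_t vrsub_vi_elt(int32_t vs2, int32_t imm)
{
    return imm - vs2;
}
```

RVV has no dedicated vector-negate instruction, so this is the idiomatic single-instruction encoding.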
* Re: [PATCH v2 09/14] tcg/riscv: Implement vector neg ops
2024-08-30 6:16 ` [PATCH v2 09/14] tcg/riscv: Implement vector neg ops LIU Zhiwei
@ 2024-09-03 14:52 ` Richard Henderson
0 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2024-09-03 14:52 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 8/29/24 23:16, LIU Zhiwei wrote:
> From: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>
> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
> Reviewed-by: Liu Zhiwei<zhiwei_liu@linux.alibaba.com>
> ---
> tcg/riscv/tcg-target.c.inc | 8 ++++++++
> tcg/riscv/tcg-target.h | 2 +-
> 2 files changed, 9 insertions(+), 1 deletion(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* [PATCH v2 10/14] tcg/riscv: Implement vector sat/mul ops
2024-08-30 6:15 [PATCH v2 00/14] tcg/riscv: Add support for vector LIU Zhiwei
` (8 preceding siblings ...)
2024-08-30 6:16 ` [PATCH v2 09/14] tcg/riscv: Implement vector neg ops LIU Zhiwei
@ 2024-08-30 6:16 ` LIU Zhiwei
2024-09-03 14:52 ` Richard Henderson
2024-08-30 6:16 ` [PATCH v2 11/14] tcg/riscv: Implement vector min/max ops LIU Zhiwei
` (3 subsequent siblings)
13 siblings, 1 reply; 33+ messages in thread
From: LIU Zhiwei @ 2024-08-30 6:16 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
tcg/riscv/tcg-target.c.inc | 36 ++++++++++++++++++++++++++++++++++++
tcg/riscv/tcg-target.h | 4 ++--
2 files changed, 38 insertions(+), 2 deletions(-)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 4fc82481a7..01d03a9208 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -323,6 +323,12 @@ typedef enum {
OPC_VXOR_VI = 0x2c000057 | V_OPIVI,
OPC_VRSUB_VI = 0xc000057 | V_OPIVI,
+ OPC_VMUL_VV = 0x94000057 | V_OPMVV,
+ OPC_VSADD_VV = 0x84000057 | V_OPIVV,
+ OPC_VSSUB_VV = 0x8c000057 | V_OPIVV,
+ OPC_VSADDU_VV = 0x80000057 | V_OPIVV,
+ OPC_VSSUBU_VV = 0x88000057 | V_OPIVV,
+
OPC_VMSEQ_VV = 0x60000057 | V_OPIVV,
OPC_VMSEQ_VI = 0x60000057 | V_OPIVI,
OPC_VMSEQ_VX = 0x60000057 | V_OPIVX,
@@ -2399,6 +2405,26 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
riscv_set_vec_config_vl_vece(s, type, vece);
tcg_out_opc_vi(s, OPC_VRSUB_VI, a0, a1, 0, true);
break;
+ case INDEX_op_mul_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VMUL_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_ssadd_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VSADD_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_sssub_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VSSUB_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_usadd_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VSADDU_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_ussub_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VSSUBU_VV, a0, a1, a2, true);
+ break;
case INDEX_op_rvv_cmp_vx:
riscv_set_vec_config_vl_vece(s, type, vece);
tcg_out_cmp_vec_vx(s, a2, a0, a1);
@@ -2527,6 +2553,11 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
case INDEX_op_xor_vec:
case INDEX_op_not_vec:
case INDEX_op_neg_vec:
+ case INDEX_op_mul_vec:
+ case INDEX_op_ssadd_vec:
+ case INDEX_op_sssub_vec:
+ case INDEX_op_usadd_vec:
+ case INDEX_op_ussub_vec:
return 1;
case INDEX_op_cmp_vec:
return -1;
@@ -2688,6 +2719,11 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_and_vec:
case INDEX_op_or_vec:
case INDEX_op_xor_vec:
+ case INDEX_op_mul_vec:
+ case INDEX_op_ssadd_vec:
+ case INDEX_op_sssub_vec:
+ case INDEX_op_usadd_vec:
+ case INDEX_op_ussub_vec:
return C_O1_I2(v, v, v);
case INDEX_op_rvv_merge_vec:
return C_O1_I2(v, v, vK);
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 401696d639..21251f8b23 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -160,8 +160,8 @@ typedef enum {
#define TCG_TARGET_HAS_shi_vec 0
#define TCG_TARGET_HAS_shs_vec 0
#define TCG_TARGET_HAS_shv_vec 0
-#define TCG_TARGET_HAS_mul_vec 0
-#define TCG_TARGET_HAS_sat_vec 0
+#define TCG_TARGET_HAS_mul_vec 1
+#define TCG_TARGET_HAS_sat_vec 1
#define TCG_TARGET_HAS_minmax_vec 0
#define TCG_TARGET_HAS_bitsel_vec 0
#define TCG_TARGET_HAS_cmpsel_vec 0
--
2.43.0
* Re: [PATCH v2 10/14] tcg/riscv: Implement vector sat/mul ops
2024-08-30 6:16 ` [PATCH v2 10/14] tcg/riscv: Implement vector sat/mul ops LIU Zhiwei
@ 2024-09-03 14:52 ` Richard Henderson
0 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2024-09-03 14:52 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 8/29/24 23:16, LIU Zhiwei wrote:
> From: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>
> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
> Reviewed-by: Liu Zhiwei<zhiwei_liu@linux.alibaba.com>
> ---
> tcg/riscv/tcg-target.c.inc | 36 ++++++++++++++++++++++++++++++++++++
> tcg/riscv/tcg-target.h | 4 ++--
> 2 files changed, 38 insertions(+), 2 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* [PATCH v2 11/14] tcg/riscv: Implement vector min/max ops
2024-08-30 6:15 [PATCH v2 00/14] tcg/riscv: Add support for vector LIU Zhiwei
` (9 preceding siblings ...)
2024-08-30 6:16 ` [PATCH v2 10/14] tcg/riscv: Implement vector sat/mul ops LIU Zhiwei
@ 2024-08-30 6:16 ` LIU Zhiwei
2024-09-03 14:53 ` Richard Henderson
2024-08-30 6:16 ` [PATCH v2 12/14] tcg/riscv: Implement vector shs/v ops LIU Zhiwei
` (2 subsequent siblings)
13 siblings, 1 reply; 33+ messages in thread
From: LIU Zhiwei @ 2024-08-30 6:16 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
tcg/riscv/tcg-target.c.inc | 29 +++++++++++++++++++++++++++++
tcg/riscv/tcg-target.h | 2 +-
2 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 01d03a9208..7de2da3571 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -329,6 +329,11 @@ typedef enum {
OPC_VSADDU_VV = 0x80000057 | V_OPIVV,
OPC_VSSUBU_VV = 0x88000057 | V_OPIVV,
+ OPC_VMAX_VV = 0x1c000057 | V_OPIVV,
+ OPC_VMAXU_VV = 0x18000057 | V_OPIVV,
+ OPC_VMIN_VV = 0x14000057 | V_OPIVV,
+ OPC_VMINU_VV = 0x10000057 | V_OPIVV,
+
OPC_VMSEQ_VV = 0x60000057 | V_OPIVV,
OPC_VMSEQ_VI = 0x60000057 | V_OPIVI,
OPC_VMSEQ_VX = 0x60000057 | V_OPIVX,
@@ -2425,6 +2430,22 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
riscv_set_vec_config_vl_vece(s, type, vece);
tcg_out_opc_vv(s, OPC_VSSUBU_VV, a0, a1, a2, true);
break;
+ case INDEX_op_smax_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VMAX_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_smin_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VMIN_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_umax_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VMAXU_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_umin_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VMINU_VV, a0, a1, a2, true);
+ break;
case INDEX_op_rvv_cmp_vx:
riscv_set_vec_config_vl_vece(s, type, vece);
tcg_out_cmp_vec_vx(s, a2, a0, a1);
@@ -2558,6 +2579,10 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
case INDEX_op_sssub_vec:
case INDEX_op_usadd_vec:
case INDEX_op_ussub_vec:
+ case INDEX_op_smax_vec:
+ case INDEX_op_smin_vec:
+ case INDEX_op_umax_vec:
+ case INDEX_op_umin_vec:
return 1;
case INDEX_op_cmp_vec:
return -1;
@@ -2724,6 +2749,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_sssub_vec:
case INDEX_op_usadd_vec:
case INDEX_op_ussub_vec:
+ case INDEX_op_smax_vec:
+ case INDEX_op_smin_vec:
+ case INDEX_op_umax_vec:
+ case INDEX_op_umin_vec:
return C_O1_I2(v, v, v);
case INDEX_op_rvv_merge_vec:
return C_O1_I2(v, v, vK);
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 21251f8b23..35e7086ad7 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -162,7 +162,7 @@ typedef enum {
#define TCG_TARGET_HAS_shv_vec 0
#define TCG_TARGET_HAS_mul_vec 1
#define TCG_TARGET_HAS_sat_vec 1
-#define TCG_TARGET_HAS_minmax_vec 0
+#define TCG_TARGET_HAS_minmax_vec 1
#define TCG_TARGET_HAS_bitsel_vec 0
#define TCG_TARGET_HAS_cmpsel_vec 0
--
2.43.0
* Re: [PATCH v2 11/14] tcg/riscv: Implement vector min/max ops
2024-08-30 6:16 ` [PATCH v2 11/14] tcg/riscv: Implement vector min/max ops LIU Zhiwei
@ 2024-09-03 14:53 ` Richard Henderson
0 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2024-09-03 14:53 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 8/29/24 23:16, LIU Zhiwei wrote:
> From: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>
> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
> Reviewed-by: Liu Zhiwei<zhiwei_liu@linux.alibaba.com>
> ---
> tcg/riscv/tcg-target.c.inc | 29 +++++++++++++++++++++++++++++
> tcg/riscv/tcg-target.h | 2 +-
> 2 files changed, 30 insertions(+), 1 deletion(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* [PATCH v2 12/14] tcg/riscv: Implement vector shs/v ops
2024-08-30 6:15 [PATCH v2 00/14] tcg/riscv: Add support for vector LIU Zhiwei
` (10 preceding siblings ...)
2024-08-30 6:16 ` [PATCH v2 11/14] tcg/riscv: Implement vector min/max ops LIU Zhiwei
@ 2024-08-30 6:16 ` LIU Zhiwei
2024-09-03 14:54 ` Richard Henderson
2024-08-30 6:16 ` [PATCH v2 13/14] tcg/riscv: Implement vector roti/v/x shi ops LIU Zhiwei
2024-08-30 6:16 ` [PATCH v2 14/14] tcg/riscv: Enable native vector support for TCG host LIU Zhiwei
13 siblings, 1 reply; 33+ messages in thread
From: LIU Zhiwei @ 2024-08-30 6:16 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
tcg/riscv/tcg-target-con-set.h | 1 +
tcg/riscv/tcg-target.c.inc | 44 ++++++++++++++++++++++++++++++++++
tcg/riscv/tcg-target.h | 4 ++--
3 files changed, 47 insertions(+), 2 deletions(-)
diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
index 6c9ad5188b..3154fe8ea8 100644
--- a/tcg/riscv/tcg-target-con-set.h
+++ b/tcg/riscv/tcg-target-con-set.h
@@ -29,3 +29,4 @@ C_O1_I1(v, v)
C_O1_I2(v, v, v)
C_O1_I2(v, v, vi)
C_O1_I2(v, v, vK)
+C_O1_I2(v, v, r)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 7de2da3571..31e161c5bc 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -357,6 +357,13 @@ typedef enum {
OPC_VMSGT_VI = 0x7c000057 | V_OPIVI,
OPC_VMSGT_VX = 0x7c000057 | V_OPIVX,
+ OPC_VSLL_VV = 0x94000057 | V_OPIVV,
+ OPC_VSLL_VX = 0x94000057 | V_OPIVX,
+ OPC_VSRL_VV = 0xa0000057 | V_OPIVV,
+ OPC_VSRL_VX = 0xa0000057 | V_OPIVX,
+ OPC_VSRA_VV = 0xa4000057 | V_OPIVV,
+ OPC_VSRA_VX = 0xa4000057 | V_OPIVX,
+
OPC_VMV_V_V = 0x5e000057 | V_OPIVV,
OPC_VMV_V_I = 0x5e000057 | V_OPIVI,
OPC_VMV_V_X = 0x5e000057 | V_OPIVX,
@@ -2446,6 +2453,30 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
riscv_set_vec_config_vl_vece(s, type, vece);
tcg_out_opc_vv(s, OPC_VMINU_VV, a0, a1, a2, true);
break;
+ case INDEX_op_shls_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vx(s, OPC_VSLL_VX, a0, a1, a2, true);
+ break;
+ case INDEX_op_shrs_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vx(s, OPC_VSRL_VX, a0, a1, a2, true);
+ break;
+ case INDEX_op_sars_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vx(s, OPC_VSRA_VX, a0, a1, a2, true);
+ break;
+ case INDEX_op_shlv_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VSLL_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_shrv_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VSRL_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_sarv_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VSRA_VV, a0, a1, a2, true);
+ break;
case INDEX_op_rvv_cmp_vx:
riscv_set_vec_config_vl_vece(s, type, vece);
tcg_out_cmp_vec_vx(s, a2, a0, a1);
@@ -2583,6 +2614,12 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
case INDEX_op_smin_vec:
case INDEX_op_umax_vec:
case INDEX_op_umin_vec:
+ case INDEX_op_shls_vec:
+ case INDEX_op_shrs_vec:
+ case INDEX_op_sars_vec:
+ case INDEX_op_shlv_vec:
+ case INDEX_op_shrv_vec:
+ case INDEX_op_sarv_vec:
return 1;
case INDEX_op_cmp_vec:
return -1;
@@ -2753,7 +2790,14 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_smin_vec:
case INDEX_op_umax_vec:
case INDEX_op_umin_vec:
+ case INDEX_op_shlv_vec:
+ case INDEX_op_shrv_vec:
+ case INDEX_op_sarv_vec:
return C_O1_I2(v, v, v);
+ case INDEX_op_shls_vec:
+ case INDEX_op_shrs_vec:
+ case INDEX_op_sars_vec:
+ return C_O1_I2(v, v, r);
case INDEX_op_rvv_merge_vec:
return C_O1_I2(v, v, vK);
case INDEX_op_rvv_cmp_vi:
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 35e7086ad7..41c6c446e8 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -158,8 +158,8 @@ typedef enum {
#define TCG_TARGET_HAS_rots_vec 0
#define TCG_TARGET_HAS_rotv_vec 0
#define TCG_TARGET_HAS_shi_vec 0
-#define TCG_TARGET_HAS_shs_vec 0
-#define TCG_TARGET_HAS_shv_vec 0
+#define TCG_TARGET_HAS_shs_vec 1
+#define TCG_TARGET_HAS_shv_vec 1
#define TCG_TARGET_HAS_mul_vec 1
#define TCG_TARGET_HAS_sat_vec 1
#define TCG_TARGET_HAS_minmax_vec 1
--
2.43.0
* Re: [PATCH v2 12/14] tcg/riscv: Implement vector shs/v ops
2024-08-30 6:16 ` [PATCH v2 12/14] tcg/riscv: Implement vector shs/v ops LIU Zhiwei
@ 2024-09-03 14:54 ` Richard Henderson
0 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2024-09-03 14:54 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 8/29/24 23:16, LIU Zhiwei wrote:
> From: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>
> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
> Reviewed-by: Liu Zhiwei<zhiwei_liu@linux.alibaba.com>
> ---
> tcg/riscv/tcg-target-con-set.h | 1 +
> tcg/riscv/tcg-target.c.inc | 44 ++++++++++++++++++++++++++++++++++
> tcg/riscv/tcg-target.h | 4 ++--
> 3 files changed, 47 insertions(+), 2 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* [PATCH v2 13/14] tcg/riscv: Implement vector roti/v/x shi ops
2024-08-30 6:15 [PATCH v2 00/14] tcg/riscv: Add support for vector LIU Zhiwei
` (11 preceding siblings ...)
2024-08-30 6:16 ` [PATCH v2 12/14] tcg/riscv: Implement vector shs/v ops LIU Zhiwei
@ 2024-08-30 6:16 ` LIU Zhiwei
2024-09-03 15:15 ` Richard Henderson
2024-08-30 6:16 ` [PATCH v2 14/14] tcg/riscv: Enable native vector support for TCG host LIU Zhiwei
13 siblings, 1 reply; 33+ messages in thread
From: LIU Zhiwei @ 2024-08-30 6:16 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
tcg/riscv/tcg-target.c.inc | 98 +++++++++++++++++++++++++++++++++++++-
tcg/riscv/tcg-target.h | 8 ++--
tcg/riscv/tcg-target.opc.h | 3 ++
3 files changed, 104 insertions(+), 5 deletions(-)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 31e161c5bc..6560d3381a 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -358,10 +358,13 @@ typedef enum {
OPC_VMSGT_VX = 0x7c000057 | V_OPIVX,
OPC_VSLL_VV = 0x94000057 | V_OPIVV,
+ OPC_VSLL_VI = 0x94000057 | V_OPIVI,
OPC_VSLL_VX = 0x94000057 | V_OPIVX,
OPC_VSRL_VV = 0xa0000057 | V_OPIVV,
+ OPC_VSRL_VI = 0xa0000057 | V_OPIVI,
OPC_VSRL_VX = 0xa0000057 | V_OPIVX,
OPC_VSRA_VV = 0xa4000057 | V_OPIVV,
+ OPC_VSRA_VI = 0xa4000057 | V_OPIVI,
OPC_VSRA_VX = 0xa4000057 | V_OPIVX,
OPC_VMV_V_V = 0x5e000057 | V_OPIVV,
@@ -2477,6 +2480,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
riscv_set_vec_config_vl_vece(s, type, vece);
tcg_out_opc_vv(s, OPC_VSRA_VV, a0, a1, a2, true);
break;
+ case INDEX_op_rvv_shli_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vi(s, OPC_VSLL_VI, a0, a1, a2, true);
+ break;
+ case INDEX_op_rvv_shri_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vi(s, OPC_VSRL_VI, a0, a1, a2, true);
+ break;
+ case INDEX_op_rvv_sari_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vi(s, OPC_VSRA_VI, a0, a1, a2, true);
+ break;
case INDEX_op_rvv_cmp_vx:
riscv_set_vec_config_vl_vece(s, type, vece);
tcg_out_cmp_vec_vx(s, a2, a0, a1);
@@ -2561,7 +2576,8 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
TCGArg a0, ...)
{
va_list va;
- TCGv_vec v0, v1;
+ TCGv_vec v0, v1, v2, t1;
+ TCGv_i32 t2;
TCGArg a2, a3;
va_start(va, a0);
@@ -2589,6 +2605,69 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
}
}
break;
+ case INDEX_op_shli_vec:
+ if (a2 > 31) {
+ tcg_gen_shls_vec(vece, v0, v1, tcg_constant_i32(a2));
+ } else {
+ vec_gen_3(INDEX_op_rvv_shli_vec, type, vece, tcgv_vec_arg(v0),
+ tcgv_vec_arg(v1), a2);
+ }
+ break;
+ case INDEX_op_shri_vec:
+ if (a2 > 31) {
+ tcg_gen_shrs_vec(vece, v0, v1, tcg_constant_i32(a2));
+ } else {
+ vec_gen_3(INDEX_op_rvv_shri_vec, type, vece, tcgv_vec_arg(v0),
+ tcgv_vec_arg(v1), a2);
+ }
+ break;
+ case INDEX_op_sari_vec:
+ if (a2 > 31) {
+ tcg_gen_sars_vec(vece, v0, v1, tcg_constant_i32(a2));
+ } else {
+ vec_gen_3(INDEX_op_rvv_sari_vec, type, vece, tcgv_vec_arg(v0),
+ tcgv_vec_arg(v1), a2);
+ }
+ break;
+ case INDEX_op_rotli_vec:
+ t1 = tcg_temp_new_vec(type);
+ tcg_gen_shli_vec(vece, t1, v1, a2);
+ tcg_gen_shri_vec(vece, v0, v1, (8 << vece) - a2);
+ tcg_gen_or_vec(vece, v0, v0, t1);
+ tcg_temp_free_vec(t1);
+ break;
+ case INDEX_op_rotls_vec:
+ t1 = tcg_temp_new_vec(type);
+ t2 = tcg_temp_new_i32();
+ tcg_gen_neg_i32(t2, temp_tcgv_i32(arg_temp(a2)));
+ tcg_gen_shrs_vec(vece, v0, v1, t2);
+ tcg_gen_shls_vec(vece, t1, v1, temp_tcgv_i32(arg_temp(a2)));
+ tcg_gen_or_vec(vece, v0, v0, t1);
+ tcg_temp_free_vec(t1);
+ tcg_temp_free_i32(t2);
+ break;
+ case INDEX_op_rotlv_vec:
+ v2 = temp_tcgv_vec(arg_temp(a2));
+ t1 = tcg_temp_new_vec(type);
+ tcg_gen_neg_vec(vece, t1, v2);
+ vec_gen_3(INDEX_op_shrv_vec, type, vece, tcgv_vec_arg(t1),
+ tcgv_vec_arg(v1), tcgv_vec_arg(t1));
+ vec_gen_3(INDEX_op_shlv_vec, type, vece, tcgv_vec_arg(v0),
+ tcgv_vec_arg(v1), tcgv_vec_arg(v2));
+ tcg_gen_or_vec(vece, v0, v0, t1);
+ tcg_temp_free_vec(t1);
+ break;
+ case INDEX_op_rotrv_vec:
+ v2 = temp_tcgv_vec(arg_temp(a2));
+ t1 = tcg_temp_new_vec(type);
+ tcg_gen_neg_vec(vece, t1, v2);
+ vec_gen_3(INDEX_op_shlv_vec, type, vece, tcgv_vec_arg(t1),
+ tcgv_vec_arg(v1), tcgv_vec_arg(t1));
+ vec_gen_3(INDEX_op_shrv_vec, type, vece, tcgv_vec_arg(v0),
+ tcgv_vec_arg(v1), tcgv_vec_arg(v2));
+ tcg_gen_or_vec(vece, v0, v0, t1);
+ tcg_temp_free_vec(t1);
+ break;
default:
g_assert_not_reached();
}
@@ -2622,6 +2701,13 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
case INDEX_op_sarv_vec:
return 1;
case INDEX_op_cmp_vec:
+ case INDEX_op_shri_vec:
+ case INDEX_op_shli_vec:
+ case INDEX_op_sari_vec:
+ case INDEX_op_rotls_vec:
+ case INDEX_op_rotlv_vec:
+ case INDEX_op_rotrv_vec:
+ case INDEX_op_rotli_vec:
return -1;
default:
return 0;
@@ -2775,6 +2861,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
return C_O1_I1(v, r);
case INDEX_op_neg_vec:
case INDEX_op_not_vec:
+ case INDEX_op_rotli_vec:
+ case INDEX_op_shli_vec:
+ case INDEX_op_shri_vec:
+ case INDEX_op_sari_vec:
+ case INDEX_op_rvv_shli_vec:
+ case INDEX_op_rvv_shri_vec:
+ case INDEX_op_rvv_sari_vec:
return C_O1_I1(v, v);
case INDEX_op_add_vec:
case INDEX_op_sub_vec:
@@ -2793,10 +2886,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_shlv_vec:
case INDEX_op_shrv_vec:
case INDEX_op_sarv_vec:
+ case INDEX_op_rotlv_vec:
+ case INDEX_op_rotrv_vec:
return C_O1_I2(v, v, v);
case INDEX_op_shls_vec:
case INDEX_op_shrs_vec:
case INDEX_op_sars_vec:
+ case INDEX_op_rotls_vec:
return C_O1_I2(v, v, r);
case INDEX_op_rvv_merge_vec:
return C_O1_I2(v, v, vK);
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 41c6c446e8..eb5129a976 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -154,10 +154,10 @@ typedef enum {
#define TCG_TARGET_HAS_not_vec 1
#define TCG_TARGET_HAS_neg_vec 1
#define TCG_TARGET_HAS_abs_vec 0
-#define TCG_TARGET_HAS_roti_vec 0
-#define TCG_TARGET_HAS_rots_vec 0
-#define TCG_TARGET_HAS_rotv_vec 0
-#define TCG_TARGET_HAS_shi_vec 0
+#define TCG_TARGET_HAS_roti_vec -1
+#define TCG_TARGET_HAS_rots_vec -1
+#define TCG_TARGET_HAS_rotv_vec -1
+#define TCG_TARGET_HAS_shi_vec -1
#define TCG_TARGET_HAS_shs_vec 1
#define TCG_TARGET_HAS_shv_vec 1
#define TCG_TARGET_HAS_mul_vec 1
diff --git a/tcg/riscv/tcg-target.opc.h b/tcg/riscv/tcg-target.opc.h
index 8eb0daf0a7..9f31286025 100644
--- a/tcg/riscv/tcg-target.opc.h
+++ b/tcg/riscv/tcg-target.opc.h
@@ -15,3 +15,6 @@ DEF(rvv_cmp_vi, 0, 1, 2, IMPLVEC)
DEF(rvv_cmp_vx, 0, 2, 1, IMPLVEC)
DEF(rvv_cmp_vv, 0, 2, 1, IMPLVEC)
DEF(rvv_merge_vec, 1, 2, 0, IMPLVEC)
+DEF(rvv_shli_vec, 1, 1, 1, IMPLVEC)
+DEF(rvv_shri_vec, 1, 1, 1, IMPLVEC)
+DEF(rvv_sari_vec, 1, 1, 1, IMPLVEC)
--
2.43.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v2 13/14] tcg/riscv: Implement vector roti/v/x shi ops
2024-08-30 6:16 ` [PATCH v2 13/14] tcg/riscv: Implement vector roti/v/x shi ops LIU Zhiwei
@ 2024-09-03 15:15 ` Richard Henderson
2024-09-04 15:25 ` LIU Zhiwei
0 siblings, 1 reply; 33+ messages in thread
From: Richard Henderson @ 2024-09-03 15:15 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 8/29/24 23:16, LIU Zhiwei wrote:
> @@ -2589,6 +2605,69 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
> }
> }
> break;
> + case INDEX_op_shli_vec:
> + if (a2 > 31) {
> + tcg_gen_shls_vec(vece, v0, v1, tcg_constant_i32(a2));
> + } else {
> + vec_gen_3(INDEX_op_rvv_shli_vec, type, vece, tcgv_vec_arg(v0),
> + tcgv_vec_arg(v1), a2);
> + }
> + break;
> + case INDEX_op_shri_vec:
> + if (a2 > 31) {
> + tcg_gen_shrs_vec(vece, v0, v1, tcg_constant_i32(a2));
> + } else {
> + vec_gen_3(INDEX_op_rvv_shri_vec, type, vece, tcgv_vec_arg(v0),
> + tcgv_vec_arg(v1), a2);
> + }
> + break;
> + case INDEX_op_sari_vec:
> + if (a2 > 31) {
> + tcg_gen_sars_vec(vece, v0, v1, tcg_constant_i32(a2));
> + } else {
> + vec_gen_3(INDEX_op_rvv_sari_vec, type, vece, tcgv_vec_arg(v0),
> + tcgv_vec_arg(v1), a2);
> + }
> + break;
> + case INDEX_op_rotli_vec:
> + t1 = tcg_temp_new_vec(type);
> + tcg_gen_shli_vec(vece, t1, v1, a2);
> + tcg_gen_shri_vec(vece, v0, v1, (8 << vece) - a2);
> + tcg_gen_or_vec(vece, v0, v0, t1);
> + tcg_temp_free_vec(t1);
> + break;
> + case INDEX_op_rotls_vec:
> + t1 = tcg_temp_new_vec(type);
> + t2 = tcg_temp_new_i32();
> + tcg_gen_neg_i32(t2, temp_tcgv_i32(arg_temp(a2)));
> + tcg_gen_shrs_vec(vece, v0, v1, t2);
> + tcg_gen_shls_vec(vece, t1, v1, temp_tcgv_i32(arg_temp(a2)));
> + tcg_gen_or_vec(vece, v0, v0, t1);
> + tcg_temp_free_vec(t1);
> + tcg_temp_free_i32(t2);
> + break;
I'm trying to work out how much benefit there is here of expanding these early, as opposed
to simply using TCG_REG_TMP0 when the immediate doesn't fit, or for rotls_vec negation.
> + case INDEX_op_rotlv_vec:
> + v2 = temp_tcgv_vec(arg_temp(a2));
> + t1 = tcg_temp_new_vec(type);
> + tcg_gen_neg_vec(vece, t1, v2);
> + vec_gen_3(INDEX_op_shrv_vec, type, vece, tcgv_vec_arg(t1),
> + tcgv_vec_arg(v1), tcgv_vec_arg(t1));
> + vec_gen_3(INDEX_op_shlv_vec, type, vece, tcgv_vec_arg(v0),
> + tcgv_vec_arg(v1), tcgv_vec_arg(v2));
> + tcg_gen_or_vec(vece, v0, v0, t1);
> + tcg_temp_free_vec(t1);
> + break;
> + case INDEX_op_rotrv_vec:
> + v2 = temp_tcgv_vec(arg_temp(a2));
> + t1 = tcg_temp_new_vec(type);
> + tcg_gen_neg_vec(vece, t1, v2);
> + vec_gen_3(INDEX_op_shlv_vec, type, vece, tcgv_vec_arg(t1),
> + tcgv_vec_arg(v1), tcgv_vec_arg(t1));
> + vec_gen_3(INDEX_op_shrv_vec, type, vece, tcgv_vec_arg(v0),
> + tcgv_vec_arg(v1), tcgv_vec_arg(v2));
> + tcg_gen_or_vec(vece, v0, v0, t1);
> + tcg_temp_free_vec(t1);
> + break;
And here we can use TCG_REG_V0 as the temporary, both for negation and shift intermediate.
vrsub_vi V0, a2, 0
vshlv_vv V0, a1, V0
vshrv_vv a0, a1, a2
vor_vv a0, a0, V0
r~
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v2 13/14] tcg/riscv: Implement vector roti/v/x shi ops
2024-09-03 15:15 ` Richard Henderson
@ 2024-09-04 15:25 ` LIU Zhiwei
2024-09-04 19:05 ` Richard Henderson
0 siblings, 1 reply; 33+ messages in thread
From: LIU Zhiwei @ 2024-09-04 15:25 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 2024/9/3 23:15, Richard Henderson wrote:
> On 8/29/24 23:16, LIU Zhiwei wrote:
>> @@ -2589,6 +2605,69 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
>> }
>> }
>> break;
>> + case INDEX_op_shli_vec:
>> + if (a2 > 31) {
>> + tcg_gen_shls_vec(vece, v0, v1, tcg_constant_i32(a2));
>> + } else {
>> + vec_gen_3(INDEX_op_rvv_shli_vec, type, vece, tcgv_vec_arg(v0),
>> + tcgv_vec_arg(v1), a2);
>> + }
>> + break;
>> + case INDEX_op_shri_vec:
>> + if (a2 > 31) {
>> + tcg_gen_shrs_vec(vece, v0, v1, tcg_constant_i32(a2));
>> + } else {
>> + vec_gen_3(INDEX_op_rvv_shri_vec, type, vece, tcgv_vec_arg(v0),
>> + tcgv_vec_arg(v1), a2);
>> + }
>> + break;
>> + case INDEX_op_sari_vec:
>> + if (a2 > 31) {
>> + tcg_gen_sars_vec(vece, v0, v1, tcg_constant_i32(a2));
>> + } else {
>> + vec_gen_3(INDEX_op_rvv_sari_vec, type, vece, tcgv_vec_arg(v0),
>> + tcgv_vec_arg(v1), a2);
>> + }
>> + break;
>> + case INDEX_op_rotli_vec:
>> + t1 = tcg_temp_new_vec(type);
>> + tcg_gen_shli_vec(vece, t1, v1, a2);
>> + tcg_gen_shri_vec(vece, v0, v1, (8 << vece) - a2);
>> + tcg_gen_or_vec(vece, v0, v0, t1);
>> + tcg_temp_free_vec(t1);
>> + break;
>> + case INDEX_op_rotls_vec:
>> + t1 = tcg_temp_new_vec(type);
>> + t2 = tcg_temp_new_i32();
>> + tcg_gen_neg_i32(t2, temp_tcgv_i32(arg_temp(a2)));
>> + tcg_gen_shrs_vec(vece, v0, v1, t2);
>> + tcg_gen_shls_vec(vece, t1, v1, temp_tcgv_i32(arg_temp(a2)));
>> + tcg_gen_or_vec(vece, v0, v0, t1);
>> + tcg_temp_free_vec(t1);
>> + tcg_temp_free_i32(t2);
>> + break;
>
> I'm trying to work out how much benefit there is here of expanding
> these early, as opposed to simply using TCG_REG_TMP0 when the
> immediate doesn't fit,
We find that, for rotli, the backend would just duplicate the code from the
INDEX_op_shli_vec and INDEX_op_shri_vec implementations if we don't expand it early.
case INDEX_op_rotli_vec:
if (a2 > 31) {
tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, TCG_REG_ZERO, a2);
tcg_out_opc_vx(s, OPC_VSLL_VX, TCG_REG_V0, a1, TCG_REG_TMP0, true);
} else {
tcg_out_opc_vi(s, OPC_VSLL_VI, TCG_REG_V0, a1, a2, true);
}
if (((8 << vece) - a2) > 31) {
tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, TCG_REG_ZERO, (8 << vece) - a2);
tcg_out_opc_vx(s, OPC_VSRL_VX, a0, a1, TCG_REG_TMP0, true);
} else {
tcg_out_opc_vi(s, OPC_VSRL_VI, a0, a1, (8 << vece) - a2, true);
}
tcg_out_opc_vv(s, OPC_VOR_VV, a0, a0, TCG_REG_V0, true);
break;
Thus, I prefer to expand it early, at least for rotli_vec.
Thanks,
Zhiwei
> or for rotls_vec negation.
>
>> + case INDEX_op_rotlv_vec:
>> + v2 = temp_tcgv_vec(arg_temp(a2));
>> + t1 = tcg_temp_new_vec(type);
>> + tcg_gen_neg_vec(vece, t1, v2);
>> + vec_gen_3(INDEX_op_shrv_vec, type, vece, tcgv_vec_arg(t1),
>> + tcgv_vec_arg(v1), tcgv_vec_arg(t1));
>> + vec_gen_3(INDEX_op_shlv_vec, type, vece, tcgv_vec_arg(v0),
>> + tcgv_vec_arg(v1), tcgv_vec_arg(v2));
>> + tcg_gen_or_vec(vece, v0, v0, t1);
>> + tcg_temp_free_vec(t1);
>> + break;
>> + case INDEX_op_rotrv_vec:
>> + v2 = temp_tcgv_vec(arg_temp(a2));
>> + t1 = tcg_temp_new_vec(type);
>> + tcg_gen_neg_vec(vece, t1, v2);
>> + vec_gen_3(INDEX_op_shlv_vec, type, vece, tcgv_vec_arg(t1),
>> + tcgv_vec_arg(v1), tcgv_vec_arg(t1));
>> + vec_gen_3(INDEX_op_shrv_vec, type, vece, tcgv_vec_arg(v0),
>> + tcgv_vec_arg(v1), tcgv_vec_arg(v2));
>> + tcg_gen_or_vec(vece, v0, v0, t1);
>> + tcg_temp_free_vec(t1);
>> + break;
>
> And here we can use TCG_REG_V0 as the temporary, both for negation and
> shift intermediate.
>
> vrsub_vi V0, a2, 0
> vshlv_vv V0, a1, V0
> vshrv_vv a0, a1, a2
> vor_vv a0, a0, V0
>
>
> r~
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v2 13/14] tcg/riscv: Implement vector roti/v/x shi ops
2024-09-04 15:25 ` LIU Zhiwei
@ 2024-09-04 19:05 ` Richard Henderson
2024-09-05 1:40 ` LIU Zhiwei
0 siblings, 1 reply; 33+ messages in thread
From: Richard Henderson @ 2024-09-04 19:05 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 9/4/24 08:25, LIU Zhiwei wrote:
>> I'm trying to work out how much benefit there is here of expanding these early, as
>> opposed to simply using TCG_REG_TMP0 when the immediate doesn't fit,
>
> We find for rotli, it just copied code from the implementation of INDEX_op_shli_vec and
> INDEX_op_shri_vec if we don't expand it.
>
> case INDEX_op_rotli_vec:
> if (a2 > 31) {
> tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, TCG_REG_ZERO, a2);
> tcg_out_opc_vx(s, OPC_VSLL_VX, TCG_REG_V0, a1, TCG_REG_TMP0, true);
> } else {
> tcg_out_opc_vi(s, OPC_VSLL_VI, TCG_REG_V0, a1, a2, true);
> }
>
> if (((8 << vece) - a2) > 31) {
> tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, TCG_REG_ZERO, (8 << vece) - a2);
> tcg_out_opc_vx(s, OPC_VSRL_VX, a0, a1, TCG_REG_TMP0, true);
> } else {
> tcg_out_opc_vi(s, OPC_VSRL_VI, a0, a1, (8 << vece) - a2, true);
> }
> tcg_out_opc_vv(s, OPC_VOR_VV, a0, a0, TCG_REG_V0, true);
> break;
>
> Thus, I prefer to expand it early, at least for rotli_vec.
static void tcg_out_vshifti(TCGContext *s, RISCVInsn op_vi, RISCVInsn op_vx,
TCGReg dst, TCGReg src, unsigned imm)
{
if (imm < 32) {
tcg_out_opc_vi(s, op_vi, dst, src, imm);
} else {
tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_TMP0, imm);
tcg_out_opc_vx(s, op_vx, dst, src, TCG_REG_TMP0);
}
}
case INDEX_op_shli_vec:
set_vconfig_vl_sew(s, type, vece);
tcg_out_vshifti(s, OPC_VSLL_VI, OPC_VSLL_VX, a0, a1, a2);
break;
case INDEX_op_rotli_vec:
set_vconfig_vl_sew(s, type, vece);
tcg_out_vshifti(s, OPC_VSLL_VI, OPC_VSLL_VX, TCG_REG_V0, a1, a2);
a2 = -a2 & ((8 << vece) - 1);
tcg_out_vshifti(s, OPC_VSRL_VI, OPC_VSRL_VX, a0, a1, a2);
tcg_out_opc_vv(s, OPC_VOR_VV, a0, a0, TCG_REG_V0);
break;
r~
^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [PATCH v2 13/14] tcg/riscv: Implement vector roti/v/x shi ops
2024-09-04 19:05 ` Richard Henderson
@ 2024-09-05 1:40 ` LIU Zhiwei
0 siblings, 0 replies; 33+ messages in thread
From: LIU Zhiwei @ 2024-09-05 1:40 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 2024/9/5 3:05, Richard Henderson wrote:
> On 9/4/24 08:25, LIU Zhiwei wrote:
>>> I'm trying to work out how much benefit there is here of expanding
>>> these early, as opposed to simply using TCG_REG_TMP0 when the
>>> immediate doesn't fit,
>>
>> We find for rotli, it just copied code from the implementation of
>> INDEX_op_shli_vec and INDEX_op_shri_vec if we don't expand it.
>>
>> case INDEX_op_rotli_vec:
>> if (a2 > 31) {
>> tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, TCG_REG_ZERO, a2);
>> tcg_out_opc_vx(s, OPC_VSLL_VX, TCG_REG_V0, a1, TCG_REG_TMP0, true);
>> } else {
>> tcg_out_opc_vi(s, OPC_VSLL_VI, TCG_REG_V0, a1, a2, true);
>> }
>>
>> if (((8 << vece) - a2) > 31) {
>> tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, TCG_REG_ZERO, (8 << vece) - a2);
>> tcg_out_opc_vx(s, OPC_VSRL_VX, a0, a1, TCG_REG_TMP0, true);
>> } else {
>> tcg_out_opc_vi(s, OPC_VSRL_VI, a0, a1, (8 << vece) - a2, true);
>> }
>> tcg_out_opc_vv(s, OPC_VOR_VV, a0, a0, TCG_REG_V0, true);
>> break;
>>
>> Thus, I prefer to expand it early, at least for rotli_vec.
>
> static void tcg_out_vshifti(TCGContext *s, RISCVInsn op_vi, RISCVInsn op_vx,
> TCGReg dst, TCGReg src, unsigned imm)
> {
> if (imm < 32) {
> tcg_out_opc_vi(s, op_vi, dst, src, imm);
> } else {
> tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_TMP0, imm);
> tcg_out_opc_vx(s, op_vx, dst, src, TCG_REG_TMP0);
> }
> }
>
>
Thanks for the guide.
> case INDEX_op_shli_vec:
> set_vconfig_vl_sew(s, type, vece);
> tcg_out_vshifti(s, OPC_VSLL_VI, OPC_VSLL_VX, a0, a1, a2);
> break;
>
> case INDEX_op_rotli_vec:
> set_vconfig_vl_sew(s, type, vece);
> tcg_out_vshifti(s, OPC_VSLL_VI, OPC_VSLL_VX, TCG_REG_V0, a1, a2);
> a2 = -a2 & ((8 << vece) - 1);
> tcg_out_vshifti(s, OPC_VSRL_VI, OPC_VSRL_VX, a0, a1, a2);
> tcg_out_opc_vv(s, OPC_VOR_VV, a0, a0, TCG_REG_V0);
> break;
OK. We will do it this way.
Thanks,
Zhiwei
>
> r~
^ permalink raw reply [flat|nested] 33+ messages in thread
* [PATCH v2 14/14] tcg/riscv: Enable native vector support for TCG host
2024-08-30 6:15 [PATCH v2 00/14] tcg/riscv: Add support for vector LIU Zhiwei
` (12 preceding siblings ...)
2024-08-30 6:16 ` [PATCH v2 13/14] tcg/riscv: Implement vector roti/v/x shi ops LIU Zhiwei
@ 2024-08-30 6:16 ` LIU Zhiwei
2024-09-03 15:02 ` Richard Henderson
13 siblings, 1 reply; 33+ messages in thread
From: LIU Zhiwei @ 2024-08-30 6:16 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
tcg/riscv/tcg-target.h | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index eb5129a976..b8f553207e 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -143,9 +143,11 @@ typedef enum {
#define TCG_TARGET_HAS_tst 0
/* vector instructions */
-#define TCG_TARGET_HAS_v64 0
-#define TCG_TARGET_HAS_v128 0
-#define TCG_TARGET_HAS_v256 0
+#define have_rvv (cpuinfo & CPUINFO_ZVE64X)
+
+#define TCG_TARGET_HAS_v64 have_rvv
+#define TCG_TARGET_HAS_v128 have_rvv
+#define TCG_TARGET_HAS_v256 have_rvv
#define TCG_TARGET_HAS_andc_vec 0
#define TCG_TARGET_HAS_orc_vec 0
#define TCG_TARGET_HAS_nand_vec 0
--
2.43.0
^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [PATCH v2 14/14] tcg/riscv: Enable native vector support for TCG host
2024-08-30 6:16 ` [PATCH v2 14/14] tcg/riscv: Enable native vector support for TCG host LIU Zhiwei
@ 2024-09-03 15:02 ` Richard Henderson
0 siblings, 0 replies; 33+ messages in thread
From: Richard Henderson @ 2024-09-03 15:02 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 8/29/24 23:16, LIU Zhiwei wrote:
> From: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>
> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
> Reviewed-by: Liu Zhiwei<zhiwei_liu@linux.alibaba.com>
> ---
> tcg/riscv/tcg-target.h | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 33+ messages in thread