[PATCH v1 00/15] tcg/riscv: Add support for vector

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v1 00/15] tcg/riscv: Add support for vector
@ 2024-08-13 11:34 LIU Zhiwei
  2024-08-13 11:34 ` [PATCH v1 01/15] util: Add RISC-V vector extension probe in cpuinfo LIU Zhiwei
                   ` (14 more replies)
  0 siblings, 15 replies; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-13 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng

From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>

This patch set introduces support for the RISC-V vector extension
in TCG backend for RISC-V targets.

Key features of this patch series include:
  1. Improved register allocation constraints for vector registers.
  2. Implementation of vset{i}vli instructions for vector configuration
     and support for variable-length vector registers and LMUL-based
     register grouping.
  3. Expanded cmp_vec to align TCG's cmp_vec behavior.

We tested the correctness and the acceleration efficiency of vector
instructions for this patch set. Conduct testing using the
CanMV K230 V1.1 development board in an Ubuntu 23 environment.

* Correctness:
Using risu, we generated 200,000 vector instructions for aarch64 and
tested them on the board K230, which supports RISC-V vector 1.0.
The K230 board runs Ubuntu 23, operating a statically compiled user-mode
qemu-aarch64. We ran risu's master mode with a vector backend qemu and
apprentice mode with a non-vector backend qemu. 

```
apprentice:
$ time qemu-aarch64-release --novec arm64_risu/risu --host=localhost arm64_risu/aarch64sve_20w

master:
$ script -c "time qemu-aarch64-release arm64_risu/risu --master arm64_risu/aarch64sve_20w" risu_test.log
Script started, output log file is 'risu_test.log'.
loading test image arm64_risu/aarch64sve_20w...
master port 9191
master: waiting for connection on port 9191...
starting master image at 0x3fc6df1000
starting image
match status...
match!
scalar_cnt = 50447787
vec_cnt = 786345
All cnts = 51234132
Percent vector: 1.53%

RISC-V Vector backend stat:
INDEX_op_mov_vec counts: 503
INDEX_op_dup_vec counts: 9989
INDEX_op_ld_vec counts: 27728
INDEX_op_st_vec counts: 272810
INDEX_op_dupm_vec counts: 263
INDEX_op_add_vec counts: 4842
INDEX_op_sub_vec counts: 2993
INDEX_op_mul_vec counts: 677
INDEX_op_neg_vec counts: 968
INDEX_op_ssadd_vec counts: 223
INDEX_op_usadd_vec counts: 229
INDEX_op_sssub_vec counts: 226
INDEX_op_ussub_vec counts: 265
INDEX_op_smin_vec counts: 650
INDEX_op_umin_vec counts: 1134
INDEX_op_smax_vec counts: 908
INDEX_op_umax_vec counts: 625
INDEX_op_and_vec counts: 8053
INDEX_op_or_vec counts: 4520
INDEX_op_xor_vec counts: 2177
INDEX_op_not_vec counts: 2639
INDEX_op_shli_vec counts: 947
INDEX_op_shri_vec counts: 3769
INDEX_op_sari_vec counts: 2130
INDEX_op_rotli_vec counts: 558
INDEX_op_shls_vec counts: 211
INDEX_op_shrs_vec counts: 779
INDEX_op_sars_vec counts: 162
INDEX_op_rotls_vec counts: 0
INDEX_op_shlv_vec counts: 482
INDEX_op_shrv_vec counts: 229
INDEX_op_sarv_vec counts: 253
INDEX_op_rotlv_vec counts: 0
INDEX_op_rotrv_vec counts: 0
INDEX_op_cmp_vec counts: 3211
INDEX_op_rvv_cmpcond_vec counts: 3211
INDEX_op_rvv_merge_vec counts: 3211
INDEX_op_rvv_shli_vec counts: 736
INDEX_op_rvv_shri_vec counts: 2990
INDEX_op_rvv_sari_vec counts: 1968

real    27m55.049s
user    8m0.284s
sys     1m28.278s
Script done.
```
Note:
The test results are as follows: the qemu frontend of aarch64 did not
use rotls_vec, rotlv_vec, or rotrv_vec, so their generation count in
the test results is 0.

* Vector Acceleration Efficiency: 
We used the opencv_perf_core test program, which has been vectorized and
optimized with O3 compilation. The tests were conducted on the k230 board
running opencv_perf_core via qemu-aarch64. 

The result is below:
```
$ time ./qemu-aarch64 ./opencv_test/opencv_perf_core --gtest_filter="*UMatTest*" --perf_force_samples=50 --perf_min_samples=50
Time compensation is 0
TEST: Skip tests with tags: 'mem_6gb', 'verylong'
CTEST_FULL_OUTPUT
OpenCV version: 4.5.4-dev
OpenCV VCS version: 6a00483e71-dirty
Build type: Release
Compiler: /mnt/rtos_nfs_new/xujf/arm-tools/gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu/bin/aarch64-linux-gnu-g++  (ver 10.2.1)
CPU features: NEON FP16
Note: Google Test filter = *UMatTest*
[==========] Running 12 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 12 tests from OCL_UMatTest_CustomPtr
...
[----------] Global test environment tear-down
[==========] 12 tests from 1 test case ran. (274624 ms total)
[  PASSED  ] 12 tests.
scalar_cnt = 1599444
vec_cnt = 7544
All cnts = 1606988

RISC-V Vector backend stat:
INDEX_op_ld_vec counts: 95
INDEX_op_st_vec counts: 3679
INDEX_op_dupm_vec counts: 2
INDEX_op_add_vec counts: 10
INDEX_op_ssadd_vec counts: 0
INDEX_op_usadd_vec counts: 4
INDEX_op_and_vec counts: 24
INDEX_op_or_vec counts: 17
INDEX_op_xor_vec counts: 4
INDEX_op_cmp_vec counts: 28
INDEX_op_rvv_cmpcond_vec counts: 28
INDEX_op_rvv_merge_vec counts: 28
real    5m37.402s
user    5m36.656s
sys     0m0.588s

$ time ./qemu-aarch64 -novec ./opencv_test/opencv_perf_core --gtest_filter="*UMatTest*" --perf_force_samples=50 --perf_min_samples=50
Time compensation is 0
TEST: Skip tests with tags: 'mem_6gb', 'verylong'
CTEST_FULL_OUTPUT
OpenCV version: 4.5.4-dev
OpenCV VCS version: 6a00483e71-dirty
Build type: Release
Compiler: /mnt/rtos_nfs_new/xujf/arm-tools/gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu/bin/aarch64-linux-gnu-g++  (ver 10.2.1)
CPU features: NEON FP16
Note: Google Test filter = *UMatTest*
[==========] Running 12 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 12 tests from OCL_UMatTest_CustomPtr
...
[----------] Global test environment tear-down
[==========] 12 tests from 1 test case ran. (539970 ms total)
[  PASSED  ] 12 tests.
scalar_cnt = 1598291
vec_cnt = 0
All cnts = 1598291

real    10m4.760s
user    10m3.690s
sys     0m0.824s
```

Using qemu run times, the acceleration ratio is calculated as
(10*60+3)/(5*60+36) = 1.79, indicating a speed improvement of 44%.

And the argument "-novec" is not included in this set, as we only use it for local testing.

Swung0x48 (1):
  tcg/riscv: Add basic support for vector

TANG Tiancheng (14):
  util: Add RISC-V vector extension probe in cpuinfo
  tcg/op-gvec: Fix iteration step in 32-bit operation
  tcg: Fix register allocation constraints
  tcg/riscv: Add riscv vset{i}vli support
  tcg/riscv: Implement vector load/store
  tcg/riscv: Implement vector mov/dup{m/i}
  tcg/riscv: Add support for basic vector opcodes
  tcg/riscv: Implement vector cmp ops
  tcg/riscv: Implement vector not/neg ops
  tcg/riscv: Implement vector sat/mul ops
  tcg/riscv: Implement vector min/max ops
  tcg/riscv: Implement vector shs/v ops
  tcg/riscv: Implement vector roti/v/x shi ops
  tcg/riscv: Enable vector TCG host-native

 host/include/riscv/host/cpuinfo.h |   1 +
 tcg/riscv/tcg-target-con-set.h    |   7 +
 tcg/riscv/tcg-target-con-str.h    |   2 +
 tcg/riscv/tcg-target.c.inc        | 839 ++++++++++++++++++++++++++++--
 tcg/riscv/tcg-target.h            |  82 +--
 tcg/riscv/tcg-target.opc.h        |  18 +
 tcg/tcg-op-gvec.c                 |   2 +-
 tcg/tcg.c                         |  20 +-
 util/cpuinfo-riscv.c              |  20 +-
 9 files changed, 914 insertions(+), 77 deletions(-)
 create mode 100644 tcg/riscv/tcg-target.opc.h

-- 
2.43.0



^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v1 01/15] util: Add RISC-V vector extension probe in cpuinfo
  2024-08-13 11:34 [PATCH v1 00/15] tcg/riscv: Add support for vector LIU Zhiwei
@ 2024-08-13 11:34 ` LIU Zhiwei
  2024-08-13 11:34 ` [PATCH v1 02/15] tcg/op-gvec: Fix iteration step in 32-bit operation LIU Zhiwei
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-13 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng

From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>

Add support for probing RISC-V vector extension availability in
the backend. This information will be used when deciding whether
to use vector instructions in code generation.

While the compiler doesn't support RISCV_HWPROBE_EXT_ZVE64X,
we use RISCV_HWPROBE_IMA_V instead.

Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
 host/include/riscv/host/cpuinfo.h |  1 +
 util/cpuinfo-riscv.c              | 20 ++++++++++++++++++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/host/include/riscv/host/cpuinfo.h b/host/include/riscv/host/cpuinfo.h
index 2b00660e36..bf6ae51f72 100644
--- a/host/include/riscv/host/cpuinfo.h
+++ b/host/include/riscv/host/cpuinfo.h
@@ -10,6 +10,7 @@
 #define CPUINFO_ZBA             (1u << 1)
 #define CPUINFO_ZBB             (1u << 2)
 #define CPUINFO_ZICOND          (1u << 3)
+#define CPUINFO_ZVE64X          (1u << 4)
 
 /* Initialized with a constructor. */
 extern unsigned cpuinfo;
diff --git a/util/cpuinfo-riscv.c b/util/cpuinfo-riscv.c
index 497ce12680..551821edef 100644
--- a/util/cpuinfo-riscv.c
+++ b/util/cpuinfo-riscv.c
@@ -33,7 +33,7 @@ static void sigill_handler(int signo, siginfo_t *si, void *data)
 /* Called both as constructor and (possibly) via other constructors. */
 unsigned __attribute__((constructor)) cpuinfo_init(void)
 {
-    unsigned left = CPUINFO_ZBA | CPUINFO_ZBB | CPUINFO_ZICOND;
+    unsigned left = CPUINFO_ZBA | CPUINFO_ZBB | CPUINFO_ZICOND | CPUINFO_ZVE64X;
     unsigned info = cpuinfo;
 
     if (info) {
@@ -49,6 +49,9 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
 #endif
 #if defined(__riscv_arch_test) && defined(__riscv_zicond)
     info |= CPUINFO_ZICOND;
+#endif
+#if defined(__riscv_arch_test) && defined(__riscv_zve64x)
+    info |= CPUINFO_ZVE64X;
 #endif
     left &= ~info;
 
@@ -64,7 +67,8 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
             && pair.key >= 0) {
             info |= pair.value & RISCV_HWPROBE_EXT_ZBA ? CPUINFO_ZBA : 0;
             info |= pair.value & RISCV_HWPROBE_EXT_ZBB ? CPUINFO_ZBB : 0;
-            left &= ~(CPUINFO_ZBA | CPUINFO_ZBB);
+            info |= pair.value & RISCV_HWPROBE_IMA_V ? CPUINFO_ZVE64X : 0;
+            left &= ~(CPUINFO_ZBA | CPUINFO_ZBB | CPUINFO_ZVE64X);
 #ifdef RISCV_HWPROBE_EXT_ZICOND
             info |= pair.value & RISCV_HWPROBE_EXT_ZICOND ? CPUINFO_ZICOND : 0;
             left &= ~CPUINFO_ZICOND;
@@ -108,6 +112,18 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
             left &= ~CPUINFO_ZICOND;
         }
 
+        if (left & CPUINFO_ZVE64X) {
+            /* Probe for Vector: vsetivli t0,1,e64,m1,ta,ma */
+            unsigned vl;
+            got_sigill = 0;
+
+            asm volatile(
+                "vsetivli %0, 1, e64, m1, ta, ma\n\t"
+                : "=r"(vl) : : "vl"
+            );
+            info |= (got_sigill || vl != 1) ? 0 : CPUINFO_ZVE64X;
+        }
+
         sigaction(SIGILL, &sa_old, NULL);
         assert(left == 0);
     }
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v1 02/15] tcg/op-gvec: Fix iteration step in 32-bit operation
  2024-08-13 11:34 [PATCH v1 00/15] tcg/riscv: Add support for vector LIU Zhiwei
  2024-08-13 11:34 ` [PATCH v1 01/15] util: Add RISC-V vector extension probe in cpuinfo LIU Zhiwei
@ 2024-08-13 11:34 ` LIU Zhiwei
  2024-08-13 11:34 ` [PATCH v1 03/15] tcg: Fix register allocation constraints LIU Zhiwei
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-13 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng

From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>

The loop in the 32-bit case of the vector compare operation
was incorrectly incrementing by 8 bytes per iteration instead
of 4 bytes. This caused the function to process only half of
the intended elements.

Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Fixes: 9622c697d1 (tcg: Add gvec compare with immediate and scalar operand)
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
 tcg/tcg-op-gvec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 0308732d9b..78ee1ced80 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -3939,7 +3939,7 @@ void tcg_gen_gvec_cmps(TCGCond cond, unsigned vece, uint32_t dofs,
         uint32_t i;
 
         tcg_gen_extrl_i64_i32(t1, c);
-        for (i = 0; i < oprsz; i += 8) {
+        for (i = 0; i < oprsz; i += 4) {
             tcg_gen_ld_i32(t0, tcg_env, aofs + i);
             tcg_gen_negsetcond_i32(cond, t0, t0, t1);
             tcg_gen_st_i32(t0, tcg_env, dofs + i);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v1 03/15] tcg: Fix register allocation constraints
  2024-08-13 11:34 [PATCH v1 00/15] tcg/riscv: Add support for vector LIU Zhiwei
  2024-08-13 11:34 ` [PATCH v1 01/15] util: Add RISC-V vector extension probe in cpuinfo LIU Zhiwei
  2024-08-13 11:34 ` [PATCH v1 02/15] tcg/op-gvec: Fix iteration step in 32-bit operation LIU Zhiwei
@ 2024-08-13 11:34 ` LIU Zhiwei
  2024-08-13 11:52   ` Richard Henderson
  2024-08-13 11:34 ` [PATCH v1 04/15] tcg/riscv: Add basic support for vector LIU Zhiwei
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-13 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng

From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>

When allocating registers for input and output, ensure they match
the available registers to avoid allocating illeagal registers.

We should respect RISC-V vector extension's variable-length registers
and LMUL-based register grouping. Coordinate with tcg_target_available_regs
initialization tcg_target_init (behind this commit) to ensure proper
handling of vector register constraints.

Note: While mov_vec doesn't have constraints, dup_vec and other IRs do.
We need to strengthen constraints for all IRs except mov_vec, and this
is sufficient.

Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Fixes: 29f5e92502 (tcg: Introduce paired register allocation)
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
 tcg/tcg.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 34e3056380..d26b42534d 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -4722,8 +4722,10 @@ static void tcg_reg_alloc_dup(TCGContext *s, const TCGOp *op)
         return;
     }
 
-    dup_out_regs = tcg_op_defs[INDEX_op_dup_vec].args_ct[0].regs;
-    dup_in_regs = tcg_op_defs[INDEX_op_dup_vec].args_ct[1].regs;
+    dup_out_regs = tcg_op_defs[INDEX_op_dup_vec].args_ct[0].regs &
+                                    tcg_target_available_regs[ots->type];
+    dup_in_regs = tcg_op_defs[INDEX_op_dup_vec].args_ct[1].regs &
+                                    tcg_target_available_regs[its->type];
 
     /* Allocate the output register now.  */
     if (ots->val_type != TEMP_VAL_REG) {
@@ -4876,7 +4878,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
 
         reg = ts->reg;
         i_preferred_regs = 0;
-        i_required_regs = arg_ct->regs;
+        i_required_regs = arg_ct->regs & tcg_target_available_regs[ts->type];
         allocate_new_reg = false;
         copyto_new_reg = false;
 
@@ -5078,6 +5080,7 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
 
         /* satisfy the output constraints */
         for(k = 0; k < nb_oargs; k++) {
+            TCGRegSet o_required_regs;
             i = def->args_ct[k].sort_index;
             arg = op->args[i];
             arg_ct = &def->args_ct[i];
@@ -5085,17 +5088,19 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
 
             /* ENV should not be modified.  */
             tcg_debug_assert(!temp_readonly(ts));
+            o_required_regs = arg_ct->regs &
+                              tcg_target_available_regs[ts->type];
 
             switch (arg_ct->pair) {
             case 0: /* not paired */
                 if (arg_ct->oalias && !const_args[arg_ct->alias_index]) {
                     reg = new_args[arg_ct->alias_index];
                 } else if (arg_ct->newreg) {
-                    reg = tcg_reg_alloc(s, arg_ct->regs,
+                    reg = tcg_reg_alloc(s, o_required_regs,
                                         i_allocated_regs | o_allocated_regs,
                                         output_pref(op, k), ts->indirect_base);
                 } else {
-                    reg = tcg_reg_alloc(s, arg_ct->regs, o_allocated_regs,
+                    reg = tcg_reg_alloc(s, o_required_regs, o_allocated_regs,
                                         output_pref(op, k), ts->indirect_base);
                 }
                 break;
@@ -5104,12 +5109,13 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
                 if (arg_ct->oalias) {
                     reg = new_args[arg_ct->alias_index];
                 } else if (arg_ct->newreg) {
-                    reg = tcg_reg_alloc_pair(s, arg_ct->regs,
+                    reg = tcg_reg_alloc_pair(s, o_required_regs,
                                              i_allocated_regs | o_allocated_regs,
                                              output_pref(op, k),
                                              ts->indirect_base);
                 } else {
-                    reg = tcg_reg_alloc_pair(s, arg_ct->regs, o_allocated_regs,
+                    reg = tcg_reg_alloc_pair(s, o_required_regs,
+                                             o_allocated_regs,
                                              output_pref(op, k),
                                              ts->indirect_base);
                 }
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 03/15] tcg: Fix register allocation constraints
  2024-08-13 11:34 ` [PATCH v1 03/15] tcg: Fix register allocation constraints LIU Zhiwei
@ 2024-08-13 11:52   ` Richard Henderson
  2024-08-14  0:58     ` LIU Zhiwei
  0 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2024-08-13 11:52 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng

On 8/13/24 21:34, LIU Zhiwei wrote:
> From: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
> 
> When allocating registers for input and output, ensure they match
> the available registers to avoid allocating illeagal registers.
> 
> We should respect RISC-V vector extension's variable-length registers
> and LMUL-based register grouping. Coordinate with tcg_target_available_regs
> initialization tcg_target_init (behind this commit) to ensure proper
> handling of vector register constraints.
> 
> Note: While mov_vec doesn't have constraints, dup_vec and other IRs do.
> We need to strengthen constraints for all IRs except mov_vec, and this
> is sufficient.
> 
> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
> Fixes: 29f5e92502 (tcg: Introduce paired register allocation)
> Reviewed-by: Liu Zhiwei<zhiwei_liu@linux.alibaba.com>
> ---
>   tcg/tcg.c | 20 +++++++++++++-------
>   1 file changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 34e3056380..d26b42534d 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -4722,8 +4722,10 @@ static void tcg_reg_alloc_dup(TCGContext *s, const TCGOp *op)
>           return;
>       }
>   
> -    dup_out_regs = tcg_op_defs[INDEX_op_dup_vec].args_ct[0].regs;
> -    dup_in_regs = tcg_op_defs[INDEX_op_dup_vec].args_ct[1].regs;
> +    dup_out_regs = tcg_op_defs[INDEX_op_dup_vec].args_ct[0].regs &
> +                                    tcg_target_available_regs[ots->type];
> +    dup_in_regs = tcg_op_defs[INDEX_op_dup_vec].args_ct[1].regs &
> +                                    tcg_target_available_regs[its->type];
>   

Why would you ever have constraints that resolve to unavailable registers?

If you don't want to fix this in the backend, then the next best place is in 
process_op_defs(), so that we take care of this once at startup, and never have to think 
about it again.


r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 03/15] tcg: Fix register allocation constraints
  2024-08-13 11:52   ` Richard Henderson
@ 2024-08-14  0:58     ` LIU Zhiwei
  2024-08-14  2:04       ` Richard Henderson
  0 siblings, 1 reply; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-14  0:58 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng


On 2024/8/13 19:52, Richard Henderson wrote:
> On 8/13/24 21:34, LIU Zhiwei wrote:
>> From: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>>
>> When allocating registers for input and output, ensure they match
>> the available registers to avoid allocating illeagal registers.
>>
>> We should respect RISC-V vector extension's variable-length registers
>> and LMUL-based register grouping. Coordinate with 
>> tcg_target_available_regs
>> initialization tcg_target_init (behind this commit) to ensure proper
>> handling of vector register constraints.
>>
>> Note: While mov_vec doesn't have constraints, dup_vec and other IRs do.
>> We need to strengthen constraints for all IRs except mov_vec, and this
>> is sufficient.
>>
>> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>> Fixes: 29f5e92502 (tcg: Introduce paired register allocation)
>> Reviewed-by: Liu Zhiwei<zhiwei_liu@linux.alibaba.com>
>> ---
>>   tcg/tcg.c | 20 +++++++++++++-------
>>   1 file changed, 13 insertions(+), 7 deletions(-)
>>
>> diff --git a/tcg/tcg.c b/tcg/tcg.c
>> index 34e3056380..d26b42534d 100644
>> --- a/tcg/tcg.c
>> +++ b/tcg/tcg.c
>> @@ -4722,8 +4722,10 @@ static void tcg_reg_alloc_dup(TCGContext *s, 
>> const TCGOp *op)
>>           return;
>>       }
>>   -    dup_out_regs = tcg_op_defs[INDEX_op_dup_vec].args_ct[0].regs;
>> -    dup_in_regs = tcg_op_defs[INDEX_op_dup_vec].args_ct[1].regs;
>> +    dup_out_regs = tcg_op_defs[INDEX_op_dup_vec].args_ct[0].regs &
>> + tcg_target_available_regs[ots->type];
>> +    dup_in_regs = tcg_op_defs[INDEX_op_dup_vec].args_ct[1].regs &
>> + tcg_target_available_regs[its->type];
>
> Why would you ever have constraints that resolve to unavailable 
> registers?
>
> If you don't want to fix this in the backend, then the next best place 
> is in process_op_defs(), so that we take care of this once at startup, 
> and never have to think about it again.

Hi Richard,

The constraints provided in process_op_defs() are static and tied to the 
IR operations. For example, if we create constraints for add_vec, the 
same constraints will apply to all types of add_vec operations 
(TCG_TYPE_V64, TCG_TYPE_V128, TCG_TYPE_V256). This means the constraints 
don't change based on the specific type of operation being performed.
In contrast, RISC-V's LMUL (Length Multiplier) can change at runtime 
depending on the type of IR operation. Different LMUL values affect 
which vector registers are available for use in RISC-V. Let's consider 
an example where the host's vector register width is 128 bits:

For an add_vec operation on v256 (256-bit vectors), only even-numbered 
vector registers like 0, 2, 4 can be used.
However, for an add_vec operation on v128 (128-bit vectors), all vector 
registers (0, 1, 2, etc.) are available.

Thus if we want to use all registers of vectors, we have to add a 
dynamic constraint on register allocation based on IR types.

Thanks,
Zhiwei

>
>
> r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 03/15] tcg: Fix register allocation constraints
  2024-08-14  0:58     ` LIU Zhiwei
@ 2024-08-14  2:04       ` Richard Henderson
  2024-08-14  2:27         ` LIU Zhiwei
  0 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2024-08-14  2:04 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng

On 8/14/24 10:58, LIU Zhiwei wrote:
> Thus if we want to use all registers of vectors, we have to add a dynamic constraint on 
> register allocation based on IR types.

My comment vs patch 4 is that you can't do that, at least not without large changes to TCG.

In addition, I said that the register pressure on vector regs is not high enough to 
justify such changes.  There is, so far, little benefit in having more than 4 or 5 vector 
registers, much less 32.  Thus 7 (lmul 4, omitting v0) is sufficient.

r~

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 03/15] tcg: Fix register allocation constraints
  2024-08-14  2:04       ` Richard Henderson
@ 2024-08-14  2:27         ` LIU Zhiwei
  2024-08-14  3:08           ` Richard Henderson
  0 siblings, 1 reply; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-14  2:27 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng

On 2024/8/14 10:04, Richard Henderson wrote:
> On 8/14/24 10:58, LIU Zhiwei wrote:
>> Thus if we want to use all registers of vectors, we have to add a 
>> dynamic constraint on register allocation based on IR types.
>
> My comment vs patch 4 is that you can't do that, at least not without 
> large changes to TCG.
>
> In addition, I said that the register pressure on vector regs is not 
> high enough to justify such changes.  There is, so far, little benefit 
> in having more than 4 or 5 vector registers, much less 32.  Thus 7 
> (lmul 4, omitting v0) is sufficient.

At least on QEMU, SVE can support 2048 bit vector length with 
'sve-default-vector-length=256'.  Software optimized with SVE, such as 
X264 can benefit with long SVE length in less dynamic A64 instructions.

We want to expose all host vector ability. Thus the largest 
TCG_TYPE_V256 is not enough, as 128-bit RVV can give 8*128=1024 width 
operation. We have expand TCG_TYPE_V512/1024/2048 types(not in this 
patch set, but intend to upstream later).
With large TCG_TYPE_V1024/2048, we get better performance on RISC-V 
board with much less translated RISC-V vector instructions. We can give 
a more detailed experiment result if needed.

However, we will only have 3 vector register when support 
TCG_TYPE_V1024.  And even less for TCG_TYPE_V2048.  Current approach 
will give more vectors TCG_TYPE_V128 even with support TCG_TYPE_V1024, 
which will relax some guest NEON register pressure.

Thanks,
Zhiwei

>
>
> r~

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 03/15] tcg: Fix register allocation constraints
  2024-08-14  2:27         ` LIU Zhiwei
@ 2024-08-14  3:08           ` Richard Henderson
  2024-08-14  3:30             ` LIU Zhiwei
  0 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2024-08-14  3:08 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng

On 8/14/24 12:27, LIU Zhiwei wrote:
> 
> On 2024/8/14 10:04, Richard Henderson wrote:
>> On 8/14/24 10:58, LIU Zhiwei wrote:
>>> Thus if we want to use all registers of vectors, we have to add a dynamic constraint on 
>>> register allocation based on IR types.
>>
>> My comment vs patch 4 is that you can't do that, at least not without large changes to TCG.
>>
>> In addition, I said that the register pressure on vector regs is not high enough to 
>> justify such changes.  There is, so far, little benefit in having more than 4 or 5 
>> vector registers, much less 32.  Thus 7 (lmul 4, omitting v0) is sufficient.
> 
> At least on QEMU, SVE can support 2048 bit vector length with 'sve-default-vector- 
> length=256'.  Software optimized with SVE, such as X264 can benefit with long SVE length 
> in less dynamic A64 instructions.
> 
> We want to expose all host vector ability. Thus the largest TCG_TYPE_V256 is not enough, 
> as 128-bit RVV can give 8*128=1024 width operation. We have expand TCG_TYPE_V512/1024/2048 
> types(not in this patch set, but intend to upstream later).
> With large TCG_TYPE_V1024/2048, we get better performance on RISC-V board with much less 
> translated RISC-V vector instructions. We can give a more detailed experiment result if 
> needed.
> 
> However, we will only have 3 vector register when support TCG_TYPE_V1024.  And even less 
> for TCG_TYPE_V2048.  Current approach will give more vectors TCG_TYPE_V128 even with 
> support TCG_TYPE_V1024, which will relax some guest NEON register pressure.

Then you will have to teach TCG about one operand consuming and clobbering N hard 
registers, so that you get the spills and fills done correctly.

But you haven't done that in this patch set, so will currently generate incorrect code.

I think you should make longer vector operations a longer term project, and start with 
something simpler.


r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 03/15] tcg: Fix register allocation constraints
  2024-08-14  3:08           ` Richard Henderson
@ 2024-08-14  3:30             ` LIU Zhiwei
  2024-08-14  4:18               ` Richard Henderson
  0 siblings, 1 reply; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-14  3:30 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng


On 2024/8/14 11:08, Richard Henderson wrote:
> On 8/14/24 12:27, LIU Zhiwei wrote:
>>
>> On 2024/8/14 10:04, Richard Henderson wrote:
>>> On 8/14/24 10:58, LIU Zhiwei wrote:
>>>> Thus if we want to use all registers of vectors, we have to add a 
>>>> dynamic constraint on register allocation based on IR types.
>>>
>>> My comment vs patch 4 is that you can't do that, at least not 
>>> without large changes to TCG.
>>>
>>> In addition, I said that the register pressure on vector regs is not 
>>> high enough to justify such changes.  There is, so far, little 
>>> benefit in having more than 4 or 5 vector registers, much less 32.  
>>> Thus 7 (lmul 4, omitting v0) is sufficient.
>>
>> At least on QEMU, SVE can support 2048 bit vector length with 
>> 'sve-default-vector- length=256'.  Software optimized with SVE, such 
>> as X264 can benefit with long SVE length in less dynamic A64 
>> instructions.
>>
>> We want to expose all host vector ability. Thus the largest 
>> TCG_TYPE_V256 is not enough, as 128-bit RVV can give 8*128=1024 width 
>> operation. We have expand TCG_TYPE_V512/1024/2048 types(not in this 
>> patch set, but intend to upstream later).
>> With large TCG_TYPE_V1024/2048, we get better performance on RISC-V 
>> board with much less translated RISC-V vector instructions. We can 
>> give a more detailed experiment result if needed.
>>
>> However, we will only have 3 vector register when support 
>> TCG_TYPE_V1024.  And even less for TCG_TYPE_V2048.  Current approach 
>> will give more vectors TCG_TYPE_V128 even with support 
>> TCG_TYPE_V1024, which will relax some guest NEON register pressure.
>
> Then you will have to teach TCG about one operand consuming and 
> clobbering N hard registers, so that you get the spills and fills done 
> correctly.
I think we have done this in patch 6.
>
> But you haven't done that in this patch set, so will currently 
> generate incorrect code.
>
> I think you should make longer vector operations a longer term project, 

Does longer vector operations implementation deserves to upstream? We 
can contribute it sooner as it is ready.

> and start with something simpler.

Agree if you insist. 🙂

Zhiwei

>
>
> r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 03/15] tcg: Fix register allocation constraints
  2024-08-14  3:30             ` LIU Zhiwei
@ 2024-08-14  4:18               ` Richard Henderson
  2024-08-14  7:47                 ` LIU Zhiwei
  0 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2024-08-14  4:18 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng

On 8/14/24 13:30, LIU Zhiwei wrote:
> 
> On 2024/8/14 11:08, Richard Henderson wrote:
>> On 8/14/24 12:27, LIU Zhiwei wrote:
>>>
>>> On 2024/8/14 10:04, Richard Henderson wrote:
>>>> On 8/14/24 10:58, LIU Zhiwei wrote:
>>>>> Thus if we want to use all registers of vectors, we have to add a dynamic constraint 
>>>>> on register allocation based on IR types.
>>>>
>>>> My comment vs patch 4 is that you can't do that, at least not without large changes to 
>>>> TCG.
>>>>
>>>> In addition, I said that the register pressure on vector regs is not high enough to 
>>>> justify such changes.  There is, so far, little benefit in having more than 4 or 5 
>>>> vector registers, much less 32. Thus 7 (lmul 4, omitting v0) is sufficient.
>>>
>>> At least on QEMU, SVE can support 2048 bit vector length with 'sve-default-vector- 
>>> length=256'.  Software optimized with SVE, such as X264 can benefit with long SVE 
>>> length in less dynamic A64 instructions.
>>>
>>> We want to expose all host vector ability. Thus the largest TCG_TYPE_V256 is not 
>>> enough, as 128-bit RVV can give 8*128=1024 width operation. We have expand 
>>> TCG_TYPE_V512/1024/2048 types(not in this patch set, but intend to upstream later).
>>> With large TCG_TYPE_V1024/2048, we get better performance on RISC-V board with much 
>>> less translated RISC-V vector instructions. We can give a more detailed experiment 
>>> result if needed.
>>>
>>> However, we will only have 3 vector register when support TCG_TYPE_V1024.  And even 
>>> less for TCG_TYPE_V2048.  Current approach will give more vectors TCG_TYPE_V128 even 
>>> with support TCG_TYPE_V1024, which will relax some guest NEON register pressure.
>>
>> Then you will have to teach TCG about one operand consuming and clobbering N hard 
>> registers, so that you get the spills and fills done correctly.
> I think we have done this in patch 6.

No, you have not.

There are no modifications to tcg_reg_alloc, and there are no additional calls to 
tcg_reg_free, which is where spills are generated. There would also need to be changes on 
the fill side, temp_load.


>> I think you should make longer vector operations a longer term project, 
> 
> Does longer vector operations implementation deserves to upstream? We can contribute it 
> sooner as it is ready.

Sure.


r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 03/15] tcg: Fix register allocation constraints
  2024-08-14  4:18               ` Richard Henderson
@ 2024-08-14  7:47                 ` LIU Zhiwei
  0 siblings, 0 replies; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-14  7:47 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng


On 2024/8/14 12:18, Richard Henderson wrote:
> On 8/14/24 13:30, LIU Zhiwei wrote:
>>
>> On 2024/8/14 11:08, Richard Henderson wrote:
>>> On 8/14/24 12:27, LIU Zhiwei wrote:
>>>>
>>>> On 2024/8/14 10:04, Richard Henderson wrote:
>>>>> On 8/14/24 10:58, LIU Zhiwei wrote:
>>>>>> Thus if we want to use all registers of vectors, we have to add a 
>>>>>> dynamic constraint on register allocation based on IR types.
>>>>>
>>>>> My comment vs patch 4 is that you can't do that, at least not 
>>>>> without large changes to TCG.
>>>>>
>>>>> In addition, I said that the register pressure on vector regs is 
>>>>> not high enough to justify such changes.  There is, so far, little 
>>>>> benefit in having more than 4 or 5 vector registers, much less 32. 
>>>>> Thus 7 (lmul 4, omitting v0) is sufficient.
>>>>
>>>> At least on QEMU, SVE can support 2048 bit vector length with 
>>>> 'sve-default-vector- length=256'.  Software optimized with SVE, 
>>>> such as X264 can benefit with long SVE length in less dynamic A64 
>>>> instructions.
>>>>
>>>> We want to expose all host vector ability. Thus the largest 
>>>> TCG_TYPE_V256 is not enough, as 128-bit RVV can give 8*128=1024 
>>>> width operation. We have expand TCG_TYPE_V512/1024/2048 types(not 
>>>> in this patch set, but intend to upstream later).
>>>> With large TCG_TYPE_V1024/2048, we get better performance on RISC-V 
>>>> board with much less translated RISC-V vector instructions. We can 
>>>> give a more detailed experiment result if needed.
>>>>
>>>> However, we will only have 3 vector register when support 
>>>> TCG_TYPE_V1024.  And even less for TCG_TYPE_V2048.  Current 
>>>> approach will give more vectors TCG_TYPE_V128 even with support 
>>>> TCG_TYPE_V1024, which will relax some guest NEON register pressure.
>>>
>>> Then you will have to teach TCG about one operand consuming and 
>>> clobbering N hard registers, so that you get the spills and fills 
>>> done correctly.
>> I think we have done this in patch 6.
>
> No, you have not.
>
> There are no modifications to tcg_reg_alloc, and there are no 
> additional calls to tcg_reg_free, which is where spills are generated. 
> There would also need to be changes on the fill side, temp_load.
Thanks. I choose the simple design as you suggest for this patch set. 
And We will fix this problem when send the longer vector operations 
implementation.
>
>
>>> I think you should make longer vector operations a longer term project, 
>>
>> Does longer vector operations implementation deserves to upstream? We 
>> can contribute it sooner as it is ready.
>
> Sure.

Good news!

Thanks,
Zhiwei

>
>
> r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v1 04/15] tcg/riscv: Add basic support for vector
  2024-08-13 11:34 [PATCH v1 00/15] tcg/riscv: Add support for vector LIU Zhiwei
                   ` (2 preceding siblings ...)
  2024-08-13 11:34 ` [PATCH v1 03/15] tcg: Fix register allocation constraints LIU Zhiwei
@ 2024-08-13 11:34 ` LIU Zhiwei
  2024-08-13 12:19   ` Richard Henderson
  2024-08-13 11:34 ` [PATCH v1 05/15] tcg/riscv: Add riscv vset{i}vli support LIU Zhiwei
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-13 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, zhiwei_liu, richard.henderson, Swung0x48,
	TANG Tiancheng

From: Swung0x48 <swung0x48@outlook.com>

The RISC-V vector instruction set utilizes the LMUL field to group
multiple registers, enabling variable-length vector registers.
This implementation uses only the first register number of each group
while reserving the other register numbers within the group.

The unused register numbers within groups are implemented by adding
constraints to tcg_target_available_regs during register allocation
in tcg.c in the previous commit.

This patch:
1. Reserves vector register 0 for use as a mask register.
2. When using register groups, reserves the additional registers within
   each group.

Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
 tcg/riscv/tcg-target-con-str.h |   1 +
 tcg/riscv/tcg-target.c.inc     | 151 ++++++++++++++++++++++++++-------
 tcg/riscv/tcg-target.h         |  78 ++++++++++-------
 tcg/riscv/tcg-target.opc.h     |  12 +++
 4 files changed, 177 insertions(+), 65 deletions(-)
 create mode 100644 tcg/riscv/tcg-target.opc.h

diff --git a/tcg/riscv/tcg-target-con-str.h b/tcg/riscv/tcg-target-con-str.h
index d5c419dff1..b2b3211bcb 100644
--- a/tcg/riscv/tcg-target-con-str.h
+++ b/tcg/riscv/tcg-target-con-str.h
@@ -9,6 +9,7 @@
  * REGS(letter, register_mask)
  */
 REGS('r', ALL_GENERAL_REGS)
+REGS('v', ALL_VECTOR_REGS)
 
 /*
  * Define constraint letters for constants:
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index d334857226..ca9bafcb3c 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -30,40 +30,18 @@
 #include "../tcg-ldst.c.inc"
 #include "../tcg-pool.c.inc"
 
+int riscv_vlen = -1;
+
 #ifdef CONFIG_DEBUG_TCG
 static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
-    "zero",
-    "ra",
-    "sp",
-    "gp",
-    "tp",
-    "t0",
-    "t1",
-    "t2",
-    "s0",
-    "s1",
-    "a0",
-    "a1",
-    "a2",
-    "a3",
-    "a4",
-    "a5",
-    "a6",
-    "a7",
-    "s2",
-    "s3",
-    "s4",
-    "s5",
-    "s6",
-    "s7",
-    "s8",
-    "s9",
-    "s10",
-    "s11",
-    "t3",
-    "t4",
-    "t5",
-    "t6"
+    "zero", "ra",  "sp",  "gp",  "tp",  "t0",  "t1",  "t2",
+    "s0",   "s1",  "a0",  "a1",  "a2",  "a3",  "a4",  "a5",
+    "a6",   "a7",  "s2",  "s3",  "s4",  "s5",  "s6",  "s7",
+    "s8",   "s9",  "s10", "s11", "t3",  "t4",  "t5",  "t6",
+    "v0",   "v1",  "v2",  "v3",  "v4",  "v5",  "v6",  "v7",
+    "v8",   "v9",  "v10", "v11", "v12", "v13", "v14", "v15",
+    "v16",  "v17", "v18", "v19", "v20", "v21", "v22", "v23",
+    "v24",  "v25", "v26", "v27", "v28", "v29", "v30", "v31",
 };
 #endif
 
@@ -100,6 +78,16 @@ static const int tcg_target_reg_alloc_order[] = {
     TCG_REG_A5,
     TCG_REG_A6,
     TCG_REG_A7,
+
+    /* Vector registers and TCG_REG_V0 reserved for mask. */
+    TCG_REG_V1,  TCG_REG_V2,  TCG_REG_V3,  TCG_REG_V4,
+    TCG_REG_V5,  TCG_REG_V6,  TCG_REG_V7,  TCG_REG_V8,
+    TCG_REG_V9,  TCG_REG_V10, TCG_REG_V11, TCG_REG_V12,
+    TCG_REG_V13, TCG_REG_V14, TCG_REG_V15, TCG_REG_V16,
+    TCG_REG_V17, TCG_REG_V18, TCG_REG_V19, TCG_REG_V20,
+    TCG_REG_V21, TCG_REG_V22, TCG_REG_V23, TCG_REG_V24,
+    TCG_REG_V25, TCG_REG_V26, TCG_REG_V27, TCG_REG_V28,
+    TCG_REG_V29, TCG_REG_V30, TCG_REG_V31,
 };
 
 static const int tcg_target_call_iarg_regs[] = {
@@ -127,6 +115,9 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
 #define TCG_CT_CONST_J12  0x1000
 
 #define ALL_GENERAL_REGS   MAKE_64BIT_MASK(0, 32)
+#define ALL_VECTOR_REGS    MAKE_64BIT_MASK(33, 31)
+#define ALL_DVECTOR_REG_GROUPS 0x5555555400000000
+#define ALL_QVECTOR_REG_GROUPS 0x1111111000000000
 
 #define sextreg  sextract64
 
@@ -475,6 +466,43 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
     }
 }
 
+/*
+ * RISC-V vector instruction emitters
+ */
+
+/* Vector registers uses the same 5 lower bits as GPR registers. */
+static void tcg_out_opc_reg_vec(TCGContext *s, RISCVInsn opc,
+                                TCGReg d, TCGReg s1, TCGReg s2, bool vm)
+{
+    tcg_out32(s, encode_r(opc, d, s1, s2) | (vm << 25));
+}
+
+static void tcg_out_opc_reg_vec_i(TCGContext *s, RISCVInsn opc,
+                                  TCGReg rd, TCGArg imm, TCGReg vs2, bool vm)
+{
+    tcg_out32(s, encode_r(opc, rd, (imm & 0x1f), vs2) | (vm << 25));
+}
+
+/* vm=0 (vm = false) means vector masking ENABLED. */
+#define tcg_out_opc_vv(s, opc, vd, vs2, vs1, vm) \
+    tcg_out_opc_reg_vec(s, opc, vd, vs1, vs2, vm);
+
+/*
+ * In RISC-V, vs2 is the first operand, while rs1/imm is the
+ * second operand.
+ */
+#define tcg_out_opc_vx(s, opc, vd, vs2, rs1, vm) \
+    tcg_out_opc_reg_vec(s, opc, vd, rs1, vs2, vm);
+
+#define tcg_out_opc_vi(s, opc, vd, vs2, imm, vm) \
+    tcg_out_opc_reg_vec_i(s, opc, vd, imm, vs2, vm);
+
+/*
+ * Only unit-stride addressing implemented; may extend in future.
+ */
+#define tcg_out_opc_ldst_vec(s, opc, vs3_vd, rs1, vm) \
+    tcg_out_opc_reg_vec(s, opc, vs3_vd, rs1, 0, vm);
+
 /*
  * TCG intrinsics
  */
@@ -1881,6 +1909,36 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
     }
 }
 
+static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
+                           unsigned vecl, unsigned vece,
+                           const TCGArg args[TCG_MAX_OP_ARGS],
+                           const int const_args[TCG_MAX_OP_ARGS])
+{
+    switch (opc) {
+    case INDEX_op_mov_vec: /* Always emitted via tcg_out_mov.  */
+    case INDEX_op_dup_vec: /* Always emitted via tcg_out_dup_vec.  */
+    default:
+        g_assert_not_reached();
+    }
+}
+
+void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
+                       TCGArg a0, ...)
+{
+    switch (opc) {
+    default:
+        g_assert_not_reached();
+    }
+}
+
+int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
+{
+    switch (opc) {
+    default:
+        return 0;
+    }
+}
+
 static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 {
     switch (op) {
@@ -2096,11 +2154,39 @@ static void tcg_out_tb_start(TCGContext *s)
     /* nothing to do */
 }
 
+static void riscv_get_vlenb(void){
+    /* Get vlenb for Vector: csrrs %0, vlenb, zero. */
+    asm volatile("csrrs %0, 0xc22, x0" : "=r"(riscv_vlen));
+    riscv_vlen *= 8;
+}
+
 static void tcg_target_init(TCGContext *s)
 {
     tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffff;
     tcg_target_available_regs[TCG_TYPE_I64] = 0xffffffff;
 
+    if (cpuinfo & CPUINFO_ZVE64X) {
+        /* We need to get vlenb for vector's extension */
+        riscv_get_vlenb();
+        tcg_debug_assert(riscv_vlen >= 64 && is_power_of_2(riscv_vlen));
+
+        if (riscv_vlen >= 256) {
+            tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
+            tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
+            tcg_target_available_regs[TCG_TYPE_V256] = ALL_VECTOR_REGS;
+        } else if (riscv_vlen == 128) {
+            tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
+            tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
+            tcg_target_available_regs[TCG_TYPE_V256] = ALL_DVECTOR_REG_GROUPS;
+        } else if (riscv_vlen == 64) {
+            tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
+            tcg_target_available_regs[TCG_TYPE_V128] = ALL_DVECTOR_REG_GROUPS;
+            tcg_target_available_regs[TCG_TYPE_V256] = ALL_QVECTOR_REG_GROUPS;
+        } else {
+            g_assert_not_reached();
+        }
+    }
+
     tcg_target_call_clobber_regs = -1u;
     tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S0);
     tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S1);
@@ -2123,6 +2209,7 @@ static void tcg_target_init(TCGContext *s)
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_SP);
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_GP);
     tcg_regset_set_reg(s->reserved_regs, TCG_REG_TP);
+    tcg_regset_set_reg(s->reserved_regs, TCG_REG_V0);
 }
 
 typedef struct {
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 1a347eaf6e..12a7a37aaa 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -28,42 +28,28 @@
 #include "host/cpuinfo.h"
 
 #define TCG_TARGET_INSN_UNIT_SIZE 4
-#define TCG_TARGET_NB_REGS 32
+#define TCG_TARGET_NB_REGS 64
 #define MAX_CODE_GEN_BUFFER_SIZE  ((size_t)-1)
 
 typedef enum {
-    TCG_REG_ZERO,
-    TCG_REG_RA,
-    TCG_REG_SP,
-    TCG_REG_GP,
-    TCG_REG_TP,
-    TCG_REG_T0,
-    TCG_REG_T1,
-    TCG_REG_T2,
-    TCG_REG_S0,
-    TCG_REG_S1,
-    TCG_REG_A0,
-    TCG_REG_A1,
-    TCG_REG_A2,
-    TCG_REG_A3,
-    TCG_REG_A4,
-    TCG_REG_A5,
-    TCG_REG_A6,
-    TCG_REG_A7,
-    TCG_REG_S2,
-    TCG_REG_S3,
-    TCG_REG_S4,
-    TCG_REG_S5,
-    TCG_REG_S6,
-    TCG_REG_S7,
-    TCG_REG_S8,
-    TCG_REG_S9,
-    TCG_REG_S10,
-    TCG_REG_S11,
-    TCG_REG_T3,
-    TCG_REG_T4,
-    TCG_REG_T5,
-    TCG_REG_T6,
+    TCG_REG_ZERO, TCG_REG_RA,  TCG_REG_SP,  TCG_REG_GP,
+    TCG_REG_TP,   TCG_REG_T0,  TCG_REG_T1,  TCG_REG_T2,
+    TCG_REG_S0,   TCG_REG_S1,  TCG_REG_A0,  TCG_REG_A1,
+    TCG_REG_A2,   TCG_REG_A3,  TCG_REG_A4,  TCG_REG_A5,
+    TCG_REG_A6,   TCG_REG_A7,  TCG_REG_S2,  TCG_REG_S3,
+    TCG_REG_S4,   TCG_REG_S5,  TCG_REG_S6,  TCG_REG_S7,
+    TCG_REG_S8,   TCG_REG_S9,  TCG_REG_S10, TCG_REG_S11,
+    TCG_REG_T3,   TCG_REG_T4,  TCG_REG_T5,  TCG_REG_T6,
+
+    /* RISC-V V Extension registers */
+    TCG_REG_V0,   TCG_REG_V1,  TCG_REG_V2,  TCG_REG_V3,
+    TCG_REG_V4,   TCG_REG_V5,  TCG_REG_V6,  TCG_REG_V7,
+    TCG_REG_V8,   TCG_REG_V9,  TCG_REG_V10, TCG_REG_V11,
+    TCG_REG_V12,  TCG_REG_V13, TCG_REG_V14, TCG_REG_V15,
+    TCG_REG_V16,  TCG_REG_V17, TCG_REG_V18, TCG_REG_V19,
+    TCG_REG_V20,  TCG_REG_V21, TCG_REG_V22, TCG_REG_V23,
+    TCG_REG_V24,  TCG_REG_V25, TCG_REG_V26, TCG_REG_V27,
+    TCG_REG_V28,  TCG_REG_V29, TCG_REG_V30, TCG_REG_V31,
 
     /* aliases */
     TCG_AREG0          = TCG_REG_S0,
@@ -156,6 +142,32 @@ typedef enum {
 
 #define TCG_TARGET_HAS_tst              0
 
+/* vector instructions */
+#define TCG_TARGET_HAS_v64              0
+#define TCG_TARGET_HAS_v128             0
+#define TCG_TARGET_HAS_v256             0
+#define TCG_TARGET_HAS_andc_vec         0
+#define TCG_TARGET_HAS_orc_vec          0
+#define TCG_TARGET_HAS_nand_vec         0
+#define TCG_TARGET_HAS_nor_vec          0
+#define TCG_TARGET_HAS_eqv_vec          0
+#define TCG_TARGET_HAS_not_vec          0
+#define TCG_TARGET_HAS_neg_vec          0
+#define TCG_TARGET_HAS_abs_vec          0
+#define TCG_TARGET_HAS_roti_vec         0
+#define TCG_TARGET_HAS_rots_vec         0
+#define TCG_TARGET_HAS_rotv_vec         0
+#define TCG_TARGET_HAS_shi_vec          0
+#define TCG_TARGET_HAS_shs_vec          0
+#define TCG_TARGET_HAS_shv_vec          0
+#define TCG_TARGET_HAS_mul_vec          0
+#define TCG_TARGET_HAS_sat_vec          0
+#define TCG_TARGET_HAS_minmax_vec       0
+#define TCG_TARGET_HAS_bitsel_vec       0
+#define TCG_TARGET_HAS_cmpsel_vec       0
+
+#define TCG_TARGET_HAS_tst_vec          0
+
 #define TCG_TARGET_DEFAULT_MO (0)
 
 #define TCG_TARGET_NEED_LDST_LABELS
diff --git a/tcg/riscv/tcg-target.opc.h b/tcg/riscv/tcg-target.opc.h
new file mode 100644
index 0000000000..b80b39e1e5
--- /dev/null
+++ b/tcg/riscv/tcg-target.opc.h
@@ -0,0 +1,12 @@
+/*
+ * Copyright (c) C-SKY Microsystems Co., Ltd.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version.
+ *
+ * See the COPYING file in the top-level directory for details.
+ *
+ * Target-specific opcodes for host vector expansion.  These will be
+ * emitted by tcg_expand_vec_op.  For those familiar with GCC internals,
+ * consider these to be UNSPEC with names.
+ */
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 04/15] tcg/riscv: Add basic support for vector
  2024-08-13 11:34 ` [PATCH v1 04/15] tcg/riscv: Add basic support for vector LIU Zhiwei
@ 2024-08-13 12:19   ` Richard Henderson
  0 siblings, 0 replies; 49+ messages in thread
From: Richard Henderson @ 2024-08-13 12:19 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, Swung0x48, TANG Tiancheng

On 8/13/24 21:34, LIU Zhiwei wrote:
> +    if (cpuinfo & CPUINFO_ZVE64X) {
> +        /* We need to get vlenb for vector's extension */
> +        riscv_get_vlenb();
> +        tcg_debug_assert(riscv_vlen >= 64 && is_power_of_2(riscv_vlen));
> +
> +        if (riscv_vlen >= 256) {
> +            tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
> +            tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
> +            tcg_target_available_regs[TCG_TYPE_V256] = ALL_VECTOR_REGS;
> +        } else if (riscv_vlen == 128) {
> +            tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
> +            tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
> +            tcg_target_available_regs[TCG_TYPE_V256] = ALL_DVECTOR_REG_GROUPS;
> +        } else if (riscv_vlen == 64) {
> +            tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
> +            tcg_target_available_regs[TCG_TYPE_V128] = ALL_DVECTOR_REG_GROUPS;
> +            tcg_target_available_regs[TCG_TYPE_V256] = ALL_QVECTOR_REG_GROUPS;
> +        } else {
> +            g_assert_not_reached();
> +        }
> +    }

I think this is over-complicated, and perhaps the reason for patch 3.

What I believe you're missing with patch 3 is the fact that when you change the lmul, 
adjacent vector registers get clobbered, and the tcg register allocator does not expect 
that.  This will result in incorrect register allocation.

You need to pick one size at startup, and expose *only* those registers.

This won't affect code generation much, because we never have heavy vector register 
pressure.  Mostly values go out of scope at the end of every guest instruction.  So having 
only 8 or 16 visible host registers instead of 32 isn't a big deal.


r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v1 05/15] tcg/riscv: Add riscv vset{i}vli support
  2024-08-13 11:34 [PATCH v1 00/15] tcg/riscv: Add support for vector LIU Zhiwei
                   ` (3 preceding siblings ...)
  2024-08-13 11:34 ` [PATCH v1 04/15] tcg/riscv: Add basic support for vector LIU Zhiwei
@ 2024-08-13 11:34 ` LIU Zhiwei
  2024-08-14  8:24   ` Richard Henderson
  2024-08-13 11:34 ` [PATCH v1 06/15] tcg/riscv: Implement vector load/store LIU Zhiwei
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-13 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng

From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>

In RISC-V, vector operations require initial configuration using
the vset{i}vl{i} instruction.

This instruction:
  1. Sets the vector length (vl) in bytes
  2. Configures the vtype register, which includes:
    SEW (Single Element Width)
    LMUL (vector register group multiplier)
    Other vector operation parameters

This configuration is crucial for defining subsequent vector
operation behavior. To optimize performance, the configuration
process is managed dynamically:
  1. Reconfiguration using vset{i}vl{i} is necessary when SEW
     or vector register group width changes.
  2. The vset instruction can be omitted when configuration
     remains unchanged.

This optimization is only effective within a single TB.
Each TB requires reconfiguration at its start, as the current
state cannot be obtained from hardware.

Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: Weiwei Li <liwei1518@gmail.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
 tcg/riscv/tcg-target.c.inc | 121 +++++++++++++++++++++++++++++++++++++
 1 file changed, 121 insertions(+)

diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index ca9bafcb3c..d17f523187 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -167,6 +167,18 @@ static bool tcg_target_const_match(int64_t val, int ct,
  * RISC-V Base ISA opcodes (IM)
  */
 
+#define V_OPIVV (0x0 << 12)
+#define V_OPFVV (0x1 << 12)
+#define V_OPMVV (0x2 << 12)
+#define V_OPIVI (0x3 << 12)
+#define V_OPIVX (0x4 << 12)
+#define V_OPFVF (0x5 << 12)
+#define V_OPMVX (0x6 << 12)
+#define V_OPCFG (0x7 << 12)
+
+#define V_SUMOP (0x0 << 20)
+#define V_LUMOP (0x0 << 20)
+
 typedef enum {
     OPC_ADD = 0x33,
     OPC_ADDI = 0x13,
@@ -262,6 +274,11 @@ typedef enum {
     /* Zicond: integer conditional operations */
     OPC_CZERO_EQZ = 0x0e005033,
     OPC_CZERO_NEZ = 0x0e007033,
+
+    /* V: Vector extension 1.0 */
+    OPC_VSETVLI  = 0x57 | V_OPCFG,
+    OPC_VSETIVLI = 0xc0000057 | V_OPCFG,
+    OPC_VSETVL   = 0x80000057 | V_OPCFG,
 } RISCVInsn;
 
 /*
@@ -354,6 +371,42 @@ static int32_t encode_uj(RISCVInsn opc, TCGReg rd, uint32_t imm)
     return opc | (rd & 0x1f) << 7 | encode_ujimm20(imm);
 }
 
+typedef enum {
+    VTA_TU = 0,
+    VTA_TA,
+} RISCVVta;
+
+typedef enum {
+    VMA_MU = 0,
+    VMA_MA,
+} RISCVVma;
+
+typedef enum {
+    VSEW_E8 = 0, /* EW=8b */
+    VSEW_E16,    /* EW=16b */
+    VSEW_E32,    /* EW=32b */
+    VSEW_E64,    /* EW=64b */
+} RISCVVsew;
+
+typedef enum {
+    VLMUL_M1 = 0, /* LMUL=1 */
+    VLMUL_M2,     /* LMUL=2 */
+    VLMUL_M4,     /* LMUL=4 */
+    VLMUL_M8,     /* LMUL=8 */
+    VLMUL_RESERVED,
+    VLMUL_MF8,    /* LMUL=1/8 */
+    VLMUL_MF4,    /* LMUL=1/4 */
+    VLMUL_MF2,    /* LMUL=1/2 */
+} RISCVVlmul;
+#define LMUL_MAX 8
+
+static int32_t encode_vtype(RISCVVta vta, RISCVVma vma,
+                            RISCVVsew vsew, RISCVVlmul vlmul)
+{
+    return (vma & 0x1) << 7 | (vta & 0x1) << 6 | (vsew & 0x7) << 3 |
+           (vlmul & 0x7);
+}
+
 /*
  * RISC-V instruction emitters
  */
@@ -483,6 +536,12 @@ static void tcg_out_opc_reg_vec_i(TCGContext *s, RISCVInsn opc,
     tcg_out32(s, encode_r(opc, rd, (imm & 0x1f), vs2) | (vm << 25));
 }
 
+static void tcg_out_opc_vec_config(TCGContext *s, RISCVInsn opc,
+                                  TCGReg rd, uint32_t avl, int32_t vtypei)
+{
+    tcg_out32(s, encode_i(opc, rd, avl, vtypei));
+}
+
 /* vm=0 (vm = false) means vector masking ENABLED. */
 #define tcg_out_opc_vv(s, opc, vd, vs2, vs1, vm) \
     tcg_out_opc_reg_vec(s, opc, vd, vs1, vs2, vm);
@@ -497,12 +556,68 @@ static void tcg_out_opc_reg_vec_i(TCGContext *s, RISCVInsn opc,
 #define tcg_out_opc_vi(s, opc, vd, vs2, imm, vm) \
     tcg_out_opc_reg_vec_i(s, opc, vd, imm, vs2, vm);
 
+#define tcg_out_opc_vconfig(s, opc, rd, avl, vtypei) \
+    tcg_out_opc_vec_config(s, opc, rd, avl, vtypei);
+
 /*
  * Only unit-stride addressing implemented; may extend in future.
  */
 #define tcg_out_opc_ldst_vec(s, opc, vs3_vd, rs1, vm) \
     tcg_out_opc_reg_vec(s, opc, vs3_vd, rs1, 0, vm);
 
+static void tcg_out_vsetvl(TCGContext *s, uint32_t avl, RISCVVta vta,
+                           RISCVVma vma, RISCVVsew vsew,
+                           RISCVVlmul vlmul)
+{
+    int32_t vtypei = encode_vtype(vta, vma, vsew, vlmul);
+
+    if (avl < 32) {
+        tcg_out_opc_vconfig(s, OPC_VSETIVLI, TCG_REG_ZERO, avl, vtypei);
+    } else {
+        tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, TCG_REG_ZERO, avl);
+        tcg_out_opc_vconfig(s, OPC_VSETVLI, TCG_REG_ZERO, TCG_REG_TMP0, vtypei);
+    }
+}
+
+/*
+ * TODO: If the vtype value is not supported by the implementation,
+ * then the vill bit is set in vtype, the remaining bits in
+ * vtype are set to zero, and the vl register is also set to zero
+ */
+
+static __thread unsigned prev_size;
+static __thread unsigned prev_vece = MO_8;
+static __thread bool vec_vtpye_init = true;
+
+#define get_vlmax(vsew) (riscv_vlen / (8 << vsew) * (LMUL_MAX))
+#define get_vec_type_bytes(type)    (type >= TCG_TYPE_V64 ? \
+                                    (8 << (type - TCG_TYPE_V64)) : 0)
+#define encode_lmul(oprsz, vlenb)   (ctzl(oprsz / vlenb))
+
+static inline void tcg_target_set_vec_config(TCGContext *s, TCGType type,
+                                      unsigned vece)
+{
+    unsigned oprsz = get_vec_type_bytes(type);
+
+    if (!vec_vtpye_init && (prev_size == oprsz && prev_vece == vece)) {
+        return ;
+    }
+
+    RISCVVsew vsew = vece - MO_8 + VSEW_E8;
+    unsigned avl = oprsz / (1 << vece);
+    unsigned vlenb = riscv_vlen / 8;
+    RISCVVlmul lmul = oprsz > vlenb ?
+                      encode_lmul(oprsz, vlenb) : VLMUL_M1;
+    tcg_debug_assert(avl <= get_vlmax(vsew));
+    tcg_debug_assert(lmul <= VLMUL_RESERVED);
+
+    prev_size = oprsz;
+    prev_vece = vece;
+    vec_vtpye_init = false;
+    tcg_out_vsetvl(s, avl, VTA_TA, VMA_MA, vsew, lmul);
+    return ;
+}
+
 /*
  * TCG intrinsics
  */
@@ -1914,6 +2029,11 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
                            const TCGArg args[TCG_MAX_OP_ARGS],
                            const int const_args[TCG_MAX_OP_ARGS])
 {
+    TCGType type = vecl + TCG_TYPE_V64;
+
+    if (vec_vtpye_init) {
+        tcg_target_set_vec_config(s, type, vece);
+    }
     switch (opc) {
     case INDEX_op_mov_vec: /* Always emitted via tcg_out_mov.  */
     case INDEX_op_dup_vec: /* Always emitted via tcg_out_dup_vec.  */
@@ -2151,6 +2271,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 
 static void tcg_out_tb_start(TCGContext *s)
 {
+    vec_vtpye_init = true;
     /* nothing to do */
 }
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 05/15] tcg/riscv: Add riscv vset{i}vli support
  2024-08-13 11:34 ` [PATCH v1 05/15] tcg/riscv: Add riscv vset{i}vli support LIU Zhiwei
@ 2024-08-14  8:24   ` Richard Henderson
  2024-08-19  1:34     ` LIU Zhiwei
  0 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2024-08-14  8:24 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng

On 8/13/24 21:34, LIU Zhiwei wrote:
> From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
> 
> In RISC-V, vector operations require initial configuration using
> the vset{i}vl{i} instruction.
> 
> This instruction:
>    1. Sets the vector length (vl) in bytes
>    2. Configures the vtype register, which includes:
>      SEW (Single Element Width)
>      LMUL (vector register group multiplier)
>      Other vector operation parameters
> 
> This configuration is crucial for defining subsequent vector
> operation behavior. To optimize performance, the configuration
> process is managed dynamically:
>    1. Reconfiguration using vset{i}vl{i} is necessary when SEW
>       or vector register group width changes.
>    2. The vset instruction can be omitted when configuration
>       remains unchanged.
> 
> This optimization is only effective within a single TB.
> Each TB requires reconfiguration at its start, as the current
> state cannot be obtained from hardware.
> 
> Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
> Signed-off-by: Weiwei Li <liwei1518@gmail.com>
> Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
> ---
>   tcg/riscv/tcg-target.c.inc | 121 +++++++++++++++++++++++++++++++++++++
>   1 file changed, 121 insertions(+)
> 
> diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
> index ca9bafcb3c..d17f523187 100644
> --- a/tcg/riscv/tcg-target.c.inc
> +++ b/tcg/riscv/tcg-target.c.inc
> @@ -167,6 +167,18 @@ static bool tcg_target_const_match(int64_t val, int ct,
>    * RISC-V Base ISA opcodes (IM)
>    */
>   
> +#define V_OPIVV (0x0 << 12)
> +#define V_OPFVV (0x1 << 12)
> +#define V_OPMVV (0x2 << 12)
> +#define V_OPIVI (0x3 << 12)
> +#define V_OPIVX (0x4 << 12)
> +#define V_OPFVF (0x5 << 12)
> +#define V_OPMVX (0x6 << 12)
> +#define V_OPCFG (0x7 << 12)
> +
> +#define V_SUMOP (0x0 << 20)
> +#define V_LUMOP (0x0 << 20)
> +
>   typedef enum {
>       OPC_ADD = 0x33,
>       OPC_ADDI = 0x13,
> @@ -262,6 +274,11 @@ typedef enum {
>       /* Zicond: integer conditional operations */
>       OPC_CZERO_EQZ = 0x0e005033,
>       OPC_CZERO_NEZ = 0x0e007033,
> +
> +    /* V: Vector extension 1.0 */
> +    OPC_VSETVLI  = 0x57 | V_OPCFG,
> +    OPC_VSETIVLI = 0xc0000057 | V_OPCFG,
> +    OPC_VSETVL   = 0x80000057 | V_OPCFG,
>   } RISCVInsn;
>   
>   /*
> @@ -354,6 +371,42 @@ static int32_t encode_uj(RISCVInsn opc, TCGReg rd, uint32_t imm)
>       return opc | (rd & 0x1f) << 7 | encode_ujimm20(imm);
>   }
>   
> +typedef enum {
> +    VTA_TU = 0,
> +    VTA_TA,
> +} RISCVVta;
> +
> +typedef enum {
> +    VMA_MU = 0,
> +    VMA_MA,
> +} RISCVVma;
> +
> +typedef enum {
> +    VSEW_E8 = 0, /* EW=8b */
> +    VSEW_E16,    /* EW=16b */
> +    VSEW_E32,    /* EW=32b */
> +    VSEW_E64,    /* EW=64b */
> +} RISCVVsew;

This exactly aligns with MemOp and vece.  Do we really need an enum for this?

> +
> +typedef enum {
> +    VLMUL_M1 = 0, /* LMUL=1 */
> +    VLMUL_M2,     /* LMUL=2 */
> +    VLMUL_M4,     /* LMUL=4 */
> +    VLMUL_M8,     /* LMUL=8 */
> +    VLMUL_RESERVED,
> +    VLMUL_MF8,    /* LMUL=1/8 */
> +    VLMUL_MF4,    /* LMUL=1/4 */
> +    VLMUL_MF2,    /* LMUL=1/2 */
> +} RISCVVlmul;
> +#define LMUL_MAX 8
> +
> +static int32_t encode_vtype(RISCVVta vta, RISCVVma vma,
> +                            RISCVVsew vsew, RISCVVlmul vlmul)
> +{
> +    return (vma & 0x1) << 7 | (vta & 0x1) << 6 | (vsew & 0x7) << 3 |
> +           (vlmul & 0x7);
> +}

> +static void tcg_out_vsetvl(TCGContext *s, uint32_t avl, RISCVVta vta,
> +                           RISCVVma vma, RISCVVsew vsew,
> +                           RISCVVlmul vlmul)
> +{
> +    int32_t vtypei = encode_vtype(vta, vma, vsew, vlmul);
> +
> +    if (avl < 32) {
> +        tcg_out_opc_vconfig(s, OPC_VSETIVLI, TCG_REG_ZERO, avl, vtypei);
> +    } else {
> +        tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, TCG_REG_ZERO, avl);
> +        tcg_out_opc_vconfig(s, OPC_VSETVLI, TCG_REG_ZERO, TCG_REG_TMP0, vtypei);
> +    }
> +}
> +
> +/*
> + * TODO: If the vtype value is not supported by the implementation,
> + * then the vill bit is set in vtype, the remaining bits in
> + * vtype are set to zero, and the vl register is also set to zero
> + */
> +
> +static __thread unsigned prev_size;
> +static __thread unsigned prev_vece = MO_8;
> +static __thread bool vec_vtpye_init = true;

Typo in vtpye.

That said, init should be redundant.  I think you only need one variable here:

   static __thread int prev_vtype;

Since any vtype < 0 is vill, the "uninitialized" value is easily -1.

> +static inline void tcg_target_set_vec_config(TCGContext *s, TCGType type,
> +                                      unsigned vece)
> +{
> +    unsigned oprsz = get_vec_type_bytes(type);
> +
> +    if (!vec_vtpye_init && (prev_size == oprsz && prev_vece == vece)) {
> +        return ;
> +    }

     int vtype = encode_vtype(...);
     if (vtype != prev_vtype) {
         prev_vtype = vtype;
         tcg_out_vsetvl(s, vtype);
     }

> @@ -1914,6 +2029,11 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
>                              const TCGArg args[TCG_MAX_OP_ARGS],
>                              const int const_args[TCG_MAX_OP_ARGS])
>   {
> +    TCGType type = vecl + TCG_TYPE_V64;
> +
> +    if (vec_vtpye_init) {
> +        tcg_target_set_vec_config(s, type, vece);
> +    }

Here is perhaps too early... see patch 8 re logicals.


r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 05/15] tcg/riscv: Add riscv vset{i}vli support
  2024-08-14  8:24   ` Richard Henderson
@ 2024-08-19  1:34     ` LIU Zhiwei
  2024-08-19  2:35       ` Richard Henderson
  0 siblings, 1 reply; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-19  1:34 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng


On 2024/8/14 16:24, Richard Henderson wrote:
> On 8/13/24 21:34, LIU Zhiwei wrote:
>> From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
>>
>> In RISC-V, vector operations require initial configuration using
>> the vset{i}vl{i} instruction.
>>
>> This instruction:
>>    1. Sets the vector length (vl) in bytes
>>    2. Configures the vtype register, which includes:
>>      SEW (Single Element Width)
>>      LMUL (vector register group multiplier)
>>      Other vector operation parameters
>>
>> This configuration is crucial for defining subsequent vector
>> operation behavior. To optimize performance, the configuration
>> process is managed dynamically:
>>    1. Reconfiguration using vset{i}vl{i} is necessary when SEW
>>       or vector register group width changes.
>>    2. The vset instruction can be omitted when configuration
>>       remains unchanged.
>>
>> This optimization is only effective within a single TB.
>> Each TB requires reconfiguration at its start, as the current
>> state cannot be obtained from hardware.
>>
>> Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
>> Signed-off-by: Weiwei Li <liwei1518@gmail.com>
>> Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
>> ---
>>   tcg/riscv/tcg-target.c.inc | 121 +++++++++++++++++++++++++++++++++++++
>>   1 file changed, 121 insertions(+)
>>
>> diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
>> index ca9bafcb3c..d17f523187 100644
>> --- a/tcg/riscv/tcg-target.c.inc
>> +++ b/tcg/riscv/tcg-target.c.inc
>> @@ -167,6 +167,18 @@ static bool tcg_target_const_match(int64_t val, 
>> int ct,
>>    * RISC-V Base ISA opcodes (IM)
>>    */
>>   +#define V_OPIVV (0x0 << 12)
>> +#define V_OPFVV (0x1 << 12)
>> +#define V_OPMVV (0x2 << 12)
>> +#define V_OPIVI (0x3 << 12)
>> +#define V_OPIVX (0x4 << 12)
>> +#define V_OPFVF (0x5 << 12)
>> +#define V_OPMVX (0x6 << 12)
>> +#define V_OPCFG (0x7 << 12)
>> +
>> +#define V_SUMOP (0x0 << 20)
>> +#define V_LUMOP (0x0 << 20)
>> +
>>   typedef enum {
>>       OPC_ADD = 0x33,
>>       OPC_ADDI = 0x13,
>> @@ -262,6 +274,11 @@ typedef enum {
>>       /* Zicond: integer conditional operations */
>>       OPC_CZERO_EQZ = 0x0e005033,
>>       OPC_CZERO_NEZ = 0x0e007033,
>> +
>> +    /* V: Vector extension 1.0 */
>> +    OPC_VSETVLI  = 0x57 | V_OPCFG,
>> +    OPC_VSETIVLI = 0xc0000057 | V_OPCFG,
>> +    OPC_VSETVL   = 0x80000057 | V_OPCFG,
>>   } RISCVInsn;
>>     /*
>> @@ -354,6 +371,42 @@ static int32_t encode_uj(RISCVInsn opc, TCGReg 
>> rd, uint32_t imm)
>>       return opc | (rd & 0x1f) << 7 | encode_ujimm20(imm);
>>   }
>>   +typedef enum {
>> +    VTA_TU = 0,
>> +    VTA_TA,
>> +} RISCVVta;
>> +
>> +typedef enum {
>> +    VMA_MU = 0,
>> +    VMA_MA,
>> +} RISCVVma;
>> +
>> +typedef enum {
>> +    VSEW_E8 = 0, /* EW=8b */
>> +    VSEW_E16,    /* EW=16b */
>> +    VSEW_E32,    /* EW=32b */
>> +    VSEW_E64,    /* EW=64b */
>> +} RISCVVsew;
>
> This exactly aligns with MemOp and vece.  Do we really need an enum 
> for this?
OK. We will use MemOp enum next version.
>
>> +
>> +typedef enum {
>> +    VLMUL_M1 = 0, /* LMUL=1 */
>> +    VLMUL_M2,     /* LMUL=2 */
>> +    VLMUL_M4,     /* LMUL=4 */
>> +    VLMUL_M8,     /* LMUL=8 */
>> +    VLMUL_RESERVED,
>> +    VLMUL_MF8,    /* LMUL=1/8 */
>> +    VLMUL_MF4,    /* LMUL=1/4 */
>> +    VLMUL_MF2,    /* LMUL=1/2 */
>> +} RISCVVlmul;
>> +#define LMUL_MAX 8
>> +
>> +static int32_t encode_vtype(RISCVVta vta, RISCVVma vma,
>> +                            RISCVVsew vsew, RISCVVlmul vlmul)
>> +{
>> +    return (vma & 0x1) << 7 | (vta & 0x1) << 6 | (vsew & 0x7) << 3 |
>> +           (vlmul & 0x7);
>> +}
>
>> +static void tcg_out_vsetvl(TCGContext *s, uint32_t avl, RISCVVta vta,
>> +                           RISCVVma vma, RISCVVsew vsew,
>> +                           RISCVVlmul vlmul)
>> +{
>> +    int32_t vtypei = encode_vtype(vta, vma, vsew, vlmul);
>> +
>> +    if (avl < 32) {
>> +        tcg_out_opc_vconfig(s, OPC_VSETIVLI, TCG_REG_ZERO, avl, 
>> vtypei);
>> +    } else {
>> +        tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, TCG_REG_ZERO, avl);
>> +        tcg_out_opc_vconfig(s, OPC_VSETVLI, TCG_REG_ZERO, 
>> TCG_REG_TMP0, vtypei);
>> +    }
>> +}
>> +
>> +/*
>> + * TODO: If the vtype value is not supported by the implementation,
>> + * then the vill bit is set in vtype, the remaining bits in
>> + * vtype are set to zero, and the vl register is also set to zero
>> + */
>> +
>> +static __thread unsigned prev_size;
>> +static __thread unsigned prev_vece = MO_8;
>> +static __thread bool vec_vtpye_init = true;
>
> Typo in vtpye.
OK.
>
> That said, init should be redundant.  I think you only need one 
> variable here:
>
>   static __thread int prev_vtype;
Agree.
>
> Since any vtype < 0 is vill, the "uninitialized" value is easily -1.
OK. We will set it to -1 in tcg_out_tb_start.
>
>> +static inline void tcg_target_set_vec_config(TCGContext *s, TCGType 
>> type,
>> +                                      unsigned vece)
>> +{
>> +    unsigned oprsz = get_vec_type_bytes(type);
>> +
>> +    if (!vec_vtpye_init && (prev_size == oprsz && prev_vece == vece)) {
>> +        return ;
>> +    }
>
>     int vtype = encode_vtype(...);
>     if (vtype != prev_vtype) {
>         prev_vtype = vtype;
>         tcg_out_vsetvl(s, vtype);
>     }
>
>> @@ -1914,6 +2029,11 @@ static void tcg_out_vec_op(TCGContext *s, 
>> TCGOpcode opc,
>>                              const TCGArg args[TCG_MAX_OP_ARGS],
>>                              const int const_args[TCG_MAX_OP_ARGS])
>>   {
>> +    TCGType type = vecl + TCG_TYPE_V64;
>> +
>> +    if (vec_vtpye_init) {
>> +        tcg_target_set_vec_config(s, type, vece);
>> +    }
>
> Here is perhaps too early... see patch 8 re logicals.

I guess you mean we don't have implemented any vector op, so there is no 
need to set vsetvl in this patch. We will postpone it do really ops need it.

Thanks,
Zhiwei

>
>
> r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 05/15] tcg/riscv: Add riscv vset{i}vli support
  2024-08-19  1:34     ` LIU Zhiwei
@ 2024-08-19  2:35       ` Richard Henderson
  2024-08-19  2:53         ` LIU Zhiwei
  0 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2024-08-19  2:35 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng

On 8/19/24 11:34, LIU Zhiwei wrote:
>>> @@ -1914,6 +2029,11 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
>>>                              const TCGArg args[TCG_MAX_OP_ARGS],
>>>                              const int const_args[TCG_MAX_OP_ARGS])
>>>   {
>>> +    TCGType type = vecl + TCG_TYPE_V64;
>>> +
>>> +    if (vec_vtpye_init) {
>>> +        tcg_target_set_vec_config(s, type, vece);
>>> +    }
>>
>> Here is perhaps too early... see patch 8 re logicals.
> 
> I guess you mean we don't have implemented any vector op, so there is no need to set 
> vsetvl in this patch. We will postpone it do really ops need it.

What I meant is "too early in the function", i.e. before the switch.

Per my comment in patch 8, there are some vector ops that are agnostic to type and only 
care about length.  Thus perhaps

   switch (op) {
   case xxx:
     set_vec_config_len(s, type);
     something;

   case yyy:
     set_vec_config_len_elt(s, type, vece);
     something_else;

   ...
   }

Or some other structure that makes sense.


r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 05/15] tcg/riscv: Add riscv vset{i}vli support
  2024-08-19  2:35       ` Richard Henderson
@ 2024-08-19  2:53         ` LIU Zhiwei
  0 siblings, 0 replies; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-19  2:53 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng


On 2024/8/19 10:35, Richard Henderson wrote:
> On 8/19/24 11:34, LIU Zhiwei wrote:
>>>> @@ -1914,6 +2029,11 @@ static void tcg_out_vec_op(TCGContext *s, 
>>>> TCGOpcode opc,
>>>>                              const TCGArg args[TCG_MAX_OP_ARGS],
>>>>                              const int const_args[TCG_MAX_OP_ARGS])
>>>>   {
>>>> +    TCGType type = vecl + TCG_TYPE_V64;
>>>> +
>>>> +    if (vec_vtpye_init) {
>>>> +        tcg_target_set_vec_config(s, type, vece);
>>>> +    }
>>>
>>> Here is perhaps too early... see patch 8 re logicals.
>>
>> I guess you mean we don't have implemented any vector op, so there is 
>> no need to set vsetvl in this patch. We will postpone it do really 
>> ops need it.
>
> What I meant is "too early in the function", i.e. before the switch.
>
> Per my comment in patch 8, there are some vector ops that are agnostic 
> to type and only care about length.  Thus perhaps
>
>   switch (op) {
>   case xxx:
>     set_vec_config_len(s, type);
>     something;
>
>   case yyy:
>     set_vec_config_len_elt(s, type, vece);
>     something_else;
>
>   ...
>   }
>
> Or some other structure that makes sense.

Thanks for clarify once again. It's much better to explicitly have two 
API types.

Thanks,
Zhiwei

>
>
> r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v1 06/15] tcg/riscv: Implement vector load/store
  2024-08-13 11:34 [PATCH v1 00/15] tcg/riscv: Add support for vector LIU Zhiwei
                   ` (4 preceding siblings ...)
  2024-08-13 11:34 ` [PATCH v1 05/15] tcg/riscv: Add riscv vset{i}vli support LIU Zhiwei
@ 2024-08-13 11:34 ` LIU Zhiwei
  2024-08-14  9:01   ` Richard Henderson
  2024-08-13 11:34 ` [PATCH v1 07/15] tcg/riscv: Implement vector mov/dup{m/i} LIU Zhiwei
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-13 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng

From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>

Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
 tcg/riscv/tcg-target-con-set.h |  2 +
 tcg/riscv/tcg-target.c.inc     | 92 ++++++++++++++++++++++++++++++++--
 2 files changed, 91 insertions(+), 3 deletions(-)

diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
index aac5ceee2b..d73a62b0f2 100644
--- a/tcg/riscv/tcg-target-con-set.h
+++ b/tcg/riscv/tcg-target-con-set.h
@@ -21,3 +21,5 @@ C_O1_I2(r, rZ, rZ)
 C_N1_I2(r, r, rM)
 C_O1_I4(r, r, rI, rM, rM)
 C_O2_I4(r, r, rZ, rZ, rM, rM)
+C_O0_I2(v, r)
+C_O1_I1(v, r)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index d17f523187..f17d679d71 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -279,6 +279,15 @@ typedef enum {
     OPC_VSETVLI  = 0x57 | V_OPCFG,
     OPC_VSETIVLI = 0xc0000057 | V_OPCFG,
     OPC_VSETVL   = 0x80000057 | V_OPCFG,
+
+    OPC_VLE8_V  = 0x7 | V_LUMOP,
+    OPC_VLE16_V = 0x5007 | V_LUMOP,
+    OPC_VLE32_V = 0x6007 | V_LUMOP,
+    OPC_VLE64_V = 0x7007 | V_LUMOP,
+    OPC_VSE8_V  = 0x27 | V_SUMOP,
+    OPC_VSE16_V = 0x5027 | V_SUMOP,
+    OPC_VSE32_V = 0x6027 | V_SUMOP,
+    OPC_VSE64_V = 0x7027 | V_SUMOP,
 } RISCVInsn;
 
 /*
@@ -810,6 +819,13 @@ static void tcg_out_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
     case OPC_SD:
         tcg_out_opc_store(s, opc, addr, data, imm12);
         break;
+    case OPC_VSE8_V:
+    case OPC_VSE16_V:
+    case OPC_VSE32_V:
+    case OPC_VSE64_V:
+        tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, addr, imm12);
+        tcg_out_opc_ldst_vec(s, opc, data, TCG_REG_TMP0, true);
+        break;
     case OPC_LB:
     case OPC_LBU:
     case OPC_LH:
@@ -819,6 +835,13 @@ static void tcg_out_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
     case OPC_LD:
         tcg_out_opc_imm(s, opc, data, addr, imm12);
         break;
+    case OPC_VLE8_V:
+    case OPC_VLE16_V:
+    case OPC_VLE32_V:
+    case OPC_VLE64_V:
+        tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, addr, imm12);
+        tcg_out_opc_ldst_vec(s, opc, data, TCG_REG_TMP0, true);
+        break;
     default:
         g_assert_not_reached();
     }
@@ -827,14 +850,59 @@ static void tcg_out_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
 static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
                        TCGReg arg1, intptr_t arg2)
 {
-    RISCVInsn insn = type == TCG_TYPE_I32 ? OPC_LW : OPC_LD;
+    RISCVInsn insn;
+
+    if (type < TCG_TYPE_V64) {
+        insn = type == TCG_TYPE_I32 ? OPC_LW : OPC_LD;
+    } else {
+        tcg_debug_assert(arg >= TCG_REG_V1);
+        switch (prev_vece) {
+        case MO_8:
+            insn = OPC_VLE8_V;
+            break;
+        case MO_16:
+            insn = OPC_VLE16_V;
+            break;
+        case MO_32:
+            insn = OPC_VLE32_V;
+            break;
+        case MO_64:
+            insn = OPC_VLE64_V;
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
     tcg_out_ldst(s, insn, arg, arg1, arg2);
 }
 
 static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
                        TCGReg arg1, intptr_t arg2)
 {
-    RISCVInsn insn = type == TCG_TYPE_I32 ? OPC_SW : OPC_SD;
+    RISCVInsn insn;
+
+    if (type < TCG_TYPE_V64) {
+        insn = type == TCG_TYPE_I32 ? OPC_SW : OPC_SD;
+        tcg_out_ldst(s, insn, arg, arg1, arg2);
+    } else {
+        tcg_debug_assert(arg >= TCG_REG_V1);
+        switch (prev_vece) {
+        case MO_8:
+            insn = OPC_VSE8_V;
+            break;
+        case MO_16:
+            insn = OPC_VSE16_V;
+            break;
+        case MO_32:
+            insn = OPC_VSE32_V;
+            break;
+        case MO_64:
+            insn = OPC_VSE64_V;
+            break;
+        default:
+            g_assert_not_reached();
+        }
+    }
     tcg_out_ldst(s, insn, arg, arg1, arg2);
 }
 
@@ -2030,11 +2098,25 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
                            const int const_args[TCG_MAX_OP_ARGS])
 {
     TCGType type = vecl + TCG_TYPE_V64;
+    TCGArg a0, a1, a2;
+
+    a0 = args[0];
+    a1 = args[1];
+    a2 = args[2];
 
-    if (vec_vtpye_init) {
+    if (!vec_vtpye_init &&
+        (opc == INDEX_op_ld_vec || opc == INDEX_op_st_vec)) {
+        tcg_target_set_vec_config(s, type, prev_vece);
+    } else {
         tcg_target_set_vec_config(s, type, vece);
     }
     switch (opc) {
+    case INDEX_op_ld_vec:
+        tcg_out_ld(s, type, a0, a1, a2);
+        break;
+    case INDEX_op_st_vec:
+        tcg_out_st(s, type, a0, a1, a2);
+        break;
     case INDEX_op_mov_vec: /* Always emitted via tcg_out_mov.  */
     case INDEX_op_dup_vec: /* Always emitted via tcg_out_dup_vec.  */
     default:
@@ -2198,6 +2280,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_qemu_st_a64_i64:
         return C_O0_I2(rZ, r);
 
+    case INDEX_op_st_vec:
+        return C_O0_I2(v, r);
+    case INDEX_op_ld_vec:
+        return C_O1_I1(v, r);
     default:
         g_assert_not_reached();
     }
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 06/15] tcg/riscv: Implement vector load/store
  2024-08-13 11:34 ` [PATCH v1 06/15] tcg/riscv: Implement vector load/store LIU Zhiwei
@ 2024-08-14  9:01   ` Richard Henderson
  2024-08-19  1:41     ` LIU Zhiwei
  0 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2024-08-14  9:01 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng

On 8/13/24 21:34, LIU Zhiwei wrote:
> @@ -827,14 +850,59 @@ static void tcg_out_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
>   static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
>                          TCGReg arg1, intptr_t arg2)
>   {
> -    RISCVInsn insn = type == TCG_TYPE_I32 ? OPC_LW : OPC_LD;
> +    RISCVInsn insn;
> +
> +    if (type < TCG_TYPE_V64) {
> +        insn = type == TCG_TYPE_I32 ? OPC_LW : OPC_LD;
> +    } else {
> +        tcg_debug_assert(arg >= TCG_REG_V1);
> +        switch (prev_vece) {
> +        case MO_8:
> +            insn = OPC_VLE8_V;
> +            break;
> +        case MO_16:
> +            insn = OPC_VLE16_V;
> +            break;
> +        case MO_32:
> +            insn = OPC_VLE32_V;
> +            break;
> +        case MO_64:
> +            insn = OPC_VLE64_V;
> +            break;
> +        default:
> +            g_assert_not_reached();
> +        }
> +    }
>       tcg_out_ldst(s, insn, arg, arg1, arg2);

tcg_out_ld/st are called directly from register allocation spill/fill.
You'll need to set vtype here, and cannot rely on this having been done in tcg_out_vec_op.

That said, with a little-endian host, the selected element size doesn't matter *too* much. 
  A write of 8 uint16_t or a write of 2 uint64_t produces the same bits in memory.

Therefore you can examine prev_vtype and adjust only if the vector length changes.  But we 
do that -- e.g. load V256, store V256, store V128 to perform a 384-bit store for AArch64 
SVE when VQ=3.

Is there an advantage to using the vector load/store whole register insns, if the 
requested length is not too small?  IIRC the NF field can be used to store multiples, but 
we can't store half of a register with these.

r~

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 06/15] tcg/riscv: Implement vector load/store
  2024-08-14  9:01   ` Richard Henderson
@ 2024-08-19  1:41     ` LIU Zhiwei
  0 siblings, 0 replies; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-19  1:41 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng


On 2024/8/14 17:01, Richard Henderson wrote:
> On 8/13/24 21:34, LIU Zhiwei wrote:
>> @@ -827,14 +850,59 @@ static void tcg_out_ldst(TCGContext *s, 
>> RISCVInsn opc, TCGReg data,
>>   static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
>>                          TCGReg arg1, intptr_t arg2)
>>   {
>> -    RISCVInsn insn = type == TCG_TYPE_I32 ? OPC_LW : OPC_LD;
>> +    RISCVInsn insn;
>> +
>> +    if (type < TCG_TYPE_V64) {
>> +        insn = type == TCG_TYPE_I32 ? OPC_LW : OPC_LD;
>> +    } else {
>> +        tcg_debug_assert(arg >= TCG_REG_V1);
>> +        switch (prev_vece) {
>> +        case MO_8:
>> +            insn = OPC_VLE8_V;
>> +            break;
>> +        case MO_16:
>> +            insn = OPC_VLE16_V;
>> +            break;
>> +        case MO_32:
>> +            insn = OPC_VLE32_V;
>> +            break;
>> +        case MO_64:
>> +            insn = OPC_VLE64_V;
>> +            break;
>> +        default:
>> +            g_assert_not_reached();
>> +        }
>> +    }
>>       tcg_out_ldst(s, insn, arg, arg1, arg2);
>
> tcg_out_ld/st are called directly from register allocation spill/fill.
> You'll need to set vtype here, and cannot rely on this having been 
> done in tcg_out_vec_op.
OK.
>
> That said, with a little-endian host, the selected element size 
> doesn't matter *too* much.  A write of 8 uint16_t or a write of 2 
> uint64_t produces the same bits in memory.
>
> Therefore you can examine prev_vtype and adjust only if the vector 
> length changes.
OK.
>   But we do that -- e.g. load V256, store V256, store V128 to perform 
> a 384-bit store for AArch64 SVE when VQ=3.
>
> Is there an advantage to using the vector load/store whole register 
> insns, if the requested length is not too small?
For vector type equal or bigger than vlen in host, we will use the whole 
register instructions.
>   IIRC the NF field can be used to store multiples, but we can't store 
> half of a register with these.

I think we can still use the unit-stride instructions for them.

Thanks,
Zhiwei

>
>
> r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v1 07/15] tcg/riscv: Implement vector mov/dup{m/i}
  2024-08-13 11:34 [PATCH v1 00/15] tcg/riscv: Add support for vector LIU Zhiwei
                   ` (5 preceding siblings ...)
  2024-08-13 11:34 ` [PATCH v1 06/15] tcg/riscv: Implement vector load/store LIU Zhiwei
@ 2024-08-13 11:34 ` LIU Zhiwei
  2024-08-14  9:11   ` Richard Henderson
  2024-08-20  9:00   ` Richard Henderson
  2024-08-13 11:34 ` [PATCH v1 08/15] tcg/riscv: Add support for basic vector opcodes LIU Zhiwei
                   ` (7 subsequent siblings)
  14 siblings, 2 replies; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-13 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng

From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>

Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
 tcg/riscv/tcg-target.c.inc | 43 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index f17d679d71..f60913e805 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -288,6 +288,10 @@ typedef enum {
     OPC_VSE16_V = 0x5027 | V_SUMOP,
     OPC_VSE32_V = 0x6027 | V_SUMOP,
     OPC_VSE64_V = 0x7027 | V_SUMOP,
+
+    OPC_VMV_V_V = 0x5e000057 | V_OPIVV,
+    OPC_VMV_V_I = 0x5e000057 | V_OPIVI,
+    OPC_VMV_V_X = 0x5e000057 | V_OPIVX,
 } RISCVInsn;
 
 /*
@@ -641,6 +645,13 @@ static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
     case TCG_TYPE_I64:
         tcg_out_opc_imm(s, OPC_ADDI, ret, arg, 0);
         break;
+    case TCG_TYPE_V64:
+    case TCG_TYPE_V128:
+    case TCG_TYPE_V256:
+        tcg_debug_assert(ret > TCG_REG_V0 && arg > TCG_REG_V0);
+        tcg_target_set_vec_config(s, type, prev_vece);
+        tcg_out_opc_vv(s, OPC_VMV_V_V, ret, TCG_REG_V0, arg, true);
+        break;
     default:
         g_assert_not_reached();
     }
@@ -977,6 +988,33 @@ static void tcg_out_addsub2(TCGContext *s,
     }
 }
 
+static inline bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
+                                   TCGReg dst, TCGReg src)
+{
+    tcg_target_set_vec_config(s, type, vece);
+    tcg_out_opc_vx(s, OPC_VMV_V_X, dst, TCG_REG_V0, src, true);
+    return true;
+}
+
+static inline bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
+                                    TCGReg dst, TCGReg base, intptr_t offset)
+{
+    tcg_out_ld(s, TCG_TYPE_REG, TCG_REG_TMP0, base, offset);
+    return tcg_out_dup_vec(s, type, vece, dst, TCG_REG_TMP0);
+}
+
+static inline void tcg_out_dupi_vec(TCGContext *s, TCGType type, unsigned vece,
+                                    TCGReg dst, int64_t arg)
+{
+    if (arg < 16 && arg >= -16) {
+        tcg_target_set_vec_config(s, type, vece);
+        tcg_out_opc_vi(s, OPC_VMV_V_I, dst, TCG_REG_V0, arg, true);
+        return;
+    }
+    tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP0, arg);
+    tcg_out_dup_vec(s, type, vece, dst, TCG_REG_TMP0);
+}
+
 static const struct {
     RISCVInsn op;
     bool swap;
@@ -2111,6 +2149,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
         tcg_target_set_vec_config(s, type, vece);
     }
     switch (opc) {
+    case INDEX_op_dupm_vec:
+        tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
+        break;
     case INDEX_op_ld_vec:
         tcg_out_ld(s, type, a0, a1, a2);
         break;
@@ -2282,6 +2323,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 
     case INDEX_op_st_vec:
         return C_O0_I2(v, r);
+    case INDEX_op_dup_vec:
+    case INDEX_op_dupm_vec:
     case INDEX_op_ld_vec:
         return C_O1_I1(v, r);
     default:
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 07/15] tcg/riscv: Implement vector mov/dup{m/i}
  2024-08-13 11:34 ` [PATCH v1 07/15] tcg/riscv: Implement vector mov/dup{m/i} LIU Zhiwei
@ 2024-08-14  9:11   ` Richard Henderson
  2024-08-15 10:49     ` LIU Zhiwei
  2024-08-20  9:00   ` Richard Henderson
  1 sibling, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2024-08-14  9:11 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng

On 8/13/24 21:34, LIU Zhiwei wrote:
> @@ -641,6 +645,13 @@ static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
>       case TCG_TYPE_I64:
>           tcg_out_opc_imm(s, OPC_ADDI, ret, arg, 0);
>           break;
> +    case TCG_TYPE_V64:
> +    case TCG_TYPE_V128:
> +    case TCG_TYPE_V256:
> +        tcg_debug_assert(ret > TCG_REG_V0 && arg > TCG_REG_V0);
> +        tcg_target_set_vec_config(s, type, prev_vece);
> +        tcg_out_opc_vv(s, OPC_VMV_V_V, ret, TCG_REG_V0, arg, true);

I suggest these asserts be in tcg_out_opc_*
That way you don't need to replicate to all uses.

> +static inline bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
> +                                   TCGReg dst, TCGReg src)

Oh, please drop all of the inline markup, from all patches.
Let the compiler decide.

> +static inline bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
> +                                    TCGReg dst, TCGReg base, intptr_t offset)
> +{
> +    tcg_out_ld(s, TCG_TYPE_REG, TCG_REG_TMP0, base, offset);
> +    return tcg_out_dup_vec(s, type, vece, dst, TCG_REG_TMP0);
> +}

Is this really better than using strided load with rs2 = r0?


r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 07/15] tcg/riscv: Implement vector mov/dup{m/i}
  2024-08-14  9:11   ` Richard Henderson
@ 2024-08-15 10:49     ` LIU Zhiwei
  0 siblings, 0 replies; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-15 10:49 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng


On 2024/8/14 17:11, Richard Henderson wrote:
> On 8/13/24 21:34, LIU Zhiwei wrote:
>> @@ -641,6 +645,13 @@ static bool tcg_out_mov(TCGContext *s, TCGType 
>> type, TCGReg ret, TCGReg arg)
>>       case TCG_TYPE_I64:
>>           tcg_out_opc_imm(s, OPC_ADDI, ret, arg, 0);
>>           break;
>> +    case TCG_TYPE_V64:
>> +    case TCG_TYPE_V128:
>> +    case TCG_TYPE_V256:
>> +        tcg_debug_assert(ret > TCG_REG_V0 && arg > TCG_REG_V0);
>> +        tcg_target_set_vec_config(s, type, prev_vece);
>> +        tcg_out_opc_vv(s, OPC_VMV_V_V, ret, TCG_REG_V0, arg, true);
>
> I suggest these asserts be in tcg_out_opc_*
> That way you don't need to replicate to all uses.
OK.
>
>> +static inline bool tcg_out_dup_vec(TCGContext *s, TCGType type, 
>> unsigned vece,
>> +                                   TCGReg dst, TCGReg src)
>
> Oh, please drop all of the inline markup, from all patches.
> Let the compiler decide.
>
OK.
>> +static inline bool tcg_out_dupm_vec(TCGContext *s, TCGType type, 
>> unsigned vece,
>> +                                    TCGReg dst, TCGReg base, 
>> intptr_t offset)
>> +{
>> +    tcg_out_ld(s, TCG_TYPE_REG, TCG_REG_TMP0, base, offset);
>> +    return tcg_out_dup_vec(s, type, vece, dst, TCG_REG_TMP0);
>> +}
>
> Is this really better than using strided load with rs2 = r0?

It depends.  For our test board, it is.

Thanks,
Zhiwei

>
>
> r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 07/15] tcg/riscv: Implement vector mov/dup{m/i}
  2024-08-13 11:34 ` [PATCH v1 07/15] tcg/riscv: Implement vector mov/dup{m/i} LIU Zhiwei
  2024-08-14  9:11   ` Richard Henderson
@ 2024-08-20  9:00   ` Richard Henderson
  2024-08-20  9:26     ` LIU Zhiwei
  1 sibling, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2024-08-20  9:00 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng

On 8/13/24 21:34, LIU Zhiwei wrote:
> +    case TCG_TYPE_V64:
> +    case TCG_TYPE_V128:
> +    case TCG_TYPE_V256:
> +        tcg_debug_assert(ret > TCG_REG_V0 && arg > TCG_REG_V0);
> +        tcg_target_set_vec_config(s, type, prev_vece);
> +        tcg_out_opc_vv(s, OPC_VMV_V_V, ret, TCG_REG_V0, arg, true);
> +        break;

Is it worth using whole register move (vmvNr.v) for the appropriate VLEN?


r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 07/15] tcg/riscv: Implement vector mov/dup{m/i}
  2024-08-20  9:00   ` Richard Henderson
@ 2024-08-20  9:26     ` LIU Zhiwei
  0 siblings, 0 replies; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-20  9:26 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng


On 2024/8/20 17:00, Richard Henderson wrote:
> On 8/13/24 21:34, LIU Zhiwei wrote:
>> +    case TCG_TYPE_V64:
>> +    case TCG_TYPE_V128:
>> +    case TCG_TYPE_V256:
>> +        tcg_debug_assert(ret > TCG_REG_V0 && arg > TCG_REG_V0);
>> +        tcg_target_set_vec_config(s, type, prev_vece);
>> +        tcg_out_opc_vv(s, OPC_VMV_V_V, ret, TCG_REG_V0, arg, true);
>> +        break;
>
> Is it worth using whole register move (vmvNr.v) for the appropriate VLEN?

Yes. We will use vmvNr.v next version. Thanks for your suggestion.

Zhiwei

>
>
> r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v1 08/15] tcg/riscv: Add support for basic vector opcodes
  2024-08-13 11:34 [PATCH v1 00/15] tcg/riscv: Add support for vector LIU Zhiwei
                   ` (6 preceding siblings ...)
  2024-08-13 11:34 ` [PATCH v1 07/15] tcg/riscv: Implement vector mov/dup{m/i} LIU Zhiwei
@ 2024-08-13 11:34 ` LIU Zhiwei
  2024-08-14  9:13   ` Richard Henderson
  2024-08-14  9:17   ` Richard Henderson
  2024-08-13 11:34 ` [PATCH v1 09/15] tcg/riscv: Implement vector cmp ops LIU Zhiwei
                   ` (6 subsequent siblings)
  14 siblings, 2 replies; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-13 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng

From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>

Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
 tcg/riscv/tcg-target-con-set.h |  1 +
 tcg/riscv/tcg-target.c.inc     | 33 +++++++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
index d73a62b0f2..8a0de18257 100644
--- a/tcg/riscv/tcg-target-con-set.h
+++ b/tcg/riscv/tcg-target-con-set.h
@@ -23,3 +23,4 @@ C_O1_I4(r, r, rI, rM, rM)
 C_O2_I4(r, r, rZ, rZ, rM, rM)
 C_O0_I2(v, r)
 C_O1_I1(v, r)
+C_O1_I2(v, v, v)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index f60913e805..650b5eff1a 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -289,6 +289,12 @@ typedef enum {
     OPC_VSE32_V = 0x6027 | V_SUMOP,
     OPC_VSE64_V = 0x7027 | V_SUMOP,
 
+    OPC_VADD_VV = 0x57 | V_OPIVV,
+    OPC_VSUB_VV = 0x8000057 | V_OPIVV,
+    OPC_VAND_VV = 0x24000057 | V_OPIVV,
+    OPC_VOR_VV = 0x28000057 | V_OPIVV,
+    OPC_VXOR_VV = 0x2c000057 | V_OPIVV,
+
     OPC_VMV_V_V = 0x5e000057 | V_OPIVV,
     OPC_VMV_V_I = 0x5e000057 | V_OPIVI,
     OPC_VMV_V_X = 0x5e000057 | V_OPIVX,
@@ -2158,6 +2164,21 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_st_vec:
         tcg_out_st(s, type, a0, a1, a2);
         break;
+    case INDEX_op_add_vec:
+        tcg_out_opc_vv(s, OPC_VADD_VV, a0, a1, a2, true);
+        break;
+    case INDEX_op_sub_vec:
+        tcg_out_opc_vv(s, OPC_VSUB_VV, a0, a1, a2, true);
+        break;
+    case INDEX_op_and_vec:
+        tcg_out_opc_vv(s, OPC_VAND_VV, a0, a1, a2, true);
+        break;
+    case INDEX_op_or_vec:
+        tcg_out_opc_vv(s, OPC_VOR_VV, a0, a1, a2, true);
+        break;
+    case INDEX_op_xor_vec:
+        tcg_out_opc_vv(s, OPC_VXOR_VV, a0, a1, a2, true);
+        break;
     case INDEX_op_mov_vec: /* Always emitted via tcg_out_mov.  */
     case INDEX_op_dup_vec: /* Always emitted via tcg_out_dup_vec.  */
     default:
@@ -2177,6 +2198,12 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
 int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
 {
     switch (opc) {
+    case INDEX_op_add_vec:
+    case INDEX_op_sub_vec:
+    case INDEX_op_and_vec:
+    case INDEX_op_or_vec:
+    case INDEX_op_xor_vec:
+        return 1;
     default:
         return 0;
     }
@@ -2327,6 +2354,12 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_dupm_vec:
     case INDEX_op_ld_vec:
         return C_O1_I1(v, r);
+    case INDEX_op_add_vec:
+    case INDEX_op_sub_vec:
+    case INDEX_op_and_vec:
+    case INDEX_op_or_vec:
+    case INDEX_op_xor_vec:
+        return C_O1_I2(v, v, v);
     default:
         g_assert_not_reached();
     }
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 08/15] tcg/riscv: Add support for basic vector opcodes
  2024-08-13 11:34 ` [PATCH v1 08/15] tcg/riscv: Add support for basic vector opcodes LIU Zhiwei
@ 2024-08-14  9:13   ` Richard Henderson
  2024-08-20  1:56     ` LIU Zhiwei
  2024-08-14  9:17   ` Richard Henderson
  1 sibling, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2024-08-14  9:13 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng

On 8/13/24 21:34, LIU Zhiwei wrote:
> From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
> 
> Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
> Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
> ---
>   tcg/riscv/tcg-target-con-set.h |  1 +
>   tcg/riscv/tcg-target.c.inc     | 33 +++++++++++++++++++++++++++++++++
>   2 files changed, 34 insertions(+)
> 
> diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
> index d73a62b0f2..8a0de18257 100644
> --- a/tcg/riscv/tcg-target-con-set.h
> +++ b/tcg/riscv/tcg-target-con-set.h
> @@ -23,3 +23,4 @@ C_O1_I4(r, r, rI, rM, rM)
>   C_O2_I4(r, r, rZ, rZ, rM, rM)
>   C_O0_I2(v, r)
>   C_O1_I1(v, r)
> +C_O1_I2(v, v, v)
> diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
> index f60913e805..650b5eff1a 100644
> --- a/tcg/riscv/tcg-target.c.inc
> +++ b/tcg/riscv/tcg-target.c.inc
> @@ -289,6 +289,12 @@ typedef enum {
>       OPC_VSE32_V = 0x6027 | V_SUMOP,
>       OPC_VSE64_V = 0x7027 | V_SUMOP,
>   
> +    OPC_VADD_VV = 0x57 | V_OPIVV,
> +    OPC_VSUB_VV = 0x8000057 | V_OPIVV,
> +    OPC_VAND_VV = 0x24000057 | V_OPIVV,
> +    OPC_VOR_VV = 0x28000057 | V_OPIVV,
> +    OPC_VXOR_VV = 0x2c000057 | V_OPIVV,
> +
>       OPC_VMV_V_V = 0x5e000057 | V_OPIVV,
>       OPC_VMV_V_I = 0x5e000057 | V_OPIVI,
>       OPC_VMV_V_X = 0x5e000057 | V_OPIVX,
> @@ -2158,6 +2164,21 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
>       case INDEX_op_st_vec:
>           tcg_out_st(s, type, a0, a1, a2);
>           break;
> +    case INDEX_op_add_vec:
> +        tcg_out_opc_vv(s, OPC_VADD_VV, a0, a1, a2, true);
> +        break;
> +    case INDEX_op_sub_vec:
> +        tcg_out_opc_vv(s, OPC_VSUB_VV, a0, a1, a2, true);
> +        break;
> +    case INDEX_op_and_vec:
> +        tcg_out_opc_vv(s, OPC_VAND_VV, a0, a1, a2, true);
> +        break;
> +    case INDEX_op_or_vec:
> +        tcg_out_opc_vv(s, OPC_VOR_VV, a0, a1, a2, true);
> +        break;
> +    case INDEX_op_xor_vec:
> +        tcg_out_opc_vv(s, OPC_VXOR_VV, a0, a1, a2, true);
> +        break;

As with load/store/move, and/or/xor can avoid changing element type.
Thus I think the vtype setup before the switch is premature.


r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 08/15] tcg/riscv: Add support for basic vector opcodes
  2024-08-14  9:13   ` Richard Henderson
@ 2024-08-20  1:56     ` LIU Zhiwei
  0 siblings, 0 replies; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-20  1:56 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng


On 2024/8/14 17:13, Richard Henderson wrote:
> On 8/13/24 21:34, LIU Zhiwei wrote:
>> From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
>>
>> Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
>> Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
>> ---
>>   tcg/riscv/tcg-target-con-set.h |  1 +
>>   tcg/riscv/tcg-target.c.inc     | 33 +++++++++++++++++++++++++++++++++
>>   2 files changed, 34 insertions(+)
>>
>> diff --git a/tcg/riscv/tcg-target-con-set.h 
>> b/tcg/riscv/tcg-target-con-set.h
>> index d73a62b0f2..8a0de18257 100644
>> --- a/tcg/riscv/tcg-target-con-set.h
>> +++ b/tcg/riscv/tcg-target-con-set.h
>> @@ -23,3 +23,4 @@ C_O1_I4(r, r, rI, rM, rM)
>>   C_O2_I4(r, r, rZ, rZ, rM, rM)
>>   C_O0_I2(v, r)
>>   C_O1_I1(v, r)
>> +C_O1_I2(v, v, v)
>> diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
>> index f60913e805..650b5eff1a 100644
>> --- a/tcg/riscv/tcg-target.c.inc
>> +++ b/tcg/riscv/tcg-target.c.inc
>> @@ -289,6 +289,12 @@ typedef enum {
>>       OPC_VSE32_V = 0x6027 | V_SUMOP,
>>       OPC_VSE64_V = 0x7027 | V_SUMOP,
>>   +    OPC_VADD_VV = 0x57 | V_OPIVV,
>> +    OPC_VSUB_VV = 0x8000057 | V_OPIVV,
>> +    OPC_VAND_VV = 0x24000057 | V_OPIVV,
>> +    OPC_VOR_VV = 0x28000057 | V_OPIVV,
>> +    OPC_VXOR_VV = 0x2c000057 | V_OPIVV,
>> +
>>       OPC_VMV_V_V = 0x5e000057 | V_OPIVV,
>>       OPC_VMV_V_I = 0x5e000057 | V_OPIVI,
>>       OPC_VMV_V_X = 0x5e000057 | V_OPIVX,
>> @@ -2158,6 +2164,21 @@ static void tcg_out_vec_op(TCGContext *s, 
>> TCGOpcode opc,
>>       case INDEX_op_st_vec:
>>           tcg_out_st(s, type, a0, a1, a2);
>>           break;
>> +    case INDEX_op_add_vec:
>> +        tcg_out_opc_vv(s, OPC_VADD_VV, a0, a1, a2, true);
>> +        break;
>> +    case INDEX_op_sub_vec:
>> +        tcg_out_opc_vv(s, OPC_VSUB_VV, a0, a1, a2, true);
>> +        break;
>> +    case INDEX_op_and_vec:
>> +        tcg_out_opc_vv(s, OPC_VAND_VV, a0, a1, a2, true);
>> +        break;
>> +    case INDEX_op_or_vec:
>> +        tcg_out_opc_vv(s, OPC_VOR_VV, a0, a1, a2, true);
>> +        break;
>> +    case INDEX_op_xor_vec:
>> +        tcg_out_opc_vv(s, OPC_VXOR_VV, a0, a1, a2, true);
>> +        break;
>
> As with load/store/move, and/or/xor can avoid changing element type.
> Thus I think the vtype setup before the switch is premature.

Agree. We have implemented this feature on the v2 patch set.

Thanks,
Zhiwei

>
>
> r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 08/15] tcg/riscv: Add support for basic vector opcodes
  2024-08-13 11:34 ` [PATCH v1 08/15] tcg/riscv: Add support for basic vector opcodes LIU Zhiwei
  2024-08-14  9:13   ` Richard Henderson
@ 2024-08-14  9:17   ` Richard Henderson
  2024-08-20  1:57     ` LIU Zhiwei
  1 sibling, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2024-08-14  9:17 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng

On 8/13/24 21:34, LIU Zhiwei wrote:
> +    OPC_VADD_VV = 0x57 | V_OPIVV,
> +    OPC_VSUB_VV = 0x8000057 | V_OPIVV,
> +    OPC_VAND_VV = 0x24000057 | V_OPIVV,
> +    OPC_VOR_VV = 0x28000057 | V_OPIVV,
> +    OPC_VXOR_VV = 0x2c000057 | V_OPIVV,

Immediate operand variants to be handled as a follow-up?


r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 08/15] tcg/riscv: Add support for basic vector opcodes
  2024-08-14  9:17   ` Richard Henderson
@ 2024-08-20  1:57     ` LIU Zhiwei
  2024-08-20  5:14       ` Richard Henderson
  0 siblings, 1 reply; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-20  1:57 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng


On 2024/8/14 17:17, Richard Henderson wrote:
> On 8/13/24 21:34, LIU Zhiwei wrote:
>> +    OPC_VADD_VV = 0x57 | V_OPIVV,
>> +    OPC_VSUB_VV = 0x8000057 | V_OPIVV,
>> +    OPC_VAND_VV = 0x24000057 | V_OPIVV,
>> +    OPC_VOR_VV = 0x28000057 | V_OPIVV,
>> +    OPC_VXOR_VV = 0x2c000057 | V_OPIVV,
>
> Immediate operand variants to be handled as a follow-up?
Do you mean VXOR_VI? We use vxor.vi for not_vec already in patch 10.

Zhiwei
>
>
> r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 08/15] tcg/riscv: Add support for basic vector opcodes
  2024-08-20  1:57     ` LIU Zhiwei
@ 2024-08-20  5:14       ` Richard Henderson
  0 siblings, 0 replies; 49+ messages in thread
From: Richard Henderson @ 2024-08-20  5:14 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng

On 8/20/24 11:57, LIU Zhiwei wrote:
> 
> On 2024/8/14 17:17, Richard Henderson wrote:
>> On 8/13/24 21:34, LIU Zhiwei wrote:
>>> +    OPC_VADD_VV = 0x57 | V_OPIVV,
>>> +    OPC_VSUB_VV = 0x8000057 | V_OPIVV,
>>> +    OPC_VAND_VV = 0x24000057 | V_OPIVV,
>>> +    OPC_VOR_VV = 0x28000057 | V_OPIVV,
>>> +    OPC_VXOR_VV = 0x2c000057 | V_OPIVV,
>>
>> Immediate operand variants to be handled as a follow-up?
> Do you mean VXOR_VI? We use vxor.vi for not_vec already in patch 10.

Yes, and you match the 5-bit signed operand in patch 9.
All that is required is to put the two together here.


r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v1 09/15] tcg/riscv: Implement vector cmp ops
  2024-08-13 11:34 [PATCH v1 00/15] tcg/riscv: Add support for vector LIU Zhiwei
                   ` (7 preceding siblings ...)
  2024-08-13 11:34 ` [PATCH v1 08/15] tcg/riscv: Add support for basic vector opcodes LIU Zhiwei
@ 2024-08-13 11:34 ` LIU Zhiwei
  2024-08-14  9:39   ` Richard Henderson
  2024-08-13 11:34 ` [PATCH v1 10/15] tcg/riscv: Implement vector not/neg ops LIU Zhiwei
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-13 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng

From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>

1.Address immediate value constraints in RISC-V Vector Extension 1.0 for
comparison instructions.

2.Extend comparison results from mask registers to SEW-width elements,
  following recommendations in The RISC-V SPEC Volume I (Version 20240411).

This aligns with TCG's cmp_vec behavior by expanding compare results to
full element width: all 1s for true, all 0s for false.

Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
 tcg/riscv/tcg-target-con-set.h |   2 +
 tcg/riscv/tcg-target-con-str.h |   1 +
 tcg/riscv/tcg-target.c.inc     | 188 +++++++++++++++++++++++++++++++++
 tcg/riscv/tcg-target.opc.h     |   3 +
 4 files changed, 194 insertions(+)

diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
index 8a0de18257..23b391dd07 100644
--- a/tcg/riscv/tcg-target-con-set.h
+++ b/tcg/riscv/tcg-target-con-set.h
@@ -22,5 +22,7 @@ C_N1_I2(r, r, rM)
 C_O1_I4(r, r, rI, rM, rM)
 C_O2_I4(r, r, rZ, rZ, rM, rM)
 C_O0_I2(v, r)
+C_O0_I2(v, vK)
 C_O1_I1(v, r)
 C_O1_I2(v, v, v)
+C_O1_I2(v, v, vK)
diff --git a/tcg/riscv/tcg-target-con-str.h b/tcg/riscv/tcg-target-con-str.h
index b2b3211bcb..0aaad7b753 100644
--- a/tcg/riscv/tcg-target-con-str.h
+++ b/tcg/riscv/tcg-target-con-str.h
@@ -17,6 +17,7 @@ REGS('v', ALL_VECTOR_REGS)
  */
 CONST('I', TCG_CT_CONST_S12)
 CONST('J', TCG_CT_CONST_J12)
+CONST('K', TCG_CT_CONST_S5)
 CONST('N', TCG_CT_CONST_N12)
 CONST('M', TCG_CT_CONST_M12)
 CONST('Z', TCG_CT_CONST_ZERO)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 650b5eff1a..3f1e215e90 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -113,6 +113,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
 #define TCG_CT_CONST_N12   0x400
 #define TCG_CT_CONST_M12   0x800
 #define TCG_CT_CONST_J12  0x1000
+#define TCG_CT_CONST_S5   0x2000
 
 #define ALL_GENERAL_REGS   MAKE_64BIT_MASK(0, 32)
 #define ALL_VECTOR_REGS    MAKE_64BIT_MASK(33, 31)
@@ -160,6 +161,13 @@ static bool tcg_target_const_match(int64_t val, int ct,
     if ((ct & TCG_CT_CONST_J12) && ~val >= -0x800 && ~val <= 0x7ff) {
         return 1;
     }
+    /*
+     * Sign extended from 5 bits: [-0x10, 0x0f].
+     * Used for vector-immediate.
+     */
+    if ((ct & TCG_CT_CONST_S5) && val >= -0x10 && val <= 0x0f) {
+        return 1;
+    }
     return 0;
 }
 
@@ -289,12 +297,39 @@ typedef enum {
     OPC_VSE32_V = 0x6027 | V_SUMOP,
     OPC_VSE64_V = 0x7027 | V_SUMOP,
 
+    OPC_VMERGE_VIM = 0x5c000057 | V_OPIVI,
+    OPC_VMERGE_VVM = 0x5c000057 | V_OPIVV,
+    OPC_VMNAND_MM = 0x74000057 | V_OPMVV,
+
     OPC_VADD_VV = 0x57 | V_OPIVV,
     OPC_VSUB_VV = 0x8000057 | V_OPIVV,
     OPC_VAND_VV = 0x24000057 | V_OPIVV,
     OPC_VOR_VV = 0x28000057 | V_OPIVV,
     OPC_VXOR_VV = 0x2c000057 | V_OPIVV,
 
+    OPC_VMSEQ_VV = 0x60000057 | V_OPIVV,
+    OPC_VMSEQ_VI = 0x60000057 | V_OPIVI,
+    OPC_VMSEQ_VX = 0x60000057 | V_OPIVX,
+    OPC_VMSNE_VV = 0x64000057 | V_OPIVV,
+    OPC_VMSNE_VI = 0x64000057 | V_OPIVI,
+    OPC_VMSNE_VX = 0x64000057 | V_OPIVX,
+
+    OPC_VMSLTU_VV = 0x68000057 | V_OPIVV,
+    OPC_VMSLTU_VX = 0x68000057 | V_OPIVX,
+    OPC_VMSLT_VV = 0x6c000057 | V_OPIVV,
+    OPC_VMSLT_VX = 0x6c000057 | V_OPIVX,
+    OPC_VMSLEU_VV = 0x70000057 | V_OPIVV,
+    OPC_VMSLEU_VX = 0x70000057 | V_OPIVX,
+    OPC_VMSLE_VV = 0x74000057 | V_OPIVV,
+    OPC_VMSLE_VX = 0x74000057 | V_OPIVX,
+
+    OPC_VMSLEU_VI = 0x70000057 | V_OPIVI,
+    OPC_VMSLE_VI = 0x74000057 | V_OPIVI,
+    OPC_VMSGTU_VI = 0x78000057 | V_OPIVI,
+    OPC_VMSGTU_VX = 0x78000057 | V_OPIVX,
+    OPC_VMSGT_VI = 0x7c000057 | V_OPIVI,
+    OPC_VMSGT_VX = 0x7c000057 | V_OPIVX,
+
     OPC_VMV_V_V = 0x5e000057 | V_OPIVV,
     OPC_VMV_V_I = 0x5e000057 | V_OPIVI,
     OPC_VMV_V_X = 0x5e000057 | V_OPIVX,
@@ -575,6 +610,15 @@ static void tcg_out_opc_vec_config(TCGContext *s, RISCVInsn opc,
 #define tcg_out_opc_vi(s, opc, vd, vs2, imm, vm) \
     tcg_out_opc_reg_vec_i(s, opc, vd, imm, vs2, vm);
 
+#define tcg_out_opc_vim_mask(s, opc, vd, vs2, imm) \
+    tcg_out_opc_reg_vec_i(s, opc, vd, imm, vs2, false);
+
+#define tcg_out_opc_vvm_mask(s, opc, vd, vs2, vs1) \
+    tcg_out_opc_reg_vec(s, opc, vd, vs1, vs2, false);
+
+#define tcg_out_opc_mvv(s, opc, vd, vs2, vs1, vm) \
+    tcg_out_opc_reg_vec(s, opc, vd, vs1, vs2, vm);
+
 #define tcg_out_opc_vconfig(s, opc, rd, avl, vtypei) \
     tcg_out_opc_vec_config(s, opc, rd, avl, vtypei);
 
@@ -1037,6 +1081,22 @@ static const struct {
     [TCG_COND_GTU] = { OPC_BLTU, true  }
 };
 
+static const struct {
+    RISCVInsn opc;
+    bool swap;
+} tcg_cmpcond_to_rvv_vv[] = {
+    [TCG_COND_EQ] =  { OPC_VMSEQ_VV,  false },
+    [TCG_COND_NE] =  { OPC_VMSNE_VV,  false },
+    [TCG_COND_LT] =  { OPC_VMSLT_VV,  false },
+    [TCG_COND_GE] =  { OPC_VMSLE_VV,  true  },
+    [TCG_COND_GT] =  { OPC_VMSLT_VV,  true  },
+    [TCG_COND_LE] =  { OPC_VMSLE_VV,  false },
+    [TCG_COND_LTU] = { OPC_VMSLTU_VV, false },
+    [TCG_COND_GEU] = { OPC_VMSLEU_VV, true  },
+    [TCG_COND_GTU] = { OPC_VMSLTU_VV, true  },
+    [TCG_COND_LEU] = { OPC_VMSLEU_VV, false }
+};
+
 static void tcg_out_brcond(TCGContext *s, TCGCond cond, TCGReg arg1,
                            TCGReg arg2, TCGLabel *l)
 {
@@ -1054,6 +1114,79 @@ static void tcg_out_brcond(TCGContext *s, TCGCond cond, TCGReg arg1,
     tcg_out_opc_branch(s, op, arg1, arg2, 0);
 }
 
+static const struct {
+    RISCVInsn op;
+    bool expand;
+}  tcg_cmpcond_to_rvv_vx[] = {
+    [TCG_COND_EQ]  = { OPC_VMSEQ_VX,  false },
+    [TCG_COND_NE]  = { OPC_VMSNE_VX,  false },
+    [TCG_COND_GT]  = { OPC_VMSGT_VX,  false },
+    [TCG_COND_LE]  = { OPC_VMSLE_VX,  false },
+    [TCG_COND_LT]  = { OPC_VMSLT_VX,  false },
+    [TCG_COND_LTU] = { OPC_VMSLTU_VX, false },
+    [TCG_COND_GTU] = { OPC_VMSGTU_VX, false },
+    [TCG_COND_LEU] = { OPC_VMSLEU_VX, false },
+    [TCG_COND_GE]  = { OPC_VMSLT_VX,  true  },
+    [TCG_COND_GEU] = { OPC_VMSLTU_VX, true  },
+};
+
+static void tcg_out_cmp_vec_vx(TCGContext *s, TCGCond cond, TCGReg arg1,
+                               tcg_target_long arg2)
+{
+    RISCVInsn op;
+
+    tcg_debug_assert((unsigned)cond < ARRAY_SIZE(tcg_cmpcond_to_rvv_vx));
+    op = tcg_cmpcond_to_rvv_vx[cond].op;
+    tcg_debug_assert(op != 0);
+
+    tcg_out_opc_vx(s, op, TCG_REG_V0, arg1, arg2, true);
+    if (tcg_cmpcond_to_rvv_vx[cond].expand) {
+        tcg_out_opc_mvv(s, OPC_VMNAND_MM, TCG_REG_V0, TCG_REG_V0,
+                        TCG_REG_V0, false);
+    }
+}
+
+static const struct {
+    RISCVInsn op;
+    int min;
+    int max;
+    bool adjust;
+}  tcg_cmpcond_to_rvv_vi[] = {
+    [TCG_COND_EQ]  = { OPC_VMSEQ_VI,  -16, 15, false },
+    [TCG_COND_NE]  = { OPC_VMSNE_VI,  -16, 15, false },
+    [TCG_COND_GT]  = { OPC_VMSGT_VI,  -16, 15, false },
+    [TCG_COND_LE]  = { OPC_VMSLE_VI,  -16, 15, false },
+    [TCG_COND_LT]  = { OPC_VMSLE_VI,  -15, 16, true  },
+    [TCG_COND_GE]  = { OPC_VMSGT_VI,  -15, 16, true  },
+    [TCG_COND_LEU] = { OPC_VMSLEU_VI,   0, 15, false },
+    [TCG_COND_GTU] = { OPC_VMSGTU_VI,   0, 15, false },
+    [TCG_COND_LTU] = { OPC_VMSLEU_VI,   1, 16, true  },
+    [TCG_COND_GEU] = { OPC_VMSGTU_VI,   1, 16, true  },
+};
+
+static void tcg_out_cmp_vec_vi(TCGContext *s, TCGCond cond, TCGReg arg1,
+                               tcg_target_long arg2)
+{
+    RISCVInsn op;
+    signed imm_min, imm_max;
+
+    tcg_debug_assert((unsigned)cond < ARRAY_SIZE(tcg_cmpcond_to_rvv_vi));
+    op = tcg_cmpcond_to_rvv_vi[cond].op;
+    tcg_debug_assert(op != 0);
+    imm_min = tcg_cmpcond_to_rvv_vi[cond].min;
+    imm_max = tcg_cmpcond_to_rvv_vi[cond].max;
+
+    if (arg2 >= imm_min && arg2 <= imm_max) {
+        if (tcg_cmpcond_to_rvv_vi[cond].adjust) {
+            arg2 -= 1;
+        }
+        tcg_out_opc_vi(s, op, TCG_REG_V0, arg1, arg2, true);
+    } else {
+        tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, TCG_REG_ZERO, arg2);
+        tcg_out_cmp_vec_vx(s, cond, arg1, TCG_REG_TMP0);
+    }
+}
+
 #define SETCOND_INV    TCG_TARGET_NB_REGS
 #define SETCOND_NEZ    (SETCOND_INV << 1)
 #define SETCOND_FLAGS  (SETCOND_INV | SETCOND_NEZ)
@@ -2179,6 +2312,33 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_xor_vec:
         tcg_out_opc_vv(s, OPC_VXOR_VV, a0, a1, a2, true);
         break;
+    case INDEX_op_rvv_cmpcond_vec:
+        {
+            RISCVInsn op;
+            if (const_args[1]) {
+                tcg_out_cmp_vec_vi(s, a2, a0, a1);
+            } else {
+                op = tcg_cmpcond_to_rvv_vv[a2].opc;
+                tcg_debug_assert(op != 0);
+
+                if (tcg_cmpcond_to_rvv_vv[a2].swap) {
+                    TCGReg t = a0;
+                    a0 = a1;
+                    a1 = t;
+                }
+                tcg_out_opc_vv(s, op, TCG_REG_V0, a0, a1, true);
+            }
+        }
+        break;
+    case INDEX_op_rvv_merge_vec:
+        if (const_args[2]) {
+            /* vd[i] = v0.mask[i] ? imm : vs2[i] */
+            tcg_out_opc_vim_mask(s, OPC_VMERGE_VIM, a0, a1, a2);
+        } else {
+            /* vd[i] = v0.mask[i] ? vs1[i] : vs2[i] */
+            tcg_out_opc_vvm_mask(s, OPC_VMERGE_VVM, a0, a1, a2);
+        }
+        break;
     case INDEX_op_mov_vec: /* Always emitted via tcg_out_mov.  */
     case INDEX_op_dup_vec: /* Always emitted via tcg_out_dup_vec.  */
     default:
@@ -2189,10 +2349,31 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
 void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
                        TCGArg a0, ...)
 {
+    va_list va;
+    TCGv_vec v0, v1;
+    TCGArg a2, a3;
+
+    va_start(va, a0);
+    v0 = temp_tcgv_vec(arg_temp(a0));
+    v1 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg)));
+    a2 = va_arg(va, TCGArg);
+
     switch (opc) {
+    case INDEX_op_cmp_vec:
+        {
+            a3 = va_arg(va, TCGArg);
+            vec_gen_3(INDEX_op_rvv_cmpcond_vec, type, vece,
+                      tcgv_vec_arg(v1), a2, a3);
+            tcg_gen_mov_vec(v0, tcg_constant_vec_matching(v0, vece, 0));
+            vec_gen_3(INDEX_op_rvv_merge_vec, type, vece,
+                      tcgv_vec_arg(v0), tcgv_vec_arg(v0),
+                      tcgv_i64_arg(tcg_constant_i64(-1)));
+        }
+        break;
     default:
         g_assert_not_reached();
     }
+    va_end(va);
 }
 
 int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
@@ -2204,6 +2385,8 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_or_vec:
     case INDEX_op_xor_vec:
         return 1;
+    case INDEX_op_cmp_vec:
+        return -1;
     default:
         return 0;
     }
@@ -2360,6 +2543,11 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_or_vec:
     case INDEX_op_xor_vec:
         return C_O1_I2(v, v, v);
+    case INDEX_op_cmp_vec:
+    case INDEX_op_rvv_merge_vec:
+        return C_O1_I2(v, v, vK);
+    case INDEX_op_rvv_cmpcond_vec:
+        return C_O0_I2(v, vK);
     default:
         g_assert_not_reached();
     }
diff --git a/tcg/riscv/tcg-target.opc.h b/tcg/riscv/tcg-target.opc.h
index b80b39e1e5..2f23453c35 100644
--- a/tcg/riscv/tcg-target.opc.h
+++ b/tcg/riscv/tcg-target.opc.h
@@ -10,3 +10,6 @@
  * emitted by tcg_expand_vec_op.  For those familiar with GCC internals,
  * consider these to be UNSPEC with names.
  */
+
+DEF(rvv_cmpcond_vec, 0, 2, 1, IMPLVEC)
+DEF(rvv_merge_vec, 1, 2, 0, IMPLVEC)
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 09/15] tcg/riscv: Implement vector cmp ops
  2024-08-13 11:34 ` [PATCH v1 09/15] tcg/riscv: Implement vector cmp ops LIU Zhiwei
@ 2024-08-14  9:39   ` Richard Henderson
  2024-08-27  7:50     ` LIU Zhiwei
  0 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2024-08-14  9:39 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng

On 8/13/24 21:34, LIU Zhiwei wrote:
> From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
> 
> 1.Address immediate value constraints in RISC-V Vector Extension 1.0 for
> comparison instructions.
> 
> 2.Extend comparison results from mask registers to SEW-width elements,
>    following recommendations in The RISC-V SPEC Volume I (Version 20240411).
> 
> This aligns with TCG's cmp_vec behavior by expanding compare results to
> full element width: all 1s for true, all 0s for false.
> 
> Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
> Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
> ---
>   tcg/riscv/tcg-target-con-set.h |   2 +
>   tcg/riscv/tcg-target-con-str.h |   1 +
>   tcg/riscv/tcg-target.c.inc     | 188 +++++++++++++++++++++++++++++++++
>   tcg/riscv/tcg-target.opc.h     |   3 +
>   4 files changed, 194 insertions(+)
> 
> diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
> index 8a0de18257..23b391dd07 100644
> --- a/tcg/riscv/tcg-target-con-set.h
> +++ b/tcg/riscv/tcg-target-con-set.h
> @@ -22,5 +22,7 @@ C_N1_I2(r, r, rM)
>   C_O1_I4(r, r, rI, rM, rM)
>   C_O2_I4(r, r, rZ, rZ, rM, rM)
>   C_O0_I2(v, r)
> +C_O0_I2(v, vK)
>   C_O1_I1(v, r)
>   C_O1_I2(v, v, v)
> +C_O1_I2(v, v, vK)
> diff --git a/tcg/riscv/tcg-target-con-str.h b/tcg/riscv/tcg-target-con-str.h
> index b2b3211bcb..0aaad7b753 100644
> --- a/tcg/riscv/tcg-target-con-str.h
> +++ b/tcg/riscv/tcg-target-con-str.h
> @@ -17,6 +17,7 @@ REGS('v', ALL_VECTOR_REGS)
>    */
>   CONST('I', TCG_CT_CONST_S12)
>   CONST('J', TCG_CT_CONST_J12)
> +CONST('K', TCG_CT_CONST_S5)
>   CONST('N', TCG_CT_CONST_N12)
>   CONST('M', TCG_CT_CONST_M12)
>   CONST('Z', TCG_CT_CONST_ZERO)
> diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
> index 650b5eff1a..3f1e215e90 100644
> --- a/tcg/riscv/tcg-target.c.inc
> +++ b/tcg/riscv/tcg-target.c.inc
> @@ -113,6 +113,7 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
>   #define TCG_CT_CONST_N12   0x400
>   #define TCG_CT_CONST_M12   0x800
>   #define TCG_CT_CONST_J12  0x1000
> +#define TCG_CT_CONST_S5   0x2000
>   
>   #define ALL_GENERAL_REGS   MAKE_64BIT_MASK(0, 32)
>   #define ALL_VECTOR_REGS    MAKE_64BIT_MASK(33, 31)
> @@ -160,6 +161,13 @@ static bool tcg_target_const_match(int64_t val, int ct,
>       if ((ct & TCG_CT_CONST_J12) && ~val >= -0x800 && ~val <= 0x7ff) {
>           return 1;
>       }
> +    /*
> +     * Sign extended from 5 bits: [-0x10, 0x0f].
> +     * Used for vector-immediate.
> +     */
> +    if ((ct & TCG_CT_CONST_S5) && val >= -0x10 && val <= 0x0f) {
> +        return 1;
> +    }
>       return 0;
>   }
>   
> @@ -289,12 +297,39 @@ typedef enum {
>       OPC_VSE32_V = 0x6027 | V_SUMOP,
>       OPC_VSE64_V = 0x7027 | V_SUMOP,
>   
> +    OPC_VMERGE_VIM = 0x5c000057 | V_OPIVI,
> +    OPC_VMERGE_VVM = 0x5c000057 | V_OPIVV,
> +    OPC_VMNAND_MM = 0x74000057 | V_OPMVV,
> +
>       OPC_VADD_VV = 0x57 | V_OPIVV,
>       OPC_VSUB_VV = 0x8000057 | V_OPIVV,
>       OPC_VAND_VV = 0x24000057 | V_OPIVV,
>       OPC_VOR_VV = 0x28000057 | V_OPIVV,
>       OPC_VXOR_VV = 0x2c000057 | V_OPIVV,
>   
> +    OPC_VMSEQ_VV = 0x60000057 | V_OPIVV,
> +    OPC_VMSEQ_VI = 0x60000057 | V_OPIVI,
> +    OPC_VMSEQ_VX = 0x60000057 | V_OPIVX,
> +    OPC_VMSNE_VV = 0x64000057 | V_OPIVV,
> +    OPC_VMSNE_VI = 0x64000057 | V_OPIVI,
> +    OPC_VMSNE_VX = 0x64000057 | V_OPIVX,
> +
> +    OPC_VMSLTU_VV = 0x68000057 | V_OPIVV,
> +    OPC_VMSLTU_VX = 0x68000057 | V_OPIVX,
> +    OPC_VMSLT_VV = 0x6c000057 | V_OPIVV,
> +    OPC_VMSLT_VX = 0x6c000057 | V_OPIVX,
> +    OPC_VMSLEU_VV = 0x70000057 | V_OPIVV,
> +    OPC_VMSLEU_VX = 0x70000057 | V_OPIVX,
> +    OPC_VMSLE_VV = 0x74000057 | V_OPIVV,
> +    OPC_VMSLE_VX = 0x74000057 | V_OPIVX,
> +
> +    OPC_VMSLEU_VI = 0x70000057 | V_OPIVI,
> +    OPC_VMSLE_VI = 0x74000057 | V_OPIVI,
> +    OPC_VMSGTU_VI = 0x78000057 | V_OPIVI,
> +    OPC_VMSGTU_VX = 0x78000057 | V_OPIVX,
> +    OPC_VMSGT_VI = 0x7c000057 | V_OPIVI,
> +    OPC_VMSGT_VX = 0x7c000057 | V_OPIVX,
> +
>       OPC_VMV_V_V = 0x5e000057 | V_OPIVV,
>       OPC_VMV_V_I = 0x5e000057 | V_OPIVI,
>       OPC_VMV_V_X = 0x5e000057 | V_OPIVX,
> @@ -575,6 +610,15 @@ static void tcg_out_opc_vec_config(TCGContext *s, RISCVInsn opc,
>   #define tcg_out_opc_vi(s, opc, vd, vs2, imm, vm) \
>       tcg_out_opc_reg_vec_i(s, opc, vd, imm, vs2, vm);
>   
> +#define tcg_out_opc_vim_mask(s, opc, vd, vs2, imm) \
> +    tcg_out_opc_reg_vec_i(s, opc, vd, imm, vs2, false);
> +
> +#define tcg_out_opc_vvm_mask(s, opc, vd, vs2, vs1) \
> +    tcg_out_opc_reg_vec(s, opc, vd, vs1, vs2, false);
> +
> +#define tcg_out_opc_mvv(s, opc, vd, vs2, vs1, vm) \
> +    tcg_out_opc_reg_vec(s, opc, vd, vs1, vs2, vm);
> +
>   #define tcg_out_opc_vconfig(s, opc, rd, avl, vtypei) \
>       tcg_out_opc_vec_config(s, opc, rd, avl, vtypei);
>   
> @@ -1037,6 +1081,22 @@ static const struct {
>       [TCG_COND_GTU] = { OPC_BLTU, true  }
>   };
>   
> +static const struct {
> +    RISCVInsn opc;
> +    bool swap;
> +} tcg_cmpcond_to_rvv_vv[] = {
> +    [TCG_COND_EQ] =  { OPC_VMSEQ_VV,  false },
> +    [TCG_COND_NE] =  { OPC_VMSNE_VV,  false },
> +    [TCG_COND_LT] =  { OPC_VMSLT_VV,  false },
> +    [TCG_COND_GE] =  { OPC_VMSLE_VV,  true  },
> +    [TCG_COND_GT] =  { OPC_VMSLT_VV,  true  },
> +    [TCG_COND_LE] =  { OPC_VMSLE_VV,  false },
> +    [TCG_COND_LTU] = { OPC_VMSLTU_VV, false },
> +    [TCG_COND_GEU] = { OPC_VMSLEU_VV, true  },
> +    [TCG_COND_GTU] = { OPC_VMSLTU_VV, true  },
> +    [TCG_COND_LEU] = { OPC_VMSLEU_VV, false }
> +};
> +
>   static void tcg_out_brcond(TCGContext *s, TCGCond cond, TCGReg arg1,
>                              TCGReg arg2, TCGLabel *l)
>   {
> @@ -1054,6 +1114,79 @@ static void tcg_out_brcond(TCGContext *s, TCGCond cond, TCGReg arg1,
>       tcg_out_opc_branch(s, op, arg1, arg2, 0);
>   }
>   
> +static const struct {
> +    RISCVInsn op;
> +    bool expand;

invert is probably a better name.
Why are these tables so far apart?

> +static void tcg_out_cmp_vec_vx(TCGContext *s, TCGCond cond, TCGReg arg1,
> +                               tcg_target_long arg2)
> +{
> +    RISCVInsn op;
> +
> +    tcg_debug_assert((unsigned)cond < ARRAY_SIZE(tcg_cmpcond_to_rvv_vx));
> +    op = tcg_cmpcond_to_rvv_vx[cond].op;
> +    tcg_debug_assert(op != 0);
> +
> +    tcg_out_opc_vx(s, op, TCG_REG_V0, arg1, arg2, true);
> +    if (tcg_cmpcond_to_rvv_vx[cond].expand) {
> +        tcg_out_opc_mvv(s, OPC_VMNAND_MM, TCG_REG_V0, TCG_REG_V0,
> +                        TCG_REG_V0, false);
> +    }
> +}

I think you'll be better served by handling the invert during expand, because you can 
always swap the sense of the predicate in the user.

Compare tcg/i386 expand_vec_cmp_noinv.

> +            tcg_gen_mov_vec(v0, tcg_constant_vec_matching(v0, vece, 0));

You don't need to copy to v0; just use the tcg_constant_vec directly as

> +            vec_gen_3(INDEX_op_rvv_merge_vec, type, vece,
> +                      tcgv_vec_arg(v0), tcgv_vec_arg(v0),
> +                      tcgv_i64_arg(tcg_constant_i64(-1)));

the first source operand.

You can swap 0 and -1 if the comparison instruction requires the predicate to be inverted.

r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 09/15] tcg/riscv: Implement vector cmp ops
  2024-08-14  9:39   ` Richard Henderson
@ 2024-08-27  7:50     ` LIU Zhiwei
  0 siblings, 0 replies; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-27  7:50 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng


On 2024/8/14 17:39, Richard Henderson wrote:
> On 8/13/24 21:34, LIU Zhiwei wrote:
>> From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
>>
>> 1.Address immediate value constraints in RISC-V Vector Extension 1.0 for
>> comparison instructions.
>>
>> 2.Extend comparison results from mask registers to SEW-width elements,
>>    following recommendations in The RISC-V SPEC Volume I (Version 
>> 20240411).
>>
>> This aligns with TCG's cmp_vec behavior by expanding compare results to
>> full element width: all 1s for true, all 0s for false.
>>
>> Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
>> Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
>> ---
>>   tcg/riscv/tcg-target-con-set.h |   2 +
>>   tcg/riscv/tcg-target-con-str.h |   1 +
>>   tcg/riscv/tcg-target.c.inc     | 188 +++++++++++++++++++++++++++++++++
>>   tcg/riscv/tcg-target.opc.h     |   3 +
>>   4 files changed, 194 insertions(+)
>>
>> diff --git a/tcg/riscv/tcg-target-con-set.h 
>> b/tcg/riscv/tcg-target-con-set.h
>> index 8a0de18257..23b391dd07 100644
>> --- a/tcg/riscv/tcg-target-con-set.h
>> +++ b/tcg/riscv/tcg-target-con-set.h
>> @@ -22,5 +22,7 @@ C_N1_I2(r, r, rM)
>>   C_O1_I4(r, r, rI, rM, rM)
>>   C_O2_I4(r, r, rZ, rZ, rM, rM)
>>   C_O0_I2(v, r)
>> +C_O0_I2(v, vK)
>>   C_O1_I1(v, r)
>>   C_O1_I2(v, v, v)
>> +C_O1_I2(v, v, vK)
>> diff --git a/tcg/riscv/tcg-target-con-str.h 
>> b/tcg/riscv/tcg-target-con-str.h
>> index b2b3211bcb..0aaad7b753 100644
>> --- a/tcg/riscv/tcg-target-con-str.h
>> +++ b/tcg/riscv/tcg-target-con-str.h
>> @@ -17,6 +17,7 @@ REGS('v', ALL_VECTOR_REGS)
>>    */
>>   CONST('I', TCG_CT_CONST_S12)
>>   CONST('J', TCG_CT_CONST_J12)
>> +CONST('K', TCG_CT_CONST_S5)
>>   CONST('N', TCG_CT_CONST_N12)
>>   CONST('M', TCG_CT_CONST_M12)
>>   CONST('Z', TCG_CT_CONST_ZERO)
>> diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
>> index 650b5eff1a..3f1e215e90 100644
>> --- a/tcg/riscv/tcg-target.c.inc
>> +++ b/tcg/riscv/tcg-target.c.inc
>> @@ -113,6 +113,7 @@ static TCGReg 
>> tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
>>   #define TCG_CT_CONST_N12   0x400
>>   #define TCG_CT_CONST_M12   0x800
>>   #define TCG_CT_CONST_J12  0x1000
>> +#define TCG_CT_CONST_S5   0x2000
>>     #define ALL_GENERAL_REGS   MAKE_64BIT_MASK(0, 32)
>>   #define ALL_VECTOR_REGS    MAKE_64BIT_MASK(33, 31)
>> @@ -160,6 +161,13 @@ static bool tcg_target_const_match(int64_t val, 
>> int ct,
>>       if ((ct & TCG_CT_CONST_J12) && ~val >= -0x800 && ~val <= 0x7ff) {
>>           return 1;
>>       }
>> +    /*
>> +     * Sign extended from 5 bits: [-0x10, 0x0f].
>> +     * Used for vector-immediate.
>> +     */
>> +    if ((ct & TCG_CT_CONST_S5) && val >= -0x10 && val <= 0x0f) {
>> +        return 1;
>> +    }
>>       return 0;
>>   }
>>   @@ -289,12 +297,39 @@ typedef enum {
>>       OPC_VSE32_V = 0x6027 | V_SUMOP,
>>       OPC_VSE64_V = 0x7027 | V_SUMOP,
>>   +    OPC_VMERGE_VIM = 0x5c000057 | V_OPIVI,
>> +    OPC_VMERGE_VVM = 0x5c000057 | V_OPIVV,
>> +    OPC_VMNAND_MM = 0x74000057 | V_OPMVV,
>> +
>>       OPC_VADD_VV = 0x57 | V_OPIVV,
>>       OPC_VSUB_VV = 0x8000057 | V_OPIVV,
>>       OPC_VAND_VV = 0x24000057 | V_OPIVV,
>>       OPC_VOR_VV = 0x28000057 | V_OPIVV,
>>       OPC_VXOR_VV = 0x2c000057 | V_OPIVV,
>>   +    OPC_VMSEQ_VV = 0x60000057 | V_OPIVV,
>> +    OPC_VMSEQ_VI = 0x60000057 | V_OPIVI,
>> +    OPC_VMSEQ_VX = 0x60000057 | V_OPIVX,
>> +    OPC_VMSNE_VV = 0x64000057 | V_OPIVV,
>> +    OPC_VMSNE_VI = 0x64000057 | V_OPIVI,
>> +    OPC_VMSNE_VX = 0x64000057 | V_OPIVX,
>> +
>> +    OPC_VMSLTU_VV = 0x68000057 | V_OPIVV,
>> +    OPC_VMSLTU_VX = 0x68000057 | V_OPIVX,
>> +    OPC_VMSLT_VV = 0x6c000057 | V_OPIVV,
>> +    OPC_VMSLT_VX = 0x6c000057 | V_OPIVX,
>> +    OPC_VMSLEU_VV = 0x70000057 | V_OPIVV,
>> +    OPC_VMSLEU_VX = 0x70000057 | V_OPIVX,
>> +    OPC_VMSLE_VV = 0x74000057 | V_OPIVV,
>> +    OPC_VMSLE_VX = 0x74000057 | V_OPIVX,
>> +
>> +    OPC_VMSLEU_VI = 0x70000057 | V_OPIVI,
>> +    OPC_VMSLE_VI = 0x74000057 | V_OPIVI,
>> +    OPC_VMSGTU_VI = 0x78000057 | V_OPIVI,
>> +    OPC_VMSGTU_VX = 0x78000057 | V_OPIVX,
>> +    OPC_VMSGT_VI = 0x7c000057 | V_OPIVI,
>> +    OPC_VMSGT_VX = 0x7c000057 | V_OPIVX,
>> +
>>       OPC_VMV_V_V = 0x5e000057 | V_OPIVV,
>>       OPC_VMV_V_I = 0x5e000057 | V_OPIVI,
>>       OPC_VMV_V_X = 0x5e000057 | V_OPIVX,
>> @@ -575,6 +610,15 @@ static void tcg_out_opc_vec_config(TCGContext 
>> *s, RISCVInsn opc,
>>   #define tcg_out_opc_vi(s, opc, vd, vs2, imm, vm) \
>>       tcg_out_opc_reg_vec_i(s, opc, vd, imm, vs2, vm);
>>   +#define tcg_out_opc_vim_mask(s, opc, vd, vs2, imm) \
>> +    tcg_out_opc_reg_vec_i(s, opc, vd, imm, vs2, false);
>> +
>> +#define tcg_out_opc_vvm_mask(s, opc, vd, vs2, vs1) \
>> +    tcg_out_opc_reg_vec(s, opc, vd, vs1, vs2, false);
>> +
>> +#define tcg_out_opc_mvv(s, opc, vd, vs2, vs1, vm) \
>> +    tcg_out_opc_reg_vec(s, opc, vd, vs1, vs2, vm);
>> +
>>   #define tcg_out_opc_vconfig(s, opc, rd, avl, vtypei) \
>>       tcg_out_opc_vec_config(s, opc, rd, avl, vtypei);
>>   @@ -1037,6 +1081,22 @@ static const struct {
>>       [TCG_COND_GTU] = { OPC_BLTU, true  }
>>   };
>>   +static const struct {
>> +    RISCVInsn opc;
>> +    bool swap;
>> +} tcg_cmpcond_to_rvv_vv[] = {
>> +    [TCG_COND_EQ] =  { OPC_VMSEQ_VV,  false },
>> +    [TCG_COND_NE] =  { OPC_VMSNE_VV,  false },
>> +    [TCG_COND_LT] =  { OPC_VMSLT_VV,  false },
>> +    [TCG_COND_GE] =  { OPC_VMSLE_VV,  true  },
>> +    [TCG_COND_GT] =  { OPC_VMSLT_VV,  true  },
>> +    [TCG_COND_LE] =  { OPC_VMSLE_VV,  false },
>> +    [TCG_COND_LTU] = { OPC_VMSLTU_VV, false },
>> +    [TCG_COND_GEU] = { OPC_VMSLEU_VV, true  },
>> +    [TCG_COND_GTU] = { OPC_VMSLTU_VV, true  },
>> +    [TCG_COND_LEU] = { OPC_VMSLEU_VV, false }
>> +};
>> +
>>   static void tcg_out_brcond(TCGContext *s, TCGCond cond, TCGReg arg1,
>>                              TCGReg arg2, TCGLabel *l)
>>   {
>> @@ -1054,6 +1114,79 @@ static void tcg_out_brcond(TCGContext *s, 
>> TCGCond cond, TCGReg arg1,
>>       tcg_out_opc_branch(s, op, arg1, arg2, 0);
>>   }
>>   +static const struct {
>> +    RISCVInsn op;
>> +    bool expand;
>
> invert is probably a better name.
OK.
> Why are these tables so far apart?
>> +static void tcg_out_cmp_vec_vx(TCGContext *s, TCGCond cond, TCGReg 
>> arg1,
>> +                               tcg_target_long arg2)
>> +{
>> +    RISCVInsn op;
>> +
>> +    tcg_debug_assert((unsigned)cond < 
>> ARRAY_SIZE(tcg_cmpcond_to_rvv_vx));
>> +    op = tcg_cmpcond_to_rvv_vx[cond].op;
>> +    tcg_debug_assert(op != 0);
>> +
>> +    tcg_out_opc_vx(s, op, TCG_REG_V0, arg1, arg2, true);
>> +    if (tcg_cmpcond_to_rvv_vx[cond].expand) {
>> +        tcg_out_opc_mvv(s, OPC_VMNAND_MM, TCG_REG_V0, TCG_REG_V0,
>> +                        TCG_REG_V0, false);
>> +    }
>> +}
>
> I think you'll be better served by handling the invert during expand, 
> because you can always swap the sense of the predicate in the user.
OK. We have implemented this IR in expand way, where we call 
rvv_cmp_vv/rvv_cmp_vx/ rvv_cmp_vi according to the type and value of arg2.
>
> Compare tcg/i386 expand_vec_cmp_noinv.
>
>> +            tcg_gen_mov_vec(v0, tcg_constant_vec_matching(v0, vece, 
>> 0));
>
> You don't need to copy to v0; just use the tcg_constant_vec directly as
>
>> + vec_gen_3(INDEX_op_rvv_merge_vec, type, vece,
>> +                      tcgv_vec_arg(v0), tcgv_vec_arg(v0),
>> +                      tcgv_i64_arg(tcg_constant_i64(-1)));
>
> the first source operand.
>
> You can swap 0 and -1 if the comparison instruction requires the 
> predicate to be inverted.

OK.

Thanks,
Zhiwei

>
> r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v1 10/15] tcg/riscv: Implement vector not/neg ops
  2024-08-13 11:34 [PATCH v1 00/15] tcg/riscv: Add support for vector LIU Zhiwei
                   ` (8 preceding siblings ...)
  2024-08-13 11:34 ` [PATCH v1 09/15] tcg/riscv: Implement vector cmp ops LIU Zhiwei
@ 2024-08-13 11:34 ` LIU Zhiwei
  2024-08-14  9:45   ` Richard Henderson
  2024-08-13 11:34 ` [PATCH v1 11/15] tcg/riscv: Implement vector sat/mul ops LIU Zhiwei
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-13 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng

From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>

Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
 tcg/riscv/tcg-target-con-set.h |  1 +
 tcg/riscv/tcg-target.c.inc     | 13 +++++++++++++
 tcg/riscv/tcg-target.h         |  4 ++--
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
index 23b391dd07..781b18a09e 100644
--- a/tcg/riscv/tcg-target-con-set.h
+++ b/tcg/riscv/tcg-target-con-set.h
@@ -24,5 +24,6 @@ C_O2_I4(r, r, rZ, rZ, rM, rM)
 C_O0_I2(v, r)
 C_O0_I2(v, vK)
 C_O1_I1(v, r)
+C_O1_I1(v, v)
 C_O1_I2(v, v, v)
 C_O1_I2(v, v, vK)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 3f1e215e90..a33c634dbb 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -306,7 +306,9 @@ typedef enum {
     OPC_VAND_VV = 0x24000057 | V_OPIVV,
     OPC_VOR_VV = 0x28000057 | V_OPIVV,
     OPC_VXOR_VV = 0x2c000057 | V_OPIVV,
+    OPC_VXOR_VI = 0x2c000057 | V_OPIVI,
 
+    OPC_VRSUB_VX = 0xc000057 | V_OPIVX,
     OPC_VMSEQ_VV = 0x60000057 | V_OPIVV,
     OPC_VMSEQ_VI = 0x60000057 | V_OPIVI,
     OPC_VMSEQ_VX = 0x60000057 | V_OPIVX,
@@ -2312,6 +2314,12 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_xor_vec:
         tcg_out_opc_vv(s, OPC_VXOR_VV, a0, a1, a2, true);
         break;
+    case INDEX_op_not_vec:
+        tcg_out_opc_vi(s, OPC_VXOR_VI, a0, a1, -1, true);
+        break;
+    case INDEX_op_neg_vec:
+        tcg_out_opc_vx(s, OPC_VRSUB_VX, a0, a1, TCG_REG_ZERO, true);
+        break;
     case INDEX_op_rvv_cmpcond_vec:
         {
             RISCVInsn op;
@@ -2384,6 +2392,8 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_and_vec:
     case INDEX_op_or_vec:
     case INDEX_op_xor_vec:
+    case INDEX_op_not_vec:
+    case INDEX_op_neg_vec:
         return 1;
     case INDEX_op_cmp_vec:
         return -1;
@@ -2537,6 +2547,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_dupm_vec:
     case INDEX_op_ld_vec:
         return C_O1_I1(v, r);
+    case INDEX_op_neg_vec:
+    case INDEX_op_not_vec:
+        return C_O1_I1(v, v);
     case INDEX_op_add_vec:
     case INDEX_op_sub_vec:
     case INDEX_op_and_vec:
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 12a7a37aaa..401696d639 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -151,8 +151,8 @@ typedef enum {
 #define TCG_TARGET_HAS_nand_vec         0
 #define TCG_TARGET_HAS_nor_vec          0
 #define TCG_TARGET_HAS_eqv_vec          0
-#define TCG_TARGET_HAS_not_vec          0
-#define TCG_TARGET_HAS_neg_vec          0
+#define TCG_TARGET_HAS_not_vec          1
+#define TCG_TARGET_HAS_neg_vec          1
 #define TCG_TARGET_HAS_abs_vec          0
 #define TCG_TARGET_HAS_roti_vec         0
 #define TCG_TARGET_HAS_rots_vec         0
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 10/15] tcg/riscv: Implement vector not/neg ops
  2024-08-13 11:34 ` [PATCH v1 10/15] tcg/riscv: Implement vector not/neg ops LIU Zhiwei
@ 2024-08-14  9:45   ` Richard Henderson
  2024-08-27  7:55     ` LIU Zhiwei
  0 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2024-08-14  9:45 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng

On 8/13/24 21:34, LIU Zhiwei wrote:
> @@ -2312,6 +2314,12 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
>       case INDEX_op_xor_vec:
>           tcg_out_opc_vv(s, OPC_VXOR_VV, a0, a1, a2, true);
>           break;
> +    case INDEX_op_not_vec:
> +        tcg_out_opc_vi(s, OPC_VXOR_VI, a0, a1, -1, true);
> +        break;
> +    case INDEX_op_neg_vec:
> +        tcg_out_opc_vx(s, OPC_VRSUB_VX, a0, a1, TCG_REG_ZERO, true);
> +        break;

Any reason not to use vrsub.vi?  Not wrong, just surprising.

Obviously, NOT does not require SEW change.

r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 10/15] tcg/riscv: Implement vector not/neg ops
  2024-08-14  9:45   ` Richard Henderson
@ 2024-08-27  7:55     ` LIU Zhiwei
  0 siblings, 0 replies; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-27  7:55 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng


On 2024/8/14 17:45, Richard Henderson wrote:
> On 8/13/24 21:34, LIU Zhiwei wrote:
>> @@ -2312,6 +2314,12 @@ static void tcg_out_vec_op(TCGContext *s, 
>> TCGOpcode opc,
>>       case INDEX_op_xor_vec:
>>           tcg_out_opc_vv(s, OPC_VXOR_VV, a0, a1, a2, true);
>>           break;
>> +    case INDEX_op_not_vec:
>> +        tcg_out_opc_vi(s, OPC_VXOR_VI, a0, a1, -1, true);
>> +        break;
>> +    case INDEX_op_neg_vec:
>> +        tcg_out_opc_vx(s, OPC_VRSUB_VX, a0, a1, TCG_REG_ZERO, true);
>> +        break;
>
> Any reason not to use vrsub.vi?  Not wrong, just surprising.
We will use vrsub.vi.
>
> Obviously, NOT does not require SEW change.

OK.

Thanks,
Zhiwei

>
> r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v1 11/15] tcg/riscv: Implement vector sat/mul ops
  2024-08-13 11:34 [PATCH v1 00/15] tcg/riscv: Add support for vector LIU Zhiwei
                   ` (9 preceding siblings ...)
  2024-08-13 11:34 ` [PATCH v1 10/15] tcg/riscv: Implement vector not/neg ops LIU Zhiwei
@ 2024-08-13 11:34 ` LIU Zhiwei
  2024-08-13 11:34 ` [PATCH v1 12/15] tcg/riscv: Implement vector min/max ops LIU Zhiwei
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-13 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng

From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>

Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
 tcg/riscv/tcg-target.c.inc | 32 ++++++++++++++++++++++++++++++++
 tcg/riscv/tcg-target.h     |  4 ++--
 2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index a33c634dbb..af21b4593c 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -309,6 +309,13 @@ typedef enum {
     OPC_VXOR_VI = 0x2c000057 | V_OPIVI,
 
     OPC_VRSUB_VX = 0xc000057 | V_OPIVX,
+
+    OPC_VMUL_VV = 0x94000057 | V_OPMVV,
+    OPC_VSADD_VV = 0x84000057 | V_OPIVV,
+    OPC_VSSUB_VV = 0x8c000057 | V_OPIVV,
+    OPC_VSADDU_VV = 0x80000057 | V_OPIVV,
+    OPC_VSSUBU_VV = 0x88000057 | V_OPIVV,
+
     OPC_VMSEQ_VV = 0x60000057 | V_OPIVV,
     OPC_VMSEQ_VI = 0x60000057 | V_OPIVI,
     OPC_VMSEQ_VX = 0x60000057 | V_OPIVX,
@@ -2320,6 +2327,21 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_neg_vec:
         tcg_out_opc_vx(s, OPC_VRSUB_VX, a0, a1, TCG_REG_ZERO, true);
         break;
+    case INDEX_op_mul_vec:
+        tcg_out_opc_vv(s, OPC_VMUL_VV, a0, a1, a2, true);
+        break;
+    case INDEX_op_ssadd_vec:
+        tcg_out_opc_vv(s, OPC_VSADD_VV, a0, a1, a2, true);
+        break;
+    case INDEX_op_sssub_vec:
+        tcg_out_opc_vv(s, OPC_VSSUB_VV, a0, a1, a2, true);
+        break;
+    case INDEX_op_usadd_vec:
+        tcg_out_opc_vv(s, OPC_VSADDU_VV, a0, a1, a2, true);
+        break;
+    case INDEX_op_ussub_vec:
+        tcg_out_opc_vv(s, OPC_VSSUBU_VV, a0, a1, a2, true);
+        break;
     case INDEX_op_rvv_cmpcond_vec:
         {
             RISCVInsn op;
@@ -2394,6 +2416,11 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_xor_vec:
     case INDEX_op_not_vec:
     case INDEX_op_neg_vec:
+    case INDEX_op_mul_vec:
+    case INDEX_op_ssadd_vec:
+    case INDEX_op_sssub_vec:
+    case INDEX_op_usadd_vec:
+    case INDEX_op_ussub_vec:
         return 1;
     case INDEX_op_cmp_vec:
         return -1;
@@ -2555,6 +2582,11 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_and_vec:
     case INDEX_op_or_vec:
     case INDEX_op_xor_vec:
+    case INDEX_op_mul_vec:
+    case INDEX_op_ssadd_vec:
+    case INDEX_op_sssub_vec:
+    case INDEX_op_usadd_vec:
+    case INDEX_op_ussub_vec:
         return C_O1_I2(v, v, v);
     case INDEX_op_cmp_vec:
     case INDEX_op_rvv_merge_vec:
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 401696d639..21251f8b23 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -160,8 +160,8 @@ typedef enum {
 #define TCG_TARGET_HAS_shi_vec          0
 #define TCG_TARGET_HAS_shs_vec          0
 #define TCG_TARGET_HAS_shv_vec          0
-#define TCG_TARGET_HAS_mul_vec          0
-#define TCG_TARGET_HAS_sat_vec          0
+#define TCG_TARGET_HAS_mul_vec          1
+#define TCG_TARGET_HAS_sat_vec          1
 #define TCG_TARGET_HAS_minmax_vec       0
 #define TCG_TARGET_HAS_bitsel_vec       0
 #define TCG_TARGET_HAS_cmpsel_vec       0
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v1 12/15] tcg/riscv: Implement vector min/max ops
  2024-08-13 11:34 [PATCH v1 00/15] tcg/riscv: Add support for vector LIU Zhiwei
                   ` (10 preceding siblings ...)
  2024-08-13 11:34 ` [PATCH v1 11/15] tcg/riscv: Implement vector sat/mul ops LIU Zhiwei
@ 2024-08-13 11:34 ` LIU Zhiwei
  2024-08-13 11:34 ` [PATCH v1 13/15] tcg/riscv: Implement vector shs/v ops LIU Zhiwei
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-13 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng

From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>

Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
 tcg/riscv/tcg-target.c.inc | 25 +++++++++++++++++++++++++
 tcg/riscv/tcg-target.h     |  2 +-
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index af21b4593c..c9c69d61fb 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -316,6 +316,11 @@ typedef enum {
     OPC_VSADDU_VV = 0x80000057 | V_OPIVV,
     OPC_VSSUBU_VV = 0x88000057 | V_OPIVV,
 
+    OPC_VMAX_VV = 0x1c000057 | V_OPIVV,
+    OPC_VMAXU_VV = 0x18000057 | V_OPIVV,
+    OPC_VMIN_VV = 0x14000057 | V_OPIVV,
+    OPC_VMINU_VV = 0x10000057 | V_OPIVV,
+
     OPC_VMSEQ_VV = 0x60000057 | V_OPIVV,
     OPC_VMSEQ_VI = 0x60000057 | V_OPIVI,
     OPC_VMSEQ_VX = 0x60000057 | V_OPIVX,
@@ -2342,6 +2347,18 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_ussub_vec:
         tcg_out_opc_vv(s, OPC_VSSUBU_VV, a0, a1, a2, true);
         break;
+    case INDEX_op_smax_vec:
+        tcg_out_opc_vv(s, OPC_VMAX_VV, a0, a1, a2, true);
+        break;
+    case INDEX_op_smin_vec:
+        tcg_out_opc_vv(s, OPC_VMIN_VV, a0, a1, a2, true);
+        break;
+    case INDEX_op_umax_vec:
+        tcg_out_opc_vv(s, OPC_VMAXU_VV, a0, a1, a2, true);
+        break;
+    case INDEX_op_umin_vec:
+        tcg_out_opc_vv(s, OPC_VMINU_VV, a0, a1, a2, true);
+        break;
     case INDEX_op_rvv_cmpcond_vec:
         {
             RISCVInsn op;
@@ -2421,6 +2438,10 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_sssub_vec:
     case INDEX_op_usadd_vec:
     case INDEX_op_ussub_vec:
+    case INDEX_op_smax_vec:
+    case INDEX_op_smin_vec:
+    case INDEX_op_umax_vec:
+    case INDEX_op_umin_vec:
         return 1;
     case INDEX_op_cmp_vec:
         return -1;
@@ -2587,6 +2608,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_sssub_vec:
     case INDEX_op_usadd_vec:
     case INDEX_op_ussub_vec:
+    case INDEX_op_smax_vec:
+    case INDEX_op_smin_vec:
+    case INDEX_op_umax_vec:
+    case INDEX_op_umin_vec:
         return C_O1_I2(v, v, v);
     case INDEX_op_cmp_vec:
     case INDEX_op_rvv_merge_vec:
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 21251f8b23..35e7086ad7 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -162,7 +162,7 @@ typedef enum {
 #define TCG_TARGET_HAS_shv_vec          0
 #define TCG_TARGET_HAS_mul_vec          1
 #define TCG_TARGET_HAS_sat_vec          1
-#define TCG_TARGET_HAS_minmax_vec       0
+#define TCG_TARGET_HAS_minmax_vec       1
 #define TCG_TARGET_HAS_bitsel_vec       0
 #define TCG_TARGET_HAS_cmpsel_vec       0
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v1 13/15] tcg/riscv: Implement vector shs/v ops
  2024-08-13 11:34 [PATCH v1 00/15] tcg/riscv: Add support for vector LIU Zhiwei
                   ` (11 preceding siblings ...)
  2024-08-13 11:34 ` [PATCH v1 12/15] tcg/riscv: Implement vector min/max ops LIU Zhiwei
@ 2024-08-13 11:34 ` LIU Zhiwei
  2024-08-13 11:34 ` [PATCH v1 14/15] tcg/riscv: Implement vector roti/v/x shi ops LIU Zhiwei
  2024-08-13 11:34 ` [PATCH v1 15/15] tcg/riscv: Enable vector TCG host-native LIU Zhiwei
  14 siblings, 0 replies; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-13 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng

From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>

Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
 tcg/riscv/tcg-target-con-set.h |  1 +
 tcg/riscv/tcg-target.c.inc     | 38 ++++++++++++++++++++++++++++++++++
 tcg/riscv/tcg-target.h         |  4 ++--
 3 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
index 781b18a09e..6510bb5605 100644
--- a/tcg/riscv/tcg-target-con-set.h
+++ b/tcg/riscv/tcg-target-con-set.h
@@ -27,3 +27,4 @@ C_O1_I1(v, r)
 C_O1_I1(v, v)
 C_O1_I2(v, v, v)
 C_O1_I2(v, v, vK)
+C_O1_I2(v, v, r)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index c9c69d61fb..467437e175 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -344,6 +344,13 @@ typedef enum {
     OPC_VMSGT_VI = 0x7c000057 | V_OPIVI,
     OPC_VMSGT_VX = 0x7c000057 | V_OPIVX,
 
+    OPC_VSLL_VV = 0x94000057 | V_OPIVV,
+    OPC_VSLL_VX = 0x94000057 | V_OPIVX,
+    OPC_VSRL_VV = 0xa0000057 | V_OPIVV,
+    OPC_VSRL_VX = 0xa0000057 | V_OPIVX,
+    OPC_VSRA_VV = 0xa4000057 | V_OPIVV,
+    OPC_VSRA_VX = 0xa4000057 | V_OPIVX,
+
     OPC_VMV_V_V = 0x5e000057 | V_OPIVV,
     OPC_VMV_V_I = 0x5e000057 | V_OPIVI,
     OPC_VMV_V_X = 0x5e000057 | V_OPIVX,
@@ -2359,6 +2366,24 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_umin_vec:
         tcg_out_opc_vv(s, OPC_VMINU_VV, a0, a1, a2, true);
         break;
+    case INDEX_op_shls_vec:
+        tcg_out_opc_vx(s, OPC_VSLL_VX, a0, a1, a2, true);
+        break;
+    case INDEX_op_shrs_vec:
+        tcg_out_opc_vx(s, OPC_VSRL_VX, a0, a1, a2, true);
+        break;
+    case INDEX_op_sars_vec:
+        tcg_out_opc_vx(s, OPC_VSRA_VX, a0, a1, a2, true);
+        break;
+    case INDEX_op_shlv_vec:
+        tcg_out_opc_vv(s, OPC_VSLL_VV, a0, a1, a2, true);
+        break;
+    case INDEX_op_shrv_vec:
+        tcg_out_opc_vv(s, OPC_VSRL_VV, a0, a1, a2, true);
+        break;
+    case INDEX_op_sarv_vec:
+        tcg_out_opc_vv(s, OPC_VSRA_VV, a0, a1, a2, true);
+        break;
     case INDEX_op_rvv_cmpcond_vec:
         {
             RISCVInsn op;
@@ -2442,6 +2467,12 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_smin_vec:
     case INDEX_op_umax_vec:
     case INDEX_op_umin_vec:
+    case INDEX_op_shls_vec:
+    case INDEX_op_shrs_vec:
+    case INDEX_op_sars_vec:
+    case INDEX_op_shlv_vec:
+    case INDEX_op_shrv_vec:
+    case INDEX_op_sarv_vec:
         return 1;
     case INDEX_op_cmp_vec:
         return -1;
@@ -2612,7 +2643,14 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_smin_vec:
     case INDEX_op_umax_vec:
     case INDEX_op_umin_vec:
+    case INDEX_op_shlv_vec:
+    case INDEX_op_shrv_vec:
+    case INDEX_op_sarv_vec:
         return C_O1_I2(v, v, v);
+    case INDEX_op_shls_vec:
+    case INDEX_op_shrs_vec:
+    case INDEX_op_sars_vec:
+        return C_O1_I2(v, v, r);
     case INDEX_op_cmp_vec:
     case INDEX_op_rvv_merge_vec:
         return C_O1_I2(v, v, vK);
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 35e7086ad7..41c6c446e8 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -158,8 +158,8 @@ typedef enum {
 #define TCG_TARGET_HAS_rots_vec         0
 #define TCG_TARGET_HAS_rotv_vec         0
 #define TCG_TARGET_HAS_shi_vec          0
-#define TCG_TARGET_HAS_shs_vec          0
-#define TCG_TARGET_HAS_shv_vec          0
+#define TCG_TARGET_HAS_shs_vec          1
+#define TCG_TARGET_HAS_shv_vec          1
 #define TCG_TARGET_HAS_mul_vec          1
 #define TCG_TARGET_HAS_sat_vec          1
 #define TCG_TARGET_HAS_minmax_vec       1
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* [PATCH v1 14/15] tcg/riscv: Implement vector roti/v/x shi ops
  2024-08-13 11:34 [PATCH v1 00/15] tcg/riscv: Add support for vector LIU Zhiwei
                   ` (12 preceding siblings ...)
  2024-08-13 11:34 ` [PATCH v1 13/15] tcg/riscv: Implement vector shs/v ops LIU Zhiwei
@ 2024-08-13 11:34 ` LIU Zhiwei
  2024-08-14  9:55   ` Richard Henderson
  2024-08-13 11:34 ` [PATCH v1 15/15] tcg/riscv: Enable vector TCG host-native LIU Zhiwei
  14 siblings, 1 reply; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-13 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng

From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>

Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
 tcg/riscv/tcg-target.c.inc | 107 ++++++++++++++++++++++++++++++++++++-
 tcg/riscv/tcg-target.h     |   8 +--
 tcg/riscv/tcg-target.opc.h |   3 ++
 3 files changed, 113 insertions(+), 5 deletions(-)

diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 467437e175..59d23ed622 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -345,10 +345,13 @@ typedef enum {
     OPC_VMSGT_VX = 0x7c000057 | V_OPIVX,
 
     OPC_VSLL_VV = 0x94000057 | V_OPIVV,
+    OPC_VSLL_VI = 0x94000057 | V_OPIVI,
     OPC_VSLL_VX = 0x94000057 | V_OPIVX,
     OPC_VSRL_VV = 0xa0000057 | V_OPIVV,
+    OPC_VSRL_VI = 0xa0000057 | V_OPIVI,
     OPC_VSRL_VX = 0xa0000057 | V_OPIVX,
     OPC_VSRA_VV = 0xa4000057 | V_OPIVV,
+    OPC_VSRA_VI = 0xa4000057 | V_OPIVI,
     OPC_VSRA_VX = 0xa4000057 | V_OPIVX,
 
     OPC_VMV_V_V = 0x5e000057 | V_OPIVV,
@@ -2384,6 +2387,15 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
     case INDEX_op_sarv_vec:
         tcg_out_opc_vv(s, OPC_VSRA_VV, a0, a1, a2, true);
         break;
+    case INDEX_op_rvv_shli_vec:
+        tcg_out_opc_vi(s, OPC_VSLL_VI, a0, a1, a2, true);
+        break;
+    case INDEX_op_rvv_shri_vec:
+        tcg_out_opc_vi(s, OPC_VSRL_VI, a0, a1, a2, true);
+        break;
+    case INDEX_op_rvv_sari_vec:
+        tcg_out_opc_vi(s, OPC_VSRA_VI, a0, a1, a2, true);
+        break;
     case INDEX_op_rvv_cmpcond_vec:
         {
             RISCVInsn op;
@@ -2422,7 +2434,8 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
                        TCGArg a0, ...)
 {
     va_list va;
-    TCGv_vec v0, v1;
+    TCGv_vec v0, v1, v2, c1, t1;
+    TCGv_i32 t2;
     TCGArg a2, a3;
 
     va_start(va, a0);
@@ -2442,6 +2455,81 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
                       tcgv_i64_arg(tcg_constant_i64(-1)));
         }
         break;
+    case INDEX_op_shli_vec:
+        if (a2 > 31) {
+            t2 = tcg_temp_new_i32();
+            tcg_gen_movi_i32(t2, (int32_t)a2);
+            tcg_gen_shls_vec(vece, v0, v1, t2);
+            tcg_temp_free_i32(t2);
+        } else {
+            vec_gen_3(INDEX_op_rvv_shli_vec, type, vece, tcgv_vec_arg(v0),
+                      tcgv_vec_arg(v1), a2);
+        }
+        break;
+    case INDEX_op_shri_vec:
+        if (a2 > 31) {
+            t2 = tcg_temp_new_i32();
+            tcg_gen_movi_i32(t2, (int32_t)a2);
+            tcg_gen_shrs_vec(vece, v0, v1, t2);
+            tcg_temp_free_i32(t2);
+        } else {
+            vec_gen_3(INDEX_op_rvv_shri_vec, type, vece, tcgv_vec_arg(v0),
+                      tcgv_vec_arg(v1), a2);
+        }
+        break;
+    case INDEX_op_sari_vec:
+        if (a2 > 31) {
+            t2 = tcg_temp_new_i32();
+            tcg_gen_movi_i32(t2, (int32_t)a2);
+            tcg_gen_sars_vec(vece, v0, v1, t2);
+            tcg_temp_free_i32(t2);
+        } else {
+            vec_gen_3(INDEX_op_rvv_sari_vec, type, vece, tcgv_vec_arg(v0),
+                      tcgv_vec_arg(v1), a2);
+        }
+        break;
+    case INDEX_op_rotli_vec:
+        t1 = tcg_temp_new_vec(type);
+        tcg_gen_shli_vec(vece, t1, v1, a2);
+        tcg_gen_shri_vec(vece, v0, v1, (8 << vece) - a2);
+        tcg_gen_or_vec(vece, v0, v0, t1);
+        tcg_temp_free_vec(t1);
+        break;
+    case INDEX_op_rotls_vec:
+        t1 = tcg_temp_new_vec(type);
+        t2 = tcg_temp_new_i32();
+        tcg_gen_sub_i32(t2, tcg_constant_i32(8 << vece),
+                        temp_tcgv_i32(arg_temp(a2)));
+        tcg_gen_shrs_vec(vece, v0, v1, t2);
+        tcg_gen_shls_vec(vece, t1, v1, temp_tcgv_i32(arg_temp(a2)));
+        tcg_gen_or_vec(vece, v0, v0, t1);
+        tcg_temp_free_vec(t1);
+        tcg_temp_free_i32(t2);
+        break;
+    case INDEX_op_rotlv_vec:
+        v2 = temp_tcgv_vec(arg_temp(a2));
+        t1 = tcg_temp_new_vec(type);
+        c1 = tcg_constant_vec(type, vece, 8 << vece);
+        tcg_gen_sub_vec(vece, t1, c1, v2);
+        vec_gen_3(INDEX_op_shrv_vec, type, vece, tcgv_vec_arg(t1),
+                  tcgv_vec_arg(v1), tcgv_vec_arg(t1));
+        vec_gen_3(INDEX_op_shlv_vec, type, vece, tcgv_vec_arg(v0),
+                  tcgv_vec_arg(v1), tcgv_vec_arg(v2));
+        tcg_gen_or_vec(vece, v0, v0, t1);
+        tcg_temp_free_vec(t1);
+        break;
+    case INDEX_op_rotrv_vec:
+        v2 = temp_tcgv_vec(arg_temp(a2));
+        t1 = tcg_temp_new_vec(type);
+        c1 = tcg_constant_vec(type, vece, 8 << vece);
+        tcg_gen_sub_vec(vece, t1, c1, v2);
+        vec_gen_3(INDEX_op_shlv_vec, type, vece, tcgv_vec_arg(t1),
+                  tcgv_vec_arg(v1), tcgv_vec_arg(t1));
+        vec_gen_3(INDEX_op_shrv_vec, type, vece, tcgv_vec_arg(v0),
+                  tcgv_vec_arg(v1), tcgv_vec_arg(v2));
+        tcg_gen_or_vec(vece, v0, v0, t1);
+        tcg_temp_free_vec(t1);
+        break;
     default:
         g_assert_not_reached();
     }
@@ -2475,6 +2563,13 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
     case INDEX_op_sarv_vec:
         return 1;
     case INDEX_op_cmp_vec:
+    case INDEX_op_shri_vec:
+    case INDEX_op_shli_vec:
+    case INDEX_op_sari_vec:
+    case INDEX_op_rotls_vec:
+    case INDEX_op_rotlv_vec:
+    case INDEX_op_rotrv_vec:
+    case INDEX_op_rotli_vec:
         return -1;
     default:
         return 0;
@@ -2628,6 +2723,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
         return C_O1_I1(v, r);
     case INDEX_op_neg_vec:
     case INDEX_op_not_vec:
+    case INDEX_op_rotli_vec:
+    case INDEX_op_shli_vec:
+    case INDEX_op_shri_vec:
+    case INDEX_op_sari_vec:
+    case INDEX_op_rvv_shli_vec:
+    case INDEX_op_rvv_shri_vec:
+    case INDEX_op_rvv_sari_vec:
         return C_O1_I1(v, v);
     case INDEX_op_add_vec:
     case INDEX_op_sub_vec:
@@ -2646,10 +2748,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_shlv_vec:
     case INDEX_op_shrv_vec:
     case INDEX_op_sarv_vec:
+    case INDEX_op_rotlv_vec:
+    case INDEX_op_rotrv_vec:
         return C_O1_I2(v, v, v);
     case INDEX_op_shls_vec:
     case INDEX_op_shrs_vec:
     case INDEX_op_sars_vec:
+    case INDEX_op_rotls_vec:
         return C_O1_I2(v, v, r);
     case INDEX_op_cmp_vec:
     case INDEX_op_rvv_merge_vec:
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 41c6c446e8..eb5129a976 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -154,10 +154,10 @@ typedef enum {
 #define TCG_TARGET_HAS_not_vec          1
 #define TCG_TARGET_HAS_neg_vec          1
 #define TCG_TARGET_HAS_abs_vec          0
-#define TCG_TARGET_HAS_roti_vec         0
-#define TCG_TARGET_HAS_rots_vec         0
-#define TCG_TARGET_HAS_rotv_vec         0
-#define TCG_TARGET_HAS_shi_vec          0
+#define TCG_TARGET_HAS_roti_vec         -1
+#define TCG_TARGET_HAS_rots_vec         -1
+#define TCG_TARGET_HAS_rotv_vec         -1
+#define TCG_TARGET_HAS_shi_vec          -1
 #define TCG_TARGET_HAS_shs_vec          1
 #define TCG_TARGET_HAS_shv_vec          1
 #define TCG_TARGET_HAS_mul_vec          1
diff --git a/tcg/riscv/tcg-target.opc.h b/tcg/riscv/tcg-target.opc.h
index 2f23453c35..3a010e853e 100644
--- a/tcg/riscv/tcg-target.opc.h
+++ b/tcg/riscv/tcg-target.opc.h
@@ -13,3 +13,6 @@
 
 DEF(rvv_cmpcond_vec, 0, 2, 1, IMPLVEC)
 DEF(rvv_merge_vec, 1, 2, 0, IMPLVEC)
+DEF(rvv_shli_vec, 1, 1, 1, IMPLVEC)
+DEF(rvv_shri_vec, 1, 1, 1, IMPLVEC)
+DEF(rvv_sari_vec, 1, 1, 1, IMPLVEC)
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 14/15] tcg/riscv: Implement vector roti/v/x shi ops
  2024-08-13 11:34 ` [PATCH v1 14/15] tcg/riscv: Implement vector roti/v/x shi ops LIU Zhiwei
@ 2024-08-14  9:55   ` Richard Henderson
  2024-08-27  7:57     ` LIU Zhiwei
  0 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2024-08-14  9:55 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng

On 8/13/24 21:34, LIU Zhiwei wrote:
> +    case INDEX_op_shli_vec:
> +        if (a2 > 31) {
> +            t2 = tcg_temp_new_i32();
> +            tcg_gen_movi_i32(t2, (int32_t)a2);
> +            tcg_gen_shls_vec(vece, v0, v1, t2);

Drop the movi, just pass tcg_constant_i32(a2) as the second source.

> +    case INDEX_op_rotls_vec:
> +        t1 = tcg_temp_new_vec(type);
> +        t2 = tcg_temp_new_i32();
> +        tcg_gen_sub_i32(t2, tcg_constant_i32(8 << vece),
> +                        temp_tcgv_i32(arg_temp(a2)));
> +        tcg_gen_shrs_vec(vece, v0, v1, t2);

Only the low lg2(SEW) bits are used; you can just tcg_gen_neg_i32.

> +    case INDEX_op_rotlv_vec:
> +        v2 = temp_tcgv_vec(arg_temp(a2));
> +        t1 = tcg_temp_new_vec(type);
> +        c1 = tcg_constant_vec(type, vece, 8 << vece);
> +        tcg_gen_sub_vec(vece, t1, c1, v2);

Likewise tcg_gen_neg_vec.

> +    case INDEX_op_rotrv_vec:
> +        v2 = temp_tcgv_vec(arg_temp(a2));
> +        t1 = tcg_temp_new_vec(type);
> +        c1 = tcg_constant_vec(type, vece, 8 << vece);
> +        tcg_gen_sub_vec(vece, t1, c1, v2);

Likewise.


r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 14/15] tcg/riscv: Implement vector roti/v/x shi ops
  2024-08-14  9:55   ` Richard Henderson
@ 2024-08-27  7:57     ` LIU Zhiwei
  0 siblings, 0 replies; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-27  7:57 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng


On 2024/8/14 17:55, Richard Henderson wrote:
> On 8/13/24 21:34, LIU Zhiwei wrote:
>> +    case INDEX_op_shli_vec:
>> +        if (a2 > 31) {
>> +            t2 = tcg_temp_new_i32();
>> +            tcg_gen_movi_i32(t2, (int32_t)a2);
>> +            tcg_gen_shls_vec(vece, v0, v1, t2);
>
> Drop the movi, just pass tcg_constant_i32(a2) as the second source.
OK.
>
>> +    case INDEX_op_rotls_vec:
>> +        t1 = tcg_temp_new_vec(type);
>> +        t2 = tcg_temp_new_i32();
>> +        tcg_gen_sub_i32(t2, tcg_constant_i32(8 << vece),
>> +                        temp_tcgv_i32(arg_temp(a2)));
>> +        tcg_gen_shrs_vec(vece, v0, v1, t2);
>
> Only the low lg2(SEW) bits are used; you can just tcg_gen_neg_i32.
Good idea.
>
>> +    case INDEX_op_rotlv_vec:
>> +        v2 = temp_tcgv_vec(arg_temp(a2));
>> +        t1 = tcg_temp_new_vec(type);
>> +        c1 = tcg_constant_vec(type, vece, 8 << vece);
>> +        tcg_gen_sub_vec(vece, t1, c1, v2);
>
> Likewise tcg_gen_neg_vec.
>
>> +    case INDEX_op_rotrv_vec:
>> +        v2 = temp_tcgv_vec(arg_temp(a2));
>> +        t1 = tcg_temp_new_vec(type);
>> +        c1 = tcg_constant_vec(type, vece, 8 << vece);
>> +        tcg_gen_sub_vec(vece, t1, c1, v2);
>
> Likewise.
Thanks,
Zhiwei
>
>
> r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH v1 15/15] tcg/riscv: Enable vector TCG host-native
  2024-08-13 11:34 [PATCH v1 00/15] tcg/riscv: Add support for vector LIU Zhiwei
                   ` (13 preceding siblings ...)
  2024-08-13 11:34 ` [PATCH v1 14/15] tcg/riscv: Implement vector roti/v/x shi ops LIU Zhiwei
@ 2024-08-13 11:34 ` LIU Zhiwei
  2024-08-14 10:15   ` Richard Henderson
  14 siblings, 1 reply; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-13 11:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng

From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>

Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
 tcg/riscv/tcg-target.h | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index eb5129a976..fe6c50e49e 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -143,9 +143,13 @@ typedef enum {
 #define TCG_TARGET_HAS_tst              0
 
 /* vector instructions */
-#define TCG_TARGET_HAS_v64              0
-#define TCG_TARGET_HAS_v128             0
-#define TCG_TARGET_HAS_v256             0
+extern int riscv_vlen;
+#define have_rvv    ((cpuinfo & CPUINFO_ZVE64X) && \
+                     (riscv_vlen >= 64))
+
+#define TCG_TARGET_HAS_v64              have_rvv
+#define TCG_TARGET_HAS_v128             have_rvv
+#define TCG_TARGET_HAS_v256             have_rvv
 #define TCG_TARGET_HAS_andc_vec         0
 #define TCG_TARGET_HAS_orc_vec          0
 #define TCG_TARGET_HAS_nand_vec         0
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 15/15] tcg/riscv: Enable vector TCG host-native
  2024-08-13 11:34 ` [PATCH v1 15/15] tcg/riscv: Enable vector TCG host-native LIU Zhiwei
@ 2024-08-14 10:15   ` Richard Henderson
  2024-08-27  8:31     ` LIU Zhiwei
  0 siblings, 1 reply; 49+ messages in thread
From: Richard Henderson @ 2024-08-14 10:15 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng

On 8/13/24 21:34, LIU Zhiwei wrote:
> From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
> 
> Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
> Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
> ---
>   tcg/riscv/tcg-target.h | 10 +++++++---
>   1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
> index eb5129a976..fe6c50e49e 100644
> --- a/tcg/riscv/tcg-target.h
> +++ b/tcg/riscv/tcg-target.h
> @@ -143,9 +143,13 @@ typedef enum {
>   #define TCG_TARGET_HAS_tst              0
>   
>   /* vector instructions */
> -#define TCG_TARGET_HAS_v64              0
> -#define TCG_TARGET_HAS_v128             0
> -#define TCG_TARGET_HAS_v256             0
> +extern int riscv_vlen;
> +#define have_rvv    ((cpuinfo & CPUINFO_ZVE64X) && \
> +                     (riscv_vlen >= 64))
> +
> +#define TCG_TARGET_HAS_v64              have_rvv
> +#define TCG_TARGET_HAS_v128             have_rvv
> +#define TCG_TARGET_HAS_v256             have_rvv

Can ELEN ever be less than 64 for riscv64?
I thought ELEN had to be at least XLEN.
Anyway, if ELEN >= 64, then VLEN must also be >= 64.

In any case, I think we should not set CPUINFO_ZVE64X if the vlen is too small.  We can 
initialize both values in util/cpuinfo-riscv.c, rather than initializing vlen in tcg.

> +static void riscv_get_vlenb(void){
> +    /* Get vlenb for Vector: csrrs %0, vlenb, zero. */
> +    asm volatile("csrrs %0, 0xc22, x0" : "=r"(riscv_vlen));
> +    riscv_vlen *= 8;
> +}

While this is an interesting and required datum, if ELEN < XLEN is possible, then perhaps

     asm("vsetvli %0, r0, e64" : "=r"(vl));

is a better probe, verifying that vl != 0, i.e. e64 is supported, and recording vlen as vl 
* 64, i.e. VLMAX.


r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 15/15] tcg/riscv: Enable vector TCG host-native
  2024-08-14 10:15   ` Richard Henderson
@ 2024-08-27  8:31     ` LIU Zhiwei
  2024-08-28 23:35       ` Richard Henderson
  0 siblings, 1 reply; 49+ messages in thread
From: LIU Zhiwei @ 2024-08-27  8:31 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng

[-- Attachment #1: Type: text/plain, Size: 2478 bytes --]


On 2024/8/14 18:15, Richard Henderson wrote:
> On 8/13/24 21:34, LIU Zhiwei wrote:
>> From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
>>
>> Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
>> Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
>> ---
>>   tcg/riscv/tcg-target.h | 10 +++++++---
>>   1 file changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
>> index eb5129a976..fe6c50e49e 100644
>> --- a/tcg/riscv/tcg-target.h
>> +++ b/tcg/riscv/tcg-target.h
>> @@ -143,9 +143,13 @@ typedef enum {
>>   #define TCG_TARGET_HAS_tst              0
>>     /* vector instructions */
>> -#define TCG_TARGET_HAS_v64              0
>> -#define TCG_TARGET_HAS_v128             0
>> -#define TCG_TARGET_HAS_v256             0
>> +extern int riscv_vlen;
>> +#define have_rvv    ((cpuinfo & CPUINFO_ZVE64X) && \
>> +                     (riscv_vlen >= 64))
>> +
>> +#define TCG_TARGET_HAS_v64              have_rvv
>> +#define TCG_TARGET_HAS_v128             have_rvv
>> +#define TCG_TARGET_HAS_v256             have_rvv
>
> Can ELEN ever be less than 64 for riscv64?

I think so.  At least the specification allow this case. According to 
the specification,

"Any of these extensions can be added to base ISAs with XLEN=32 or XLEN=64."

includes zve32x, where ELEN is 32 and XLEN is 64.

> I thought ELEN had to be at least XLEN.
> Anyway, if ELEN >= 64, then VLEN must also be >= 64.
YES.
>
> In any case, I think we should not set CPUINFO_ZVE64X if the vlen is 
> too small. 
Agree.
> We can initialize both values in util/cpuinfo-riscv.c, rather than 
> initializing vlen in tcg.
>
>> +static void riscv_get_vlenb(void){
>> +    /* Get vlenb for Vector: csrrs %0, vlenb, zero. */
>> +    asm volatile("csrrs %0, 0xc22, x0" : "=r"(riscv_vlen));
>> +    riscv_vlen *= 8;
>> +}
>
> While this is an interesting and required datum, if ELEN < XLEN is 
> possible, then perhaps
>
>     asm("vsetvli %0, r0, e64" : "=r"(vl));
>
> is a better probe, verifying that vl != 0, i.e. e64 is supported, and 
> recording vlen as vl * 64, i.e. VLMAX.

We will use this one. But probe the vlen in util/cpuinfo-riscv.c has no 
meaning as we sometimes use the compiler settings or hw_probe API. In 
these cases, the vlen detected in util/cpuinfo-riscv.c is zero.

Thanks,
Zhiwei

>
>
> r~

[-- Attachment #2: Type: text/html, Size: 4521 bytes --]

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v1 15/15] tcg/riscv: Enable vector TCG host-native
  2024-08-27  8:31     ` LIU Zhiwei
@ 2024-08-28 23:35       ` Richard Henderson
  0 siblings, 0 replies; 49+ messages in thread
From: Richard Henderson @ 2024-08-28 23:35 UTC (permalink / raw)
  To: LIU Zhiwei, qemu-devel
  Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
	bmeng.cn, TANG Tiancheng

On 8/27/24 18:31, LIU Zhiwei wrote:
> We will use this one. But probe the vlen in util/cpuinfo-riscv.c has no meaning as we 
> sometimes use the compiler settings or hw_probe API. In these cases, the vlen detected in 
> util/cpuinfo-riscv.c is zero.

Pardon?

While you might check __riscv_zve64x at compile-time, you would still fall through to


---
     }

+   if (info & CPUINFO_ZVE64X) {
+       unsigned long vl;
+       asm("vsetvli %0, r0, e64" : "=r"(vl));
+       if (vl) {
+           riscv_vlen = vl * 8;
+       } else {
+           info &= ~CPUINFO_ZVE64X;
+       }
+   }
+
     info |= CPUINFO_ALWAYS;
     cpuinfo = info;
---

Do not attempt to merge the vsetvli from the SIGILL probe; I expect that path to become 
unused and eventually vanish.


r~


^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2024-08-28 23:35 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-13 11:34 [PATCH v1 00/15] tcg/riscv: Add support for vector LIU Zhiwei
2024-08-13 11:34 ` [PATCH v1 01/15] util: Add RISC-V vector extension probe in cpuinfo LIU Zhiwei
2024-08-13 11:34 ` [PATCH v1 02/15] tcg/op-gvec: Fix iteration step in 32-bit operation LIU Zhiwei
2024-08-13 11:34 ` [PATCH v1 03/15] tcg: Fix register allocation constraints LIU Zhiwei
2024-08-13 11:52   ` Richard Henderson
2024-08-14  0:58     ` LIU Zhiwei
2024-08-14  2:04       ` Richard Henderson
2024-08-14  2:27         ` LIU Zhiwei
2024-08-14  3:08           ` Richard Henderson
2024-08-14  3:30             ` LIU Zhiwei
2024-08-14  4:18               ` Richard Henderson
2024-08-14  7:47                 ` LIU Zhiwei
2024-08-13 11:34 ` [PATCH v1 04/15] tcg/riscv: Add basic support for vector LIU Zhiwei
2024-08-13 12:19   ` Richard Henderson
2024-08-13 11:34 ` [PATCH v1 05/15] tcg/riscv: Add riscv vset{i}vli support LIU Zhiwei
2024-08-14  8:24   ` Richard Henderson
2024-08-19  1:34     ` LIU Zhiwei
2024-08-19  2:35       ` Richard Henderson
2024-08-19  2:53         ` LIU Zhiwei
2024-08-13 11:34 ` [PATCH v1 06/15] tcg/riscv: Implement vector load/store LIU Zhiwei
2024-08-14  9:01   ` Richard Henderson
2024-08-19  1:41     ` LIU Zhiwei
2024-08-13 11:34 ` [PATCH v1 07/15] tcg/riscv: Implement vector mov/dup{m/i} LIU Zhiwei
2024-08-14  9:11   ` Richard Henderson
2024-08-15 10:49     ` LIU Zhiwei
2024-08-20  9:00   ` Richard Henderson
2024-08-20  9:26     ` LIU Zhiwei
2024-08-13 11:34 ` [PATCH v1 08/15] tcg/riscv: Add support for basic vector opcodes LIU Zhiwei
2024-08-14  9:13   ` Richard Henderson
2024-08-20  1:56     ` LIU Zhiwei
2024-08-14  9:17   ` Richard Henderson
2024-08-20  1:57     ` LIU Zhiwei
2024-08-20  5:14       ` Richard Henderson
2024-08-13 11:34 ` [PATCH v1 09/15] tcg/riscv: Implement vector cmp ops LIU Zhiwei
2024-08-14  9:39   ` Richard Henderson
2024-08-27  7:50     ` LIU Zhiwei
2024-08-13 11:34 ` [PATCH v1 10/15] tcg/riscv: Implement vector not/neg ops LIU Zhiwei
2024-08-14  9:45   ` Richard Henderson
2024-08-27  7:55     ` LIU Zhiwei
2024-08-13 11:34 ` [PATCH v1 11/15] tcg/riscv: Implement vector sat/mul ops LIU Zhiwei
2024-08-13 11:34 ` [PATCH v1 12/15] tcg/riscv: Implement vector min/max ops LIU Zhiwei
2024-08-13 11:34 ` [PATCH v1 13/15] tcg/riscv: Implement vector shs/v ops LIU Zhiwei
2024-08-13 11:34 ` [PATCH v1 14/15] tcg/riscv: Implement vector roti/v/x shi ops LIU Zhiwei
2024-08-14  9:55   ` Richard Henderson
2024-08-27  7:57     ` LIU Zhiwei
2024-08-13 11:34 ` [PATCH v1 15/15] tcg/riscv: Enable vector TCG host-native LIU Zhiwei
2024-08-14 10:15   ` Richard Henderson
2024-08-27  8:31     ` LIU Zhiwei
2024-08-28 23:35       ` Richard Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).