* [PATCH v3 01/14] tcg/op-gvec: Fix iteration step in 32-bit operation
2024-09-04 14:27 [PATCH v3 00/14] Add support for vector LIU Zhiwei
@ 2024-09-04 14:27 ` LIU Zhiwei
2024-09-04 14:27 ` [PATCH v3 02/14] util: Add RISC-V vector extension probe in cpuinfo LIU Zhiwei
` (13 subsequent siblings)
14 siblings, 0 replies; 34+ messages in thread
From: LIU Zhiwei @ 2024-09-04 14:27 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
The loop in the 32-bit case of the vector compare operation
was incorrectly incrementing by 8 bytes per iteration instead
of 4 bytes. This caused the function to process only half of
the intended elements.
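For illustration (a standalone sketch, not part of the patch), the
offsets each step size visits over a 16-byte operand:

    /* Hypothetical demo, not QEMU code. */
    #include <stdio.h>

    int main(void)
    {
        unsigned oprsz = 16;                    /* four 32-bit elements */

        for (unsigned i = 0; i < oprsz; i += 8) {
            printf("step 8: offset %u\n", i);   /* 0, 8 */
        }
        for (unsigned i = 0; i < oprsz; i += 4) {
            printf("step 4: offset %u\n", i);   /* 0, 4, 8, 12 */
        }
        return 0;
    }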
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Fixes: 9622c697d1 ("tcg: Add gvec compare with immediate and scalar operand")
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/tcg-op-gvec.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 0308732d9b..78ee1ced80 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -3939,7 +3939,7 @@ void tcg_gen_gvec_cmps(TCGCond cond, unsigned vece, uint32_t dofs,
uint32_t i;
tcg_gen_extrl_i64_i32(t1, c);
- for (i = 0; i < oprsz; i += 8) {
+ for (i = 0; i < oprsz; i += 4) {
tcg_gen_ld_i32(t0, tcg_env, aofs + i);
tcg_gen_negsetcond_i32(cond, t0, t0, t1);
tcg_gen_st_i32(t0, tcg_env, dofs + i);
--
2.43.0
* [PATCH v3 02/14] util: Add RISC-V vector extension probe in cpuinfo
2024-09-04 14:27 [PATCH v3 00/14] Add support for vector LIU Zhiwei
2024-09-04 14:27 ` [PATCH v3 01/14] tcg/op-gvec: Fix iteration step in 32-bit operation LIU Zhiwei
@ 2024-09-04 14:27 ` LIU Zhiwei
2024-09-05 3:34 ` Richard Henderson
2024-09-04 14:27 ` [PATCH v3 03/14] tcg/riscv: Add basic support for vector LIU Zhiwei
` (12 subsequent siblings)
14 siblings, 1 reply; 34+ messages in thread
From: LIU Zhiwei @ 2024-09-04 14:27 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Add support for probing RISC-V vector extension availability in
the backend. This information will be used when deciding whether
to use vector instructions in code generation.
Since the toolchain headers do not yet define RISCV_HWPROBE_EXT_ZVE64X,
we use RISCV_HWPROBE_IMA_V instead.
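As a rough sketch of the probing path (assuming a Linux 6.4+ kernel
that provides <asm/hwprobe.h>; probe_vector() is an illustrative
helper, and the real integration is in cpuinfo_init() below):

    #include <stdbool.h>
    #include <stddef.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <asm/hwprobe.h>

    /* Illustrative helper; the patch folds this into cpuinfo_init(). */
    static bool probe_vector(void)
    {
        struct riscv_hwprobe pair = { .key = RISCV_HWPROBE_KEY_IMA_EXT_0 };

        /* One key/value pair, queried across all CPUs, no flags. */
        if (syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0) == 0
            && pair.key >= 0) {
            return pair.value & RISCV_HWPROBE_IMA_V;
        }
        return false;
    }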
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
host/include/riscv/host/cpuinfo.h | 3 +++
util/cpuinfo-riscv.c | 26 ++++++++++++++++++++++++--
2 files changed, 27 insertions(+), 2 deletions(-)
diff --git a/host/include/riscv/host/cpuinfo.h b/host/include/riscv/host/cpuinfo.h
index 2b00660e36..727cb3204b 100644
--- a/host/include/riscv/host/cpuinfo.h
+++ b/host/include/riscv/host/cpuinfo.h
@@ -10,9 +10,12 @@
#define CPUINFO_ZBA (1u << 1)
#define CPUINFO_ZBB (1u << 2)
#define CPUINFO_ZICOND (1u << 3)
+#define CPUINFO_ZVE64X (1u << 4)
/* Initialized with a constructor. */
extern unsigned cpuinfo;
+extern unsigned riscv_vlenb;
+#define riscv_vlen (riscv_vlenb * 8)
/*
* We cannot rely on constructor ordering, so other constructors must
diff --git a/util/cpuinfo-riscv.c b/util/cpuinfo-riscv.c
index 497ce12680..05917c42d8 100644
--- a/util/cpuinfo-riscv.c
+++ b/util/cpuinfo-riscv.c
@@ -12,6 +12,7 @@
#endif
unsigned cpuinfo;
+unsigned riscv_vlenb;
static volatile sig_atomic_t got_sigill;
static void sigill_handler(int signo, siginfo_t *si, void *data)
@@ -33,7 +34,7 @@ static void sigill_handler(int signo, siginfo_t *si, void *data)
/* Called both as constructor and (possibly) via other constructors. */
unsigned __attribute__((constructor)) cpuinfo_init(void)
{
- unsigned left = CPUINFO_ZBA | CPUINFO_ZBB | CPUINFO_ZICOND;
+ unsigned left = CPUINFO_ZBA | CPUINFO_ZBB | CPUINFO_ZICOND | CPUINFO_ZVE64X;
unsigned info = cpuinfo;
if (info) {
@@ -49,6 +50,9 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
#endif
#if defined(__riscv_arch_test) && defined(__riscv_zicond)
info |= CPUINFO_ZICOND;
+#endif
+#if defined(__riscv_arch_test) && defined(__riscv_zve64x)
+ info |= CPUINFO_ZVE64X;
#endif
left &= ~info;
@@ -64,7 +68,8 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
&& pair.key >= 0) {
info |= pair.value & RISCV_HWPROBE_EXT_ZBA ? CPUINFO_ZBA : 0;
info |= pair.value & RISCV_HWPROBE_EXT_ZBB ? CPUINFO_ZBB : 0;
- left &= ~(CPUINFO_ZBA | CPUINFO_ZBB);
+ info |= pair.value & RISCV_HWPROBE_IMA_V ? CPUINFO_ZVE64X : 0;
+ left &= ~(CPUINFO_ZBA | CPUINFO_ZBB | CPUINFO_ZVE64X);
#ifdef RISCV_HWPROBE_EXT_ZICOND
info |= pair.value & RISCV_HWPROBE_EXT_ZICOND ? CPUINFO_ZICOND : 0;
left &= ~CPUINFO_ZICOND;
@@ -112,6 +117,23 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
assert(left == 0);
}
+ if (info & CPUINFO_ZVE64X) {
+ /*
+ * Get vlenb for Vector: vsetvli rd, x0, e64.
+ * VLMAX = LMUL * VLEN / SEW.
+ * The "vsetvli rd, x0, e64" means "LMUL = 1, SEW = 64, rd = VLMAX",
+ * so "vlenb = VLMAX * 64 / 8".
+ */
+ unsigned long vlmax = 0;
+ asm volatile(".insn i 0x57, 7, %0, zero, (3 << 3)" : "=r"(vlmax));
+ if (vlmax) {
+ riscv_vlenb = vlmax * 8;
+ assert(riscv_vlen >= 64 && !(riscv_vlen & (riscv_vlen - 1)));
+ } else {
+ info &= ~CPUINFO_ZVE64X;
+ }
+ }
+
info |= CPUINFO_ALWAYS;
cpuinfo = info;
return info;
--
2.43.0
* Re: [PATCH v3 02/14] util: Add RISC-V vector extension probe in cpuinfo
2024-09-04 14:27 ` [PATCH v3 02/14] util: Add RISC-V vector extension probe in cpuinfo LIU Zhiwei
@ 2024-09-05 3:34 ` Richard Henderson
2024-09-09 7:18 ` LIU Zhiwei
0 siblings, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2024-09-05 3:34 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 9/4/24 07:27, LIU Zhiwei wrote:
> + if (info & CPUINFO_ZVE64X) {
> + /*
> + * Get vlenb for Vector: vsetvli rd, x0, e64.
> + * VLMAX = LMUL * VLEN / SEW.
> + * The "vsetvli rd, x0, e64" means "LMUL = 1, SEW = 64, rd = VLMAX",
> + * so "vlenb = VLMAX * 64 / 8".
> + */
> + unsigned long vlmax = 0;
> + asm volatile(".insn i 0x57, 7, %0, zero, (3 << 3)" : "=r"(vlmax));
> + if (vlmax) {
> + riscv_vlenb = vlmax * 8;
> + assert(riscv_vlen >= 64 && !(riscv_vlen & (riscv_vlen - 1)));
> + } else {
> + info &= ~CPUINFO_ZVE64X;
> + }
> + }
Surely this does not compile, since the riscv_vlen referenced in the assert does not exist.
That said, I've done some experimentation and I believe there is a further simplification
to be had in instead saving log2(vlenb).
if (info & CPUINFO_ZVE64X) {
/*
* We are guaranteed by RVV-1.0 that VLEN is a power of 2.
* We are guaranteed by Zve64x that VLEN >= 64, and that
* EEW of {8,16,32,64} are supported.
*
* Cache VLEN in a convenient form.
*/
unsigned long vlenb;
asm("csrr %0, vlenb" : "=r"(vlenb));
riscv_lg2_vlenb = ctz32(vlenb);
}
I'll talk about how this can be used against the next patch with vsetvl.
r~
* Re: [PATCH v3 02/14] util: Add RISC-V vector extension probe in cpuinfo
2024-09-05 3:34 ` Richard Henderson
@ 2024-09-09 7:18 ` LIU Zhiwei
2024-09-09 15:45 ` Richard Henderson
0 siblings, 1 reply; 34+ messages in thread
From: LIU Zhiwei @ 2024-09-09 7:18 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 2024/9/5 11:34, Richard Henderson wrote:
> On 9/4/24 07:27, LIU Zhiwei wrote:
>> + if (info & CPUINFO_ZVE64X) {
>> + /*
>> + * Get vlenb for Vector: vsetvli rd, x0, e64.
>> + * VLMAX = LMUL * VLEN / SEW.
>> + * The "vsetvli rd, x0, e64" means "LMUL = 1, SEW = 64, rd =
>> VLMAX",
>> + * so "vlenb = VLMAX * 64 / 8".
>> + */
>> + unsigned long vlmax = 0;
>> + asm volatile(".insn i 0x57, 7, %0, zero, (3 << 3)" :
>> "=r"(vlmax));
>> + if (vlmax) {
>> + riscv_vlenb = vlmax * 8;
>> + assert(riscv_vlen >= 64 && !(riscv_vlen & (riscv_vlen -
>> 1)));
>> + } else {
>> + info &= ~CPUINFO_ZVE64X;
>> + }
>> + }
>
> Surely this does not compile, since the riscv_vlen referenced in the
> assert does not exist.
riscv_vlen is a macro defined in terms of riscv_vlenb. I think you missed it.
>
> That said, I've done some experimentation and I believe there is a
> further simplification to be had in instead saving log2(vlenb).
>
> if (info & CPUINFO_ZVE64X) {
> /*
> * We are guaranteed by RVV-1.0 that VLEN is a power of 2.
> * We are guaranteed by Zve64x that VLEN >= 64, and that
> * EEW of {8,16,32,64} are supported.
> *
> * Cache VLEN in a convenient form.
> */
> unsigned long vlenb;
> asm("csrr %0, vlenb" : "=r"(vlenb));
Should we use the .insn format here? We may be building with a
compiler that doesn't support vector.
> riscv_lg2_vlenb = ctz32(vlenb);
> }
>
OK.
Thanks,
Zhiwei
> I'll talk about how this can be used against the next patch with vsetvl.
>
>
> r~
* Re: [PATCH v3 02/14] util: Add RISC-V vector extension probe in cpuinfo
2024-09-09 7:18 ` LIU Zhiwei
@ 2024-09-09 15:45 ` Richard Henderson
2024-09-10 2:47 ` LIU Zhiwei
0 siblings, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2024-09-09 15:45 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 9/9/24 00:18, LIU Zhiwei wrote:
>
> On 2024/9/5 11:34, Richard Henderson wrote:
>> On 9/4/24 07:27, LIU Zhiwei wrote:
>>> + if (info & CPUINFO_ZVE64X) {
>>> + /*
>>> + * Get vlenb for Vector: vsetvli rd, x0, e64.
>>> + * VLMAX = LMUL * VLEN / SEW.
>>> + * The "vsetvli rd, x0, e64" means "LMUL = 1, SEW = 64, rd = VLMAX",
>>> + * so "vlenb = VLMAX * 64 / 8".
>>> + */
>>> + unsigned long vlmax = 0;
>>> + asm volatile(".insn i 0x57, 7, %0, zero, (3 << 3)" : "=r"(vlmax));
>>> + if (vlmax) {
>>> + riscv_vlenb = vlmax * 8;
>>> + assert(riscv_vlen >= 64 && !(riscv_vlen & (riscv_vlen - 1)));
>>> + } else {
>>> + info &= ~CPUINFO_ZVE64X;
>>> + }
>>> + }
>>
>> Surely this does not compile, since the riscv_vlen referenced in the assert does not exist.
> riscv_vlen is a macro defined in terms of riscv_vlenb. I think you missed it.
I did miss the macro. But there's also no need for it to exist.
>>
>> That said, I've done some experimentation and I believe there is a further
>> simplification to be had in instead saving log2(vlenb).
>>
>> if (info & CPUINFO_ZVE64X) {
>> /*
>> * We are guaranteed by RVV-1.0 that VLEN is a power of 2.
>> * We are guaranteed by Zve64x that VLEN >= 64, and that
>> * EEW of {8,16,32,64} are supported.
>> *
>> * Cache VLEN in a convenient form.
>> */
>> unsigned long vlenb;
>> asm("csrr %0, vlenb" : "=r"(vlenb));
>
> Should we use the .insn format here? We may be building with a compiler that doesn't support vector.
Neither gcc nor clang requires V be enabled at compile time in order to access the CSR.
It does seem like a mistake, but I'm happy to use it.
r~
* Re: [PATCH v3 02/14] util: Add RISC-V vector extension probe in cpuinfo
2024-09-09 15:45 ` Richard Henderson
@ 2024-09-10 2:47 ` LIU Zhiwei
0 siblings, 0 replies; 34+ messages in thread
From: LIU Zhiwei @ 2024-09-10 2:47 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 2024/9/9 23:45, Richard Henderson wrote:
> On 9/9/24 00:18, LIU Zhiwei wrote:
>>
>> On 2024/9/5 11:34, Richard Henderson wrote:
>>> On 9/4/24 07:27, LIU Zhiwei wrote:
>>>> + if (info & CPUINFO_ZVE64X) {
>>>> + /*
>>>> + * Get vlenb for Vector: vsetvli rd, x0, e64.
>>>> + * VLMAX = LMUL * VLEN / SEW.
>>>> + * The "vsetvli rd, x0, e64" means "LMUL = 1, SEW = 64, rd
>>>> = VLMAX",
>>>> + * so "vlenb = VLMAX * 64 / 8".
>>>> + */
>>>> + unsigned long vlmax = 0;
>>>> + asm volatile(".insn i 0x57, 7, %0, zero, (3 << 3)" :
>>>> "=r"(vlmax));
>>>> + if (vlmax) {
>>>> + riscv_vlenb = vlmax * 8;
>>>> + assert(riscv_vlen >= 64 && !(riscv_vlen & (riscv_vlen
>>>> - 1)));
>>>> + } else {
>>>> + info &= ~CPUINFO_ZVE64X;
>>>> + }
>>>> + }
>>>
>>> Surely this does not compile, since the riscv_vlen referenced in the
>>> assert does not exist.
>> riscv_vlen is a macro defined in terms of riscv_vlenb. I think you missed it.
>
> I did miss the macro. But there's also no need for it to exist.
>
>>>
>>> That said, I've done some experimentation and I believe there is a
>>> further simplification to be had in instead saving log2(vlenb).
>>>
>>> if (info & CPUINFO_ZVE64X) {
>>> /*
>>> * We are guaranteed by RVV-1.0 that VLEN is a power of 2.
>>> * We are guaranteed by Zve64x that VLEN >= 64, and that
>>> * EEW of {8,16,32,64} are supported.
>>> *
>>> * Cache VLEN in a convenient form.
>>> */
>>> unsigned long vlenb;
>>> asm("csrr %0, vlenb" : "=r"(vlenb));
>>
>> Should we use the .insn format here? We may be building with a
>> compiler that doesn't support vector.
>
> Neither gcc nor clang requires V be enabled at compile time in order
> to access the CSR.
> It does seem like a mistake, but I'm happy to use it.
So shall we follow your suggestion here? 🙂
Zhiwei
>
>
> r~
* [PATCH v3 03/14] tcg/riscv: Add basic support for vector
2024-09-04 14:27 [PATCH v3 00/14] Add support for vector LIU Zhiwei
2024-09-04 14:27 ` [PATCH v3 01/14] tcg/op-gvec: Fix iteration step in 32-bit operation LIU Zhiwei
2024-09-04 14:27 ` [PATCH v3 02/14] util: Add RISC-V vector extension probe in cpuinfo LIU Zhiwei
@ 2024-09-04 14:27 ` LIU Zhiwei
2024-09-05 4:05 ` Richard Henderson
2024-09-04 14:27 ` [PATCH v3 04/14] tcg/riscv: Add riscv vset{i}vli support LIU Zhiwei
` (11 subsequent siblings)
14 siblings, 1 reply; 34+ messages in thread
From: LIU Zhiwei @ 2024-09-04 14:27 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, Swung0x48,
TANG Tiancheng
From: Swung0x48 <swung0x48@outlook.com>
The RISC-V vector instruction set utilizes the LMUL field to group
multiple registers, enabling variable-length vector registers. This
implementation uses only the first register number of each group while
reserving the other register numbers within the group.
In TCG, each VEC_IR can have 3 types (TCG_TYPE_V64/128/256), and the
host runtime needs to adjust LMUL based on the type to use different
register groups.
This presents challenges for TCG's register allocation. Currently, we
avoid modifying the register allocation part of TCG and only expose the
minimum number of vector registers.
For example, when the host vlen is 64 bits and type is TCG_TYPE_V256, with
LMUL equal to 4, we use 4 vector registers as one register group. We can
use a maximum of 8 register groups, but the V0 register number is reserved
as a mask register, so we can effectively use at most 7 register groups.
Moreover, even when the type is smaller than TCG_TYPE_V256, we are
still limited to those 7 registers, because TCG cannot yet constrain
register allocation by type. Likewise, when the host vlen is 128 bits
and the type is TCG_TYPE_V256, we can use at most 15 register groups.
There is not much pressure on vector register allocation in TCG now, so
using 7 registers is feasible and will not have a major impact on code
generation.
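As an aside, the register-group masks used below follow directly from
the group leaders: vector registers occupy bits 32-63 of the register
set, so with LMUL=2 only even-numbered registers may lead a group, and
with LMUL=4 only multiples of four. A hypothetical standalone sketch
(not part of the patch) that reproduces the mask constants:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t m2 = 0, m4 = 0;

        for (int v = 0; v < 32; v += 2) {
            m2 |= UINT64_C(1) << (32 + v);    /* LMUL=2 group leaders */
        }
        for (int v = 0; v < 32; v += 4) {
            m4 |= UINT64_C(1) << (32 + v);    /* LMUL=4 group leaders */
        }
        printf("%#" PRIx64 "\n", m2);    /* 0x5555555500000000 */
        printf("%#" PRIx64 "\n", m4);    /* 0x1111111100000000 */
        return 0;
    }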
This patch:
1. Reserves vector register 0 for use as a mask register.
2. When using register groups, reserves the additional registers within
each group.
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Co-authored-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
tcg/riscv/tcg-target-con-str.h | 1 +
tcg/riscv/tcg-target.c.inc | 166 ++++++++++++++++++++++++++-------
tcg/riscv/tcg-target.h | 78 +++++++++-------
tcg/riscv/tcg-target.opc.h | 12 +++
4 files changed, 191 insertions(+), 66 deletions(-)
create mode 100644 tcg/riscv/tcg-target.opc.h
diff --git a/tcg/riscv/tcg-target-con-str.h b/tcg/riscv/tcg-target-con-str.h
index d5c419dff1..b2b3211bcb 100644
--- a/tcg/riscv/tcg-target-con-str.h
+++ b/tcg/riscv/tcg-target-con-str.h
@@ -9,6 +9,7 @@
* REGS(letter, register_mask)
*/
REGS('r', ALL_GENERAL_REGS)
+REGS('v', ALL_VECTOR_REGS)
/*
* Define constraint letters for constants:
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index d334857226..c3f018ff0c 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -32,38 +32,14 @@
#ifdef CONFIG_DEBUG_TCG
static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
- "zero",
- "ra",
- "sp",
- "gp",
- "tp",
- "t0",
- "t1",
- "t2",
- "s0",
- "s1",
- "a0",
- "a1",
- "a2",
- "a3",
- "a4",
- "a5",
- "a6",
- "a7",
- "s2",
- "s3",
- "s4",
- "s5",
- "s6",
- "s7",
- "s8",
- "s9",
- "s10",
- "s11",
- "t3",
- "t4",
- "t5",
- "t6"
+ "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
+ "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5",
+ "a6", "a7", "s2", "s3", "s4", "s5", "s6", "s7",
+ "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6",
+ "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7",
+ "v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15",
+ "v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23",
+ "v24", "v25", "v26", "v27", "v28", "v29", "v30", "v31",
};
#endif
@@ -100,6 +76,16 @@ static const int tcg_target_reg_alloc_order[] = {
TCG_REG_A5,
TCG_REG_A6,
TCG_REG_A7,
+
+ /* Vector registers and TCG_REG_V0 reserved for mask. */
+ TCG_REG_V1, TCG_REG_V2, TCG_REG_V3, TCG_REG_V4,
+ TCG_REG_V5, TCG_REG_V6, TCG_REG_V7, TCG_REG_V8,
+ TCG_REG_V9, TCG_REG_V10, TCG_REG_V11, TCG_REG_V12,
+ TCG_REG_V13, TCG_REG_V14, TCG_REG_V15, TCG_REG_V16,
+ TCG_REG_V17, TCG_REG_V18, TCG_REG_V19, TCG_REG_V20,
+ TCG_REG_V21, TCG_REG_V22, TCG_REG_V23, TCG_REG_V24,
+ TCG_REG_V25, TCG_REG_V26, TCG_REG_V27, TCG_REG_V28,
+ TCG_REG_V29, TCG_REG_V30, TCG_REG_V31,
};
static const int tcg_target_call_iarg_regs[] = {
@@ -127,6 +113,9 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
#define TCG_CT_CONST_J12 0x1000
#define ALL_GENERAL_REGS MAKE_64BIT_MASK(0, 32)
+#define ALL_VECTOR_REGS MAKE_64BIT_MASK(32, 32)
+#define ALL_DVECTOR_REG_GROUPS 0x5555555500000000
+#define ALL_QVECTOR_REG_GROUPS 0x1111111100000000
#define sextreg sextract64
@@ -363,6 +352,24 @@ static int32_t encode_uj(RISCVInsn opc, TCGReg rd, uint32_t imm)
return opc | (rd & 0x1f) << 7 | encode_ujimm20(imm);
}
+/* Type-OPIVI */
+
+static int32_t encode_vi(RISCVInsn opc, TCGReg rd, int32_t imm,
+ TCGReg vs2, bool vm)
+{
+ return opc | (rd & 0x1f) << 7 | (imm & 0x1f) << 15 |
+ (vs2 & 0x1f) << 20 | (vm << 25);
+}
+
+/* Type-OPIVV/OPMVV/OPIVX/OPMVX, Vector load and store */
+
+static int32_t encode_v(RISCVInsn opc, TCGReg d, TCGReg s1,
+ TCGReg s2, bool vm)
+{
+ return opc | (d & 0x1f) << 7 | (s1 & 0x1f) << 15 |
+ (s2 & 0x1f) << 20 | (vm << 25);
+}
+
/*
* RISC-V instruction emitters
*/
@@ -475,6 +482,43 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
}
}
+/*
+ * RISC-V vector instruction emitters
+ */
+
+/*
+ * Vector registers use the same lower 5 bits as GPR registers,
+ * and vm=0 (vm = false) means vector masking ENABLED.
+ * With RVV 1.0, vs2 is the first operand, while rs1/imm is the
+ * second operand.
+ */
+static void tcg_out_opc_vv(TCGContext *s, RISCVInsn opc, TCGReg vd,
+ TCGReg vs2, TCGReg vs1, bool vm)
+{
+ tcg_out32(s, encode_v(opc, vd, vs1, vs2, vm));
+}
+
+static void tcg_out_opc_vx(TCGContext *s, RISCVInsn opc, TCGReg vd,
+ TCGReg vs2, TCGReg rs1, bool vm)
+{
+ tcg_out32(s, encode_v(opc, vd, rs1, vs2, vm));
+}
+
+static void tcg_out_opc_vi(TCGContext *s, RISCVInsn opc, TCGReg vd,
+ TCGReg vs2, int32_t imm, bool vm)
+{
+ tcg_out32(s, encode_vi(opc, vd, imm, vs2, vm));
+}
+
+/*
+ * Only unit-stride addressing implemented; may extend in future.
+ */
+static void tcg_out_opc_ldst_vec(TCGContext *s, RISCVInsn opc, TCGReg data,
+ TCGReg rs1, bool vm)
+{
+ tcg_out32(s, encode_v(opc, data, rs1, 0, vm));
+}
+
/*
* TCG intrinsics
*/
@@ -1881,6 +1925,36 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
}
}
+static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
+ unsigned vecl, unsigned vece,
+ const TCGArg args[TCG_MAX_OP_ARGS],
+ const int const_args[TCG_MAX_OP_ARGS])
+{
+ switch (opc) {
+ case INDEX_op_mov_vec: /* Always emitted via tcg_out_mov. */
+ case INDEX_op_dup_vec: /* Always emitted via tcg_out_dup_vec. */
+ default:
+ g_assert_not_reached();
+ }
+}
+
+void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
+ TCGArg a0, ...)
+{
+ switch (opc) {
+ default:
+ g_assert_not_reached();
+ }
+}
+
+int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
+{
+ switch (opc) {
+ default:
+ return 0;
+ }
+}
+
static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
{
switch (op) {
@@ -2100,6 +2174,32 @@ static void tcg_target_init(TCGContext *s)
{
tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffff;
tcg_target_available_regs[TCG_TYPE_I64] = 0xffffffff;
+ s->reserved_regs = 0;
+
+ if (cpuinfo & CPUINFO_ZVE64X) {
+ switch (riscv_vlen) {
+ case 64:
+ tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
+ tcg_target_available_regs[TCG_TYPE_V128] = ALL_DVECTOR_REG_GROUPS;
+ tcg_target_available_regs[TCG_TYPE_V256] = ALL_QVECTOR_REG_GROUPS;
+ s->reserved_regs |= (~ALL_QVECTOR_REG_GROUPS & 0xffffffff00000000);
+ break;
+ case 128:
+ tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
+ tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
+ tcg_target_available_regs[TCG_TYPE_V256] = ALL_DVECTOR_REG_GROUPS;
+ s->reserved_regs |= (~ALL_DVECTOR_REG_GROUPS & 0xffffffff00000000);
+ break;
+ case 256:
+ tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
+ tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
+ tcg_target_available_regs[TCG_TYPE_V256] = ALL_VECTOR_REGS;
+ break;
+ default:
+ g_assert_not_reached();
+ break;
+ }
+ }
tcg_target_call_clobber_regs = -1u;
tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S0);
@@ -2115,7 +2215,6 @@ static void tcg_target_init(TCGContext *s)
tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S10);
tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S11);
- s->reserved_regs = 0;
tcg_regset_set_reg(s->reserved_regs, TCG_REG_ZERO);
tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP0);
tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP1);
@@ -2123,6 +2222,7 @@ static void tcg_target_init(TCGContext *s)
tcg_regset_set_reg(s->reserved_regs, TCG_REG_SP);
tcg_regset_set_reg(s->reserved_regs, TCG_REG_GP);
tcg_regset_set_reg(s->reserved_regs, TCG_REG_TP);
+ tcg_regset_set_reg(s->reserved_regs, TCG_REG_V0);
}
typedef struct {
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 1a347eaf6e..12a7a37aaa 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -28,42 +28,28 @@
#include "host/cpuinfo.h"
#define TCG_TARGET_INSN_UNIT_SIZE 4
-#define TCG_TARGET_NB_REGS 32
+#define TCG_TARGET_NB_REGS 64
#define MAX_CODE_GEN_BUFFER_SIZE ((size_t)-1)
typedef enum {
- TCG_REG_ZERO,
- TCG_REG_RA,
- TCG_REG_SP,
- TCG_REG_GP,
- TCG_REG_TP,
- TCG_REG_T0,
- TCG_REG_T1,
- TCG_REG_T2,
- TCG_REG_S0,
- TCG_REG_S1,
- TCG_REG_A0,
- TCG_REG_A1,
- TCG_REG_A2,
- TCG_REG_A3,
- TCG_REG_A4,
- TCG_REG_A5,
- TCG_REG_A6,
- TCG_REG_A7,
- TCG_REG_S2,
- TCG_REG_S3,
- TCG_REG_S4,
- TCG_REG_S5,
- TCG_REG_S6,
- TCG_REG_S7,
- TCG_REG_S8,
- TCG_REG_S9,
- TCG_REG_S10,
- TCG_REG_S11,
- TCG_REG_T3,
- TCG_REG_T4,
- TCG_REG_T5,
- TCG_REG_T6,
+ TCG_REG_ZERO, TCG_REG_RA, TCG_REG_SP, TCG_REG_GP,
+ TCG_REG_TP, TCG_REG_T0, TCG_REG_T1, TCG_REG_T2,
+ TCG_REG_S0, TCG_REG_S1, TCG_REG_A0, TCG_REG_A1,
+ TCG_REG_A2, TCG_REG_A3, TCG_REG_A4, TCG_REG_A5,
+ TCG_REG_A6, TCG_REG_A7, TCG_REG_S2, TCG_REG_S3,
+ TCG_REG_S4, TCG_REG_S5, TCG_REG_S6, TCG_REG_S7,
+ TCG_REG_S8, TCG_REG_S9, TCG_REG_S10, TCG_REG_S11,
+ TCG_REG_T3, TCG_REG_T4, TCG_REG_T5, TCG_REG_T6,
+
+ /* RISC-V V Extension registers */
+ TCG_REG_V0, TCG_REG_V1, TCG_REG_V2, TCG_REG_V3,
+ TCG_REG_V4, TCG_REG_V5, TCG_REG_V6, TCG_REG_V7,
+ TCG_REG_V8, TCG_REG_V9, TCG_REG_V10, TCG_REG_V11,
+ TCG_REG_V12, TCG_REG_V13, TCG_REG_V14, TCG_REG_V15,
+ TCG_REG_V16, TCG_REG_V17, TCG_REG_V18, TCG_REG_V19,
+ TCG_REG_V20, TCG_REG_V21, TCG_REG_V22, TCG_REG_V23,
+ TCG_REG_V24, TCG_REG_V25, TCG_REG_V26, TCG_REG_V27,
+ TCG_REG_V28, TCG_REG_V29, TCG_REG_V30, TCG_REG_V31,
/* aliases */
TCG_AREG0 = TCG_REG_S0,
@@ -156,6 +142,32 @@ typedef enum {
#define TCG_TARGET_HAS_tst 0
+/* vector instructions */
+#define TCG_TARGET_HAS_v64 0
+#define TCG_TARGET_HAS_v128 0
+#define TCG_TARGET_HAS_v256 0
+#define TCG_TARGET_HAS_andc_vec 0
+#define TCG_TARGET_HAS_orc_vec 0
+#define TCG_TARGET_HAS_nand_vec 0
+#define TCG_TARGET_HAS_nor_vec 0
+#define TCG_TARGET_HAS_eqv_vec 0
+#define TCG_TARGET_HAS_not_vec 0
+#define TCG_TARGET_HAS_neg_vec 0
+#define TCG_TARGET_HAS_abs_vec 0
+#define TCG_TARGET_HAS_roti_vec 0
+#define TCG_TARGET_HAS_rots_vec 0
+#define TCG_TARGET_HAS_rotv_vec 0
+#define TCG_TARGET_HAS_shi_vec 0
+#define TCG_TARGET_HAS_shs_vec 0
+#define TCG_TARGET_HAS_shv_vec 0
+#define TCG_TARGET_HAS_mul_vec 0
+#define TCG_TARGET_HAS_sat_vec 0
+#define TCG_TARGET_HAS_minmax_vec 0
+#define TCG_TARGET_HAS_bitsel_vec 0
+#define TCG_TARGET_HAS_cmpsel_vec 0
+
+#define TCG_TARGET_HAS_tst_vec 0
+
#define TCG_TARGET_DEFAULT_MO (0)
#define TCG_TARGET_NEED_LDST_LABELS
diff --git a/tcg/riscv/tcg-target.opc.h b/tcg/riscv/tcg-target.opc.h
new file mode 100644
index 0000000000..b80b39e1e5
--- /dev/null
+++ b/tcg/riscv/tcg-target.opc.h
@@ -0,0 +1,12 @@
+/*
+ * Copyright (c) C-SKY Microsystems Co., Ltd.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version.
+ *
+ * See the COPYING file in the top-level directory for details.
+ *
+ * Target-specific opcodes for host vector expansion. These will be
+ * emitted by tcg_expand_vec_op. For those familiar with GCC internals,
+ * consider these to be UNSPEC with names.
+ */
--
2.43.0
* Re: [PATCH v3 03/14] tcg/riscv: Add basic support for vector
2024-09-04 14:27 ` [PATCH v3 03/14] tcg/riscv: Add basic support for vector LIU Zhiwei
@ 2024-09-05 4:05 ` Richard Henderson
2024-09-10 2:49 ` LIU Zhiwei
0 siblings, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2024-09-05 4:05 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, Swung0x48, TANG Tiancheng
On 9/4/24 07:27, LIU Zhiwei wrote:
> From: Swung0x48 <swung0x48@outlook.com>
>
> The RISC-V vector instruction set utilizes the LMUL field to group
> multiple registers, enabling variable-length vector registers. This
> implementation uses only the first register number of each group while
> reserving the other register numbers within the group.
>
> In TCG, each VEC_IR can have 3 types (TCG_TYPE_V64/128/256), and the
> host runtime needs to adjust LMUL based on the type to use different
> register groups.
>
> This presents challenges for TCG's register allocation. Currently, we
> avoid modifying the register allocation part of TCG and only expose the
> minimum number of vector registers.
>
> For example, when the host vlen is 64 bits and type is TCG_TYPE_V256, with
> LMUL equal to 4, we use 4 vector registers as one register group. We can
> use a maximum of 8 register groups, but the V0 register number is reserved
> as a mask register, so we can effectively use at most 7 register groups.
> Moreover, even when the type is smaller than TCG_TYPE_V256, we are
> still limited to those 7 registers, because TCG cannot yet constrain
> register allocation by type. Likewise, when the host vlen is 128 bits
> and the type is TCG_TYPE_V256, we can use at most 15 register groups.
>
> There is not much pressure on vector register allocation in TCG now, so
> using 7 registers is feasible and will not have a major impact on code
> generation.
>
> This patch:
> 1. Reserves vector register 0 for use as a mask register.
> 2. When using register groups, reserves the additional registers within
> each group.
>
> Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
> Co-authored-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
> Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
This patch does not compile.
../src/tcg/tcg.c:135:13: error: 'tcg_out_dup_vec' used but never defined [-Werror]
135 | static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
| ^~~~~~~~~~~~~~~
../src/tcg/tcg.c:137:13: error: 'tcg_out_dupm_vec' used but never defined [-Werror]
137 | static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
| ^~~~~~~~~~~~~~~~
../src/tcg/tcg.c:139:13: error: 'tcg_out_dupi_vec' used but never defined [-Werror]
139 | static void tcg_out_dupi_vec(TCGContext *s, TCGType type, unsigned vece,
| ^~~~~~~~~~~~~~~~
In file included from ../src/tcg/tcg.c:755:
/home/rth/qemu/src/tcg/riscv/tcg-target.c.inc:516:13: error: 'tcg_out_opc_ldst_vec'
defined but not used [-Werror=unused-function]
516 | static void tcg_out_opc_ldst_vec(TCGContext *s, RISCVInsn opc, TCGReg data,
| ^~~~~~~~~~~~~~~~~~~~
/home/rth/qemu/src/tcg/riscv/tcg-target.c.inc:507:13: error: 'tcg_out_opc_vi' defined but
not used [-Werror=unused-function]
507 | static void tcg_out_opc_vi(TCGContext *s, RISCVInsn opc, TCGReg vd,
| ^~~~~~~~~~~~~~
/home/rth/qemu/src/tcg/riscv/tcg-target.c.inc:501:13: error: 'tcg_out_opc_vx' defined but
not used [-Werror=unused-function]
501 | static void tcg_out_opc_vx(TCGContext *s, RISCVInsn opc, TCGReg vd,
| ^~~~~~~~~~~~~~
/home/rth/qemu/src/tcg/riscv/tcg-target.c.inc:495:13: error: 'tcg_out_opc_vv' defined but
not used [-Werror=unused-function]
495 | static void tcg_out_opc_vv(TCGContext *s, RISCVInsn opc, TCGReg vd,
| ^~~~~~~~~~~~~~
cc1: all warnings being treated as errors
Either:
(1) Provide stubs for the functions that are required, and delay implementation
of the unused functions until the patch(es) that use them.
(2) Merge the dup patch so that these functions are defined and implemented,
which will also provide uses for most of the tcg_out_opc_* functions.
> @@ -2100,6 +2174,32 @@ static void tcg_target_init(TCGContext *s)
> {
> tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffff;
> tcg_target_available_regs[TCG_TYPE_I64] = 0xffffffff;
> + s->reserved_regs = 0;
> +
> + if (cpuinfo & CPUINFO_ZVE64X) {
> + switch (riscv_vlen) {
> + case 64:
> + tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
> + tcg_target_available_regs[TCG_TYPE_V128] = ALL_DVECTOR_REG_GROUPS;
> + tcg_target_available_regs[TCG_TYPE_V256] = ALL_QVECTOR_REG_GROUPS;
> + s->reserved_regs |= (~ALL_QVECTOR_REG_GROUPS & 0xffffffff00000000);
No need for ().
Use ALL_VECTOR_REGS instead of the immediate integer.
> + break;
> + case 128:
> + tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
> + tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
> + tcg_target_available_regs[TCG_TYPE_V256] = ALL_DVECTOR_REG_GROUPS;
> + s->reserved_regs |= (~ALL_DVECTOR_REG_GROUPS & 0xffffffff00000000);
> + break;
> + case 256:
> + tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
> + tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
> + tcg_target_available_regs[TCG_TYPE_V256] = ALL_VECTOR_REGS;
> + break;
> + default:
> + g_assert_not_reached();
The first host with 512-bit or larger vectors will trigger the assert.
With my suggestion against patch 2, this becomes
switch (riscv_lg2_vlenb) {
case TCG_TYPE_V64:
...
case TCG_TYPE_V128:
...
default:
/* Guaranteed by Zve64x. */
tcg_debug_assert(riscv_lg2_vlenb >= TCG_TYPE_V256);
}
r~
* Re: [PATCH v3 03/14] tcg/riscv: Add basic support for vector
2024-09-05 4:05 ` Richard Henderson
@ 2024-09-10 2:49 ` LIU Zhiwei
0 siblings, 0 replies; 34+ messages in thread
From: LIU Zhiwei @ 2024-09-10 2:49 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, Swung0x48, TANG Tiancheng
On 2024/9/5 12:05, Richard Henderson wrote:
> On 9/4/24 07:27, LIU Zhiwei wrote:
>> From: Swung0x48 <swung0x48@outlook.com>
>>
>> The RISC-V vector instruction set utilizes the LMUL field to group
>> multiple registers, enabling variable-length vector registers. This
>> implementation uses only the first register number of each group while
>> reserving the other register numbers within the group.
>>
>> In TCG, each VEC_IR can have 3 types (TCG_TYPE_V64/128/256), and the
>> host runtime needs to adjust LMUL based on the type to use different
>> register groups.
>>
>> This presents challenges for TCG's register allocation. Currently, we
>> avoid modifying the register allocation part of TCG and only expose the
>> minimum number of vector registers.
>>
>> For example, when the host vlen is 64 bits and type is TCG_TYPE_V256,
>> with
>> LMUL equal to 4, we use 4 vector registers as one register group. We can
>> use a maximum of 8 register groups, but the V0 register number is
>> reserved
>> as a mask register, so we can effectively use at most 7 register groups.
>> Moreover, even when the type is smaller than TCG_TYPE_V256, we are
>> still limited to those 7 registers, because TCG cannot yet constrain
>> register allocation by type. Likewise, when the host vlen is 128 bits
>> and the type is TCG_TYPE_V256, we can use at most 15 register groups.
>>
>> There is not much pressure on vector register allocation in TCG now, so
>> using 7 registers is feasible and will not have a major impact on code
>> generation.
>>
>> This patch:
>> 1. Reserves vector register 0 for use as a mask register.
>> 2. When using register groups, reserves the additional registers within
>> each group.
>>
>> Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
>> Co-authored-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
>> Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
>
> This patch does not compile.
>
> ../src/tcg/tcg.c:135:13: error: 'tcg_out_dup_vec' used but never
> defined [-Werror]
> 135 | static bool tcg_out_dup_vec(TCGContext *s, TCGType type,
> unsigned vece,
> | ^~~~~~~~~~~~~~~
> ../src/tcg/tcg.c:137:13: error: 'tcg_out_dupm_vec' used but never
> defined [-Werror]
> 137 | static bool tcg_out_dupm_vec(TCGContext *s, TCGType type,
> unsigned vece,
> | ^~~~~~~~~~~~~~~~
> ../src/tcg/tcg.c:139:13: error: 'tcg_out_dupi_vec' used but never
> defined [-Werror]
> 139 | static void tcg_out_dupi_vec(TCGContext *s, TCGType type,
> unsigned vece,
> | ^~~~~~~~~~~~~~~~
> In file included from ../src/tcg/tcg.c:755:
> /home/rth/qemu/src/tcg/riscv/tcg-target.c.inc:516:13: error:
> 'tcg_out_opc_ldst_vec' defined but not used [-Werror=unused-function]
> 516 | static void tcg_out_opc_ldst_vec(TCGContext *s, RISCVInsn opc,
> TCGReg data,
> | ^~~~~~~~~~~~~~~~~~~~
> /home/rth/qemu/src/tcg/riscv/tcg-target.c.inc:507:13: error:
> 'tcg_out_opc_vi' defined but not used [-Werror=unused-function]
> 507 | static void tcg_out_opc_vi(TCGContext *s, RISCVInsn opc,
> TCGReg vd,
> | ^~~~~~~~~~~~~~
> /home/rth/qemu/src/tcg/riscv/tcg-target.c.inc:501:13: error:
> 'tcg_out_opc_vx' defined but not used [-Werror=unused-function]
> 501 | static void tcg_out_opc_vx(TCGContext *s, RISCVInsn opc,
> TCGReg vd,
> | ^~~~~~~~~~~~~~
> /home/rth/qemu/src/tcg/riscv/tcg-target.c.inc:495:13: error:
> 'tcg_out_opc_vv' defined but not used [-Werror=unused-function]
> 495 | static void tcg_out_opc_vv(TCGContext *s, RISCVInsn opc,
> TCGReg vd,
> | ^~~~~~~~~~~~~~
> cc1: all warnings being treated as errors
Oops. We missed compile-testing each patch one by one.
>
> Either:
> (1) Provide stubs for the functions that are required, and delay
> implementation
> of the unused functions until the patch(es) that use them.
We will take this approach.
> (2) Merge the dup patch so that these functions are defined and
> implemented,
> which will also provide uses for most of the tcg_out_opc_* functions.
>
>
>> @@ -2100,6 +2174,32 @@ static void tcg_target_init(TCGContext *s)
>> {
>> tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffff;
>> tcg_target_available_regs[TCG_TYPE_I64] = 0xffffffff;
>> + s->reserved_regs = 0;
>> +
>> + if (cpuinfo & CPUINFO_ZVE64X) {
>> + switch (riscv_vlen) {
>> + case 64:
>> + tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
>> + tcg_target_available_regs[TCG_TYPE_V128] =
>> ALL_DVECTOR_REG_GROUPS;
>> + tcg_target_available_regs[TCG_TYPE_V256] =
>> ALL_QVECTOR_REG_GROUPS;
>> + s->reserved_regs |= (~ALL_QVECTOR_REG_GROUPS &
>> 0xffffffff00000000);
>
> No need for ().
> Use ALL_VECTOR_REGS instead of the immediate integer.
OK.
>
>> + break;
>> + case 128:
>> + tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
>> + tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
>> + tcg_target_available_regs[TCG_TYPE_V256] =
>> ALL_DVECTOR_REG_GROUPS;
>> + s->reserved_regs |= (~ALL_DVECTOR_REG_GROUPS &
>> 0xffffffff00000000);
>> + break;
>> + case 256:
>> + tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
>> + tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
>> + tcg_target_available_regs[TCG_TYPE_V256] = ALL_VECTOR_REGS;
>> + break;
>> + default:
>> + g_assert_not_reached();
>
> The first host with 512-bit or larger vectors will trigger the assert.
>
> With my suggestion against patch 2, this becomes
>
> switch (riscv_lg2_vlenb) {
> case TCG_TYPE_V64:
> ...
> case TCG_TYPE_V128:
> ...
> default:
> /* Guaranteed by Zve64x. */
> tcg_debug_assert(riscv_lg2_vlenb >= TCG_TYPE_V256);
> }
>
Agree.
Thanks,
Zhiwei
>
> r~
* [PATCH v3 04/14] tcg/riscv: Add riscv vset{i}vli support
2024-09-04 14:27 [PATCH v3 00/14] Add support for vector LIU Zhiwei
` (2 preceding siblings ...)
2024-09-04 14:27 ` [PATCH v3 03/14] tcg/riscv: Add basic support for vector LIU Zhiwei
@ 2024-09-04 14:27 ` LIU Zhiwei
2024-09-05 6:03 ` Richard Henderson
2024-09-04 14:27 ` [PATCH v3 05/14] tcg/riscv: Implement vector load/store LIU Zhiwei
` (10 subsequent siblings)
14 siblings, 1 reply; 34+ messages in thread
From: LIU Zhiwei @ 2024-09-04 14:27 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
In RISC-V, vector operations require initial configuration using
the vset{i}vl{i} instruction.
This instruction:
1. Sets the vector length (vl) in elements
2. Configures the vtype register, which includes:
SEW (Selected Element Width)
LMUL (vector register group multiplier)
Other vector operation parameters
This configuration is crucial for defining subsequent vector
operation behavior. To optimize performance, the configuration
process is managed dynamically:
1. Reconfiguration using vset{i}vl{i} is necessary when SEW
or vector register group width changes.
2. The vset instruction can be omitted when configuration
remains unchanged.
This optimization is only effective within a single TB.
Each TB requires reconfiguration at its start, as the current
state cannot be obtained from hardware.
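As a concrete example (illustrative only; standard RVV 1.0 assembler
syntax, with the tu/mu policy matching encode_vtype() below), a 128-bit
operation on 32-bit elements on a VLEN=256 host could be configured as:

    vsetivli zero, 4, e32, mf2, tu, mu  # vl = 4 elements, SEW=32, LMUL=1/2
    vadd.vv v2, v4, v6                  # uses the configuration above

A following vector op with the same SEW and operand width then needs
no further vset* within the TB.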
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: Weiwei Li <liwei1518@gmail.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
include/tcg/tcg.h | 3 +
tcg/riscv/tcg-target.c.inc | 128 +++++++++++++++++++++++++++++++++++++
2 files changed, 131 insertions(+)
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 21d5884741..267e6ff95c 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -566,6 +566,9 @@ struct TCGContext {
/* Exit to translator on overflow. */
sigjmp_buf jmp_trans;
+
+ /* For host-specific values. */
+ int riscv_host_vtype;
};
static inline bool temp_readonly(TCGTemp *ts)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index c3f018ff0c..df96d350a3 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -165,6 +165,26 @@ static bool tcg_target_const_match(int64_t val, int ct,
* RISC-V Base ISA opcodes (IM)
*/
+#define V_OPIVV (0x0 << 12)
+#define V_OPFVV (0x1 << 12)
+#define V_OPMVV (0x2 << 12)
+#define V_OPIVI (0x3 << 12)
+#define V_OPIVX (0x4 << 12)
+#define V_OPFVF (0x5 << 12)
+#define V_OPMVX (0x6 << 12)
+#define V_OPCFG (0x7 << 12)
+
+typedef enum {
+ VLMUL_M1 = 0, /* LMUL=1 */
+ VLMUL_M2, /* LMUL=2 */
+ VLMUL_M4, /* LMUL=4 */
+ VLMUL_M8, /* LMUL=8 */
+ VLMUL_RESERVED,
+ VLMUL_MF8, /* LMUL=1/8 */
+ VLMUL_MF4, /* LMUL=1/4 */
+ VLMUL_MF2, /* LMUL=1/2 */
+} RISCVVlmul;
+
typedef enum {
OPC_ADD = 0x33,
OPC_ADDI = 0x13,
@@ -260,6 +280,11 @@ typedef enum {
/* Zicond: integer conditional operations */
OPC_CZERO_EQZ = 0x0e005033,
OPC_CZERO_NEZ = 0x0e007033,
+
+ /* V: Vector extension 1.0 */
+ OPC_VSETVLI = 0x57 | V_OPCFG,
+ OPC_VSETIVLI = 0xc0000057 | V_OPCFG,
+ OPC_VSETVL = 0x80000057 | V_OPCFG,
} RISCVInsn;
/*
@@ -370,6 +395,26 @@ static int32_t encode_v(RISCVInsn opc, TCGReg d, TCGReg s1,
(s2 & 0x1f) << 20 | (vm << 25);
}
+/* Vector Configuration */
+
+static uint32_t encode_vtype(bool vta, bool vma,
+ MemOp vsew, RISCVVlmul vlmul)
+{
+ return vma << 7 | vta << 6 | vsew << 3 | vlmul;
+}
+
+static int32_t encode_vcfg(RISCVInsn opc, TCGReg rd,
+ TCGArg rs1, uint32_t vtype)
+{
+ return opc | (rd & 0x1f) << 7 | (rs1 & 0x1f) << 15 | (vtype & 0x7ff) << 20;
+}
+
+static int32_t encode_vcfgi(RISCVInsn opc, TCGReg rd,
+ uint32_t uimm, uint32_t vtype)
+{
+ return opc | (rd & 0x1f) << 7 | (uimm & 0x1f) << 15 | (vtype & 0x3ff) << 20;
+}
+
/*
* RISC-V instruction emitters
*/
@@ -519,6 +564,88 @@ static void tcg_out_opc_ldst_vec(TCGContext *s, RISCVInsn opc, TCGReg data,
tcg_out32(s, encode_v(opc, data, rs1, 0, vm));
}
+static void tcg_out_opc_vec_config(TCGContext *s, RISCVInsn opc,
+ TCGReg rd, TCGReg rs1, int32_t vtype)
+{
+ tcg_out32(s, encode_vcfg(opc, rd, rs1, vtype));
+}
+
+static void tcg_out_opc_vec_configi(TCGContext *s, RISCVInsn opc,
+ TCGReg rd, uint32_t avl, int32_t vtype)
+{
+ tcg_out32(s, encode_vcfgi(opc, rd, avl, vtype));
+}
+
+static void tcg_out_vset(TCGContext *s, uint32_t avl, int vtype)
+{
+ if (avl < 32) {
+ vtype = sextreg(vtype, 0, 10);
+ tcg_out_opc_vec_configi(s, OPC_VSETIVLI, TCG_REG_ZERO, avl, vtype);
+ } else {
+ vtype = sextreg(vtype, 0, 11);
+ tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, TCG_REG_ZERO, avl);
+ tcg_out_opc_vec_config(s, OPC_VSETVLI, TCG_REG_ZERO,
+ TCG_REG_TMP0, vtype);
+ }
+}
+
+/* LMUL_MAX = 8, vlmax = vlen / sew * LMUL_MAX. */
+static unsigned get_vlmax(MemOp vsew)
+{
+ return riscv_vlen / (8 << vsew) * 8;
+}
+
+static unsigned get_vec_type_bytes(TCGType type)
+{
+ tcg_debug_assert(type >= TCG_TYPE_V64);
+ return 8 << (type - TCG_TYPE_V64);
+}
+
+static RISCVVlmul calc_vlmul(MemOp vsew, unsigned oprsz)
+{
+ if (oprsz > riscv_vlenb) {
+ return ctzl(oprsz / riscv_vlenb);
+ } else {
+ if (vsew < MO_64) {
+ switch (riscv_vlenb / oprsz) {
+ case 2:
+ return VLMUL_MF2;
+ case 4:
+ return VLMUL_MF4;
+ case 8:
+ return VLMUL_MF8;
+ default:
+ break;
+ }
+ }
+ }
+ return VLMUL_M1;
+}
+
+static void tcg_target_set_vec_config(TCGContext *s, TCGType type,
+ MemOp vsew)
+{
+ unsigned oprsz, avl;
+ int vtype;
+ RISCVVlmul vlmul;
+
+ tcg_debug_assert(vsew <= MO_64);
+
+ oprsz = get_vec_type_bytes(type);
+ avl = oprsz / (1 << vsew);
+ vlmul = calc_vlmul(vsew, oprsz);
+
+ tcg_debug_assert(avl <= get_vlmax(vsew));
+ tcg_debug_assert(vlmul <= VLMUL_MF2);
+
+ vtype = encode_vtype(false, false, vsew, vlmul);
+
+ if (vtype != s->riscv_host_vtype) {
+ s->riscv_host_vtype = vtype;
+ tcg_out_vset(s, avl, vtype);
+ }
+}
+
/*
* TCG intrinsics
*/
@@ -2167,6 +2294,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
static void tcg_out_tb_start(TCGContext *s)
{
+ s->riscv_host_vtype = -1;
/* nothing to do */
}
--
2.43.0
* Re: [PATCH v3 04/14] tcg/riscv: Add riscv vset{i}vli support
2024-09-04 14:27 ` [PATCH v3 04/14] tcg/riscv: Add riscv vset{i}vli support LIU Zhiwei
@ 2024-09-05 6:03 ` Richard Henderson
2024-09-10 2:46 ` LIU Zhiwei
0 siblings, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2024-09-05 6:03 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 9/4/24 07:27, LIU Zhiwei wrote:
> From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
>
> In RISC-V, vector operations require initial configuration using
> the vset{i}vl{i} instruction.
>
> This instruction:
> 1. Sets the vector length (vl) in elements
> 2. Configures the vtype register, which includes:
> SEW (Selected Element Width)
> LMUL (vector register group multiplier)
> Other vector operation parameters
>
> This configuration is crucial for defining subsequent vector
> operation behavior. To optimize performance, the configuration
> process is managed dynamically:
> 1. Reconfiguration using vset{i}vl{i} is necessary when SEW
> or vector register group width changes.
> 2. The vset instruction can be omitted when configuration
> remains unchanged.
>
> This optimization is only effective within a single TB.
> Each TB requires reconfiguration at its start, as the current
> state cannot be obtained from hardware.
>
> Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
> Signed-off-by: Weiwei Li <liwei1518@gmail.com>
> Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
> ---
> include/tcg/tcg.h | 3 +
> tcg/riscv/tcg-target.c.inc | 128 +++++++++++++++++++++++++++++++++++++
> 2 files changed, 131 insertions(+)
>
> diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
> index 21d5884741..267e6ff95c 100644
> --- a/include/tcg/tcg.h
> +++ b/include/tcg/tcg.h
> @@ -566,6 +566,9 @@ struct TCGContext {
>
> /* Exit to translator on overflow. */
> sigjmp_buf jmp_trans;
> +
> + /* For host-specific values. */
> + int riscv_host_vtype;
> };
(1) At minimum this needs #ifdef __riscv.
I planned to think of a cleaner way to do this,
but haven't gotten there yet.
I had also planned to place it higher in the structure, before
the large temp arrays, so that the structure offset would be smaller.
(2) I have determined through experimentation that vtype alone is insufficient.
While vtype + avl would be sufficient, it is inefficient.
Best to store the original inputs: TCGType and SEW, since that way
there's no effort required when querying the current SEW for use in
load/store/logicals.
The bug here appears as TCG swaps between TCGTypes for different
operations. E.g. if the vtype computed for (V64, E8) is the same
as the vtype computed for (V128, E8), with AVL differing, then we
will incorrectly omit the vsetvl instruction.
My test case was tcg/tests/aarch64-linux-user/sha1-vector
The naming of these functions is varied and inconsistent.
I suggest the following:
static void set_vtype(TCGContext *s, TCGType type, MemOp vsew)
{
unsigned vtype, insn, avl;
int lmul;
RISCVVlmul vlmul;
bool lmul_eq_avl;
s->riscv_cur_type = type;
s->riscv_cur_vsew = vsew;
/* Match riscv_lg2_vlenb to TCG_TYPE_V64. */
QEMU_BUILD_BUG_ON(TCG_TYPE_V64 != 3);
lmul = type - riscv_lg2_vlenb;
if (lmul < -3) {
/* Host VLEN >= 1024 bits. */
vlmul = VLMUL_M1;
lmul_eq_avl = false;
} else if (lmul < 3) {
/* 1/8 ... 1 ... 8 */
vlmul = lmul & 7;
lmul_eq_avl = true;
} else {
/* Guaranteed by Zve64x. */
g_assert_not_reached();
}
avl = tcg_type_size(type) >> vsew;
vtype = encode_vtype(true, true, vsew, vlmul);
if (avl < 32) {
insn = encode_i(OPC_VSETIVLI, TCG_REG_ZERO, avl, vtype);
} else if (lmul_eq_avl) {
/* rd != 0 and rs1 == 0 uses vlmax */
insn = encode_i(OPC_VSETVLI, TCG_REG_TMP0, TCG_REG_ZERO, vtype);
} else {
tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, TCG_REG_ZERO, avl);
insn = encode_i(OPC_VSETVLI, TCG_REG_ZERO, TCG_REG_TMP0, vtype);
}
tcg_out32(s, insn);
}
static MemOp set_vtype_len(TCGContext *s, TCGType type)
{
if (type != s->riscv_cur_type) {
set_type(s, type, MO_64);
}
return s->riscv_cur_vsew;
}
static void set_vtype_len_sew(TCGContext *s, TCGType type, MemOp vsew)
{
if (type != s->riscv_cur_type || vsew != s->riscv_cur_vsew) {
set_type(s, type, vsew);
}
}
(1) The storing of lg2(vlenb) means we can convert all of the division into subtraction.
(2) get_vec_type_bytes() already exists as tcg_type_size().
(3) Make use of the signed 3-bit encoding of vlmul.
(4) Make use of rd != 0, rs1 = 0 for the relatively common case of AVL = 32.
r~
* Re: [PATCH v3 04/14] tcg/riscv: Add riscv vset{i}vli support
2024-09-05 6:03 ` Richard Henderson
@ 2024-09-10 2:46 ` LIU Zhiwei
2024-09-10 4:34 ` Richard Henderson
0 siblings, 1 reply; 34+ messages in thread
From: LIU Zhiwei @ 2024-09-10 2:46 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 2024/9/5 14:03, Richard Henderson wrote:
> On 9/4/24 07:27, LIU Zhiwei wrote:
>> From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
>>
>> In RISC-V, vector operations require initial configuration using
>> the vset{i}vl{i} instruction.
>>
>> This instruction:
>> 1. Sets the vector length (vl) in elements
>> 2. Configures the vtype register, which includes:
>> SEW (Selected Element Width)
>> LMUL (vector register group multiplier)
>> Other vector operation parameters
>>
>> This configuration is crucial for defining subsequent vector
>> operation behavior. To optimize performance, the configuration
>> process is managed dynamically:
>> 1. Reconfiguration using vset{i}vl{i} is necessary when SEW
>> or vector register group width changes.
>> 2. The vset instruction can be omitted when configuration
>> remains unchanged.
>>
>> This optimization is only effective within a single TB.
>> Each TB requires reconfiguration at its start, as the current
>> state cannot be obtained from hardware.
>>
>> Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
>> Signed-off-by: Weiwei Li <liwei1518@gmail.com>
>> Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
>> ---
>> include/tcg/tcg.h | 3 +
>> tcg/riscv/tcg-target.c.inc | 128 +++++++++++++++++++++++++++++++++++++
>> 2 files changed, 131 insertions(+)
>>
>> diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
>> index 21d5884741..267e6ff95c 100644
>> --- a/include/tcg/tcg.h
>> +++ b/include/tcg/tcg.h
>> @@ -566,6 +566,9 @@ struct TCGContext {
>> /* Exit to translator on overflow. */
>> sigjmp_buf jmp_trans;
>> +
>> + /* For host-specific values. */
>> + int riscv_host_vtype;
>> };
>
> (1) At minimum this needs #ifdef __riscv.
> I planned to think of a cleaner way to do this,
> but haven't gotten there yet.
> I had also planned to place it higher in the structure, before
> the large temp arrays, so that the structure offset would be smaller.
>
> (2) I have determined through experimentation that vtype alone is
> insufficient.
> While vtype + avl would be sufficient, it is inefficient.
> Best to store the original inputs: TCGType and SEW, since that way
> there's no effort required when querying the current SEW for use in
> load/store/logicals.
>
> The bug here appears as TCG swaps between TCGTypes for different
> operations. E.g. if the vtype computed for (V64, E8) is the same
> as the vtype computed for (V128, E8), with AVL differing, then we
> will incorrectly omit the vsetvl instruction.
>
> My test case was tcg/tests/aarch64-linux-user/sha1-vector
>
Agree.
>
> The naming of these functions is varied and inconsistent.
> I suggest the following:
>
>
> static void set_vtype(TCGContext *s, TCGType type, MemOp vsew)
> {
> unsigned vtype, insn, avl;
> int lmul;
> RISCVVlmul vlmul;
> bool lmul_eq_avl;
>
> s->riscv_cur_type = type;
> s->riscv_cur_vsew = vsew;
>
> /* Match riscv_lg2_vlenb to TCG_TYPE_V64. */
> QEMU_BUILD_BUG_ON(TCG_TYPE_V64 != 3);
>
> lmul = type - riscv_lg2_vlenb;
> if (lmul < -3) {
> /* Host VLEN >= 1024 bits. */
> vlmul = VLMUL_M1;
I am not sure if we should use VLMUL_MF8 here.
> lmul_eq_avl = false;
> } else if (lmul < 3) {
> /* 1/8 ... 1 ... 8 */
> vlmul = lmul & 7;
> lmul_eq_avl = true;
> } else {
> /* Guaranteed by Zve64x. */
> g_assert_not_reached();
> }
>
> avl = tcg_type_size(type) >> vsew;
> vtype = encode_vtype(true, true, vsew, vlmul);
>
> if (avl < 32) {
> insn = encode_i(OPC_VSETIVLI, TCG_REG_ZERO, avl, vtype);
What is the benefit here? We usually use the smallest lmul we can
for macro-op splitting.
> } else if (lmul_eq_avl) {
> /* rd != 0 and rs1 == 0 uses vlmax */
> insn = encode_i(OPC_VSETVLI, TCG_REG_TMP0, TCG_REG_ZERO, vtype);
> } else {
> tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, TCG_REG_ZERO, avl);
> insn = encode_i(OPC_VSETVLI, TCG_REG_ZERO, TCG_REG_TMP0, vtype);
And perhaps here.
> }
> tcg_out32(s, insn);
> }
>
> static MemOp set_vtype_len(TCGContext *s, TCGType type)
> {
> if (type != s->riscv_cur_type) {
> set_type(s, type, MO_64);
I think you mean set_vtype here.
> }
> return s->riscv_cur_vsew;
> }
>
> static void set_vtype_len_sew(TCGContext *s, TCGType type, MemOp vsew)
> {
> if (type != s->riscv_cur_type || vsew != s->riscv_cur_vsew) {
> set_type(s, type, vsew);
and set_vtype here.
Thanks,
Zhiwei
> }
> }
>
>
> (1) The storing of lg2(vlenb) means we can convert all of the division
> into subtraction.
> (2) get_vec_type_bytes() already exists as tcg_type_size().
> (3) Make use of the signed 3-bit encoding of vlmul.
> (4) Make use of rd != 0, rs1 = 0 for the relatively common case of AVL
> = 32.
>
>
> r~
* Re: [PATCH v3 04/14] tcg/riscv: Add riscv vset{i}vli support
2024-09-10 2:46 ` LIU Zhiwei
@ 2024-09-10 4:34 ` Richard Henderson
2024-09-10 7:03 ` LIU Zhiwei
0 siblings, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2024-09-10 4:34 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 9/9/24 19:46, LIU Zhiwei wrote:
>> lmul = type - riscv_lg2_vlenb;
>> if (lmul < -3) {
>> /* Host VLEN >= 1024 bits. */
>> vlmul = VLMUL_M1;
> I am not sure if we should use VLMUL_MF8 here.
Perhaps. See below.
>> } else if (lmul < 3) {
>> /* 1/8 ... 1 ... 8 */
>> vlmul = lmul & 7;
>> lmul_eq_avl = true;
>> } else {
>> /* Guaranteed by Zve64x. */
>> g_assert_not_reached();
>> }
>>
>> avl = tcg_type_size(type) >> vsew;
>> vtype = encode_vtype(true, true, vsew, vlmul);
>>
>> if (avl < 32) {
>> insn = encode_i(OPC_VSETIVLI, TCG_REG_ZERO, avl, vtype);
> What is the benefit here? We usually use the smallest lmul we can for macro-op splitting.
lmul is unchanged, just explicitly setting AVL as well.
The "benefit" is that AVL is visible in the disassembly,
and that we are able to discard the result.
There doesn't appear to be a down side. Is there one?
>> } else if (lmul_eq_avl) {
>> /* rd != 0 and rs1 == 0 uses vlmax */
>> insn = encode_i(OPC_VSETVLI, TCG_REG_TMP0, TCG_REG_ZERO, vtype);
As opposed to here, where we must clobber a register.
It is a scratch reg, sure, and probably affects nothing
in any microarch which does register renaming.
>> } else {
>> tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, TCG_REG_ZERO, avl);
>> insn = encode_i(OPC_VSETVLI, TCG_REG_ZERO, TCG_REG_TMP0, vtype);
> And perhaps here.
Here, lmul does *not* equal avl, and so we must set it, and because of non-use of VSETIVLI
we also know that it does not fit in uimm5.
But here's a follow-up question regarding current micro-architectures:
How much benefit is there from adjusting LMUL < 1, or AVL < VLMAX?
For instance, on other hosts with 128-bit vectors, we also promise support for 64-bit
registers, just so we can support guests which have 64-bit vector operations. In existing
hosts (x86, ppc, s390x, loongarch) we accept that the host instruction will operate on all
128-bits; we simply ignore half of any result.
Thus the question becomes: can we minimize the number of vset* instructions by bounding
minimal lmul to 1 (or whatever) and always leaving avl as the full register? If so, the
only vset* changes are for SEW changes, or for load/store that are smaller than V*1REG64.
r~
* Re: [PATCH v3 04/14] tcg/riscv: Add riscv vset{i}vli support
2024-09-10 4:34 ` Richard Henderson
@ 2024-09-10 7:03 ` LIU Zhiwei
0 siblings, 0 replies; 34+ messages in thread
From: LIU Zhiwei @ 2024-09-10 7:03 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 2024/9/10 12:34, Richard Henderson wrote:
> On 9/9/24 19:46, LIU Zhiwei wrote:
>>> lmul = type - riscv_lg2_vlenb;
>>> if (lmul < -3) {
>>> /* Host VLEN >= 1024 bits. */
>>> vlmul = VLMUL_M1;
>> I am not sure if we should use VLMUL_MF8 here.
>
> Perhaps. See below.
>
>>> } else if (lmul < 3) {
>>> /* 1/8 ... 1 ... 8 */
>>> vlmul = lmul & 7;
>>> lmul_eq_avl = true;
>>> } else {
>>> /* Guaranteed by Zve64x. */
>>> g_assert_not_reached();
>>> }
>>>
>>> avl = tcg_type_size(type) >> vsew;
>>> vtype = encode_vtype(true, true, vsew, vlmul);
>>>
>>> if (avl < 32) {
>>> insn = encode_i(OPC_VSETIVLI, TCG_REG_ZERO, avl, vtype);
>> What benefit do we get here? We usually use the smallest lmul we can
>> for macro-op splitting.
>
> lmul is unchanged, just explicitly setting AVL as well.
> The "benefit" is that AVL is visible in the disassembly,
> and that we are able to discard the result.
>
> There doesn't appear to be a downside. Is there one?
>
>>> } else if (lmul_eq_avl) {
>>> /* rd != 0 and rs1 == 0 uses vlmax */
>>> insn = encode_i(OPC_VSETVLI, TCG_REG_TMP0, TCG_REG_ZERO,
>>> vtype);
>
> As opposed to here, where we must clobber a register.
> It is a scratch reg, sure, and probably affects nothing
> in any microarch which does register renaming.
>
>>> } else {
>>> tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, TCG_REG_ZERO, avl);
>>> insn = encode_i(OPC_VSETVLI, TCG_REG_ZERO, TCG_REG_TMP0,
>>> vtype);
>> And perhaps here.
>
> Here, lmul does *not* equal avl, and so we must set it; and because
> VSETIVLI was not used, we also know that avl does not fit in uimm5.
>
> But here's a follow-up question regarding current micro-architectures:
>
> How much benefit is there from adjusting LMUL < 1, or AVL < VLMAX?
It may take fewer macro ops for LMUL < 1 than for LMUL = 1. For example,
on a host with 128-bit vectors:
1) LMUL = 1/2, only one macro op.
vsetivli x0, 8, e32, mf2
vadd.v.v x2, x4, x5
2) LMUL = 1, two macro ops.
vsetivli x0, 8, e32, m1
vadd.v.v x2, x4, x5
>
> For instance, on other hosts with 128-bit vectors, we also promise
> support for 64-bit registers, just so we can support guests which have
> 64-bit vector operations. In existing hosts (x86, ppc, s390x,
> loongarch) we accept that the host instruction will operate on all
> 128 bits; we simply ignore half of any result.
>
> Thus the question becomes: can we minimize the number of vset*
> instructions by bounding minimal lmul to 1 (or whatever) and always
> leaving avl as the full register?
I think the question we are talking about is: when TCG_TYPE_V* is smaller
than vlen, should we use fractional lmul?
1) Fractional lmul leads to fewer macro ops (depends on the microarchitecture).
2) LMUL = 1 leads to fewer vset* instructions.
I prefer 1), because the vset*vli we are using can probably be fused.
Thanks,
Zhiwei
> If so, the only vset* changes are for SEW changes, or for load/store
> that are smaller than V*1REG64.
>
>
> r~
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v3 05/14] tcg/riscv: Implement vector load/store
2024-09-04 14:27 [PATCH v3 00/14] Add support for vector LIU Zhiwei
` (3 preceding siblings ...)
2024-09-04 14:27 ` [PATCH v3 04/14] tcg/riscv: Add riscv vset{i}vli support LIU Zhiwei
@ 2024-09-04 14:27 ` LIU Zhiwei
2024-09-05 6:39 ` Richard Henderson
2024-09-04 14:27 ` [PATCH v3 06/14] tcg/riscv: Implement vector mov/dup{m/i} LIU Zhiwei
` (9 subsequent siblings)
14 siblings, 1 reply; 34+ messages in thread
From: LIU Zhiwei @ 2024-09-04 14:27 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
tcg/riscv/tcg-target-con-set.h | 2 +
tcg/riscv/tcg-target.c.inc | 202 +++++++++++++++++++++++++++++++--
2 files changed, 196 insertions(+), 8 deletions(-)
diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
index aac5ceee2b..d73a62b0f2 100644
--- a/tcg/riscv/tcg-target-con-set.h
+++ b/tcg/riscv/tcg-target-con-set.h
@@ -21,3 +21,5 @@ C_O1_I2(r, rZ, rZ)
C_N1_I2(r, r, rM)
C_O1_I4(r, r, rI, rM, rM)
C_O2_I4(r, r, rZ, rZ, rM, rM)
+C_O0_I2(v, r)
+C_O1_I1(v, r)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index df96d350a3..4b1079fc6f 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -174,6 +174,11 @@ static bool tcg_target_const_match(int64_t val, int ct,
#define V_OPMVX (0x6 << 12)
#define V_OPCFG (0x7 << 12)
+/* NF <= 7 && NF >= 0 */
+#define V_NF(x) (x << 29)
+#define V_UNIT_STRIDE (0x0 << 20)
+#define V_UNIT_STRIDE_WHOLE_REG (0x8 << 20)
+
typedef enum {
VLMUL_M1 = 0, /* LMUL=1 */
VLMUL_M2, /* LMUL=2 */
@@ -285,6 +290,25 @@ typedef enum {
OPC_VSETVLI = 0x57 | V_OPCFG,
OPC_VSETIVLI = 0xc0000057 | V_OPCFG,
OPC_VSETVL = 0x80000057 | V_OPCFG,
+
+ OPC_VLE8_V = 0x7 | V_UNIT_STRIDE,
+ OPC_VLE16_V = 0x5007 | V_UNIT_STRIDE,
+ OPC_VLE32_V = 0x6007 | V_UNIT_STRIDE,
+ OPC_VLE64_V = 0x7007 | V_UNIT_STRIDE,
+ OPC_VSE8_V = 0x27 | V_UNIT_STRIDE,
+ OPC_VSE16_V = 0x5027 | V_UNIT_STRIDE,
+ OPC_VSE32_V = 0x6027 | V_UNIT_STRIDE,
+ OPC_VSE64_V = 0x7027 | V_UNIT_STRIDE,
+
+ OPC_VL1RE64_V = 0x2007007 | V_UNIT_STRIDE_WHOLE_REG | V_NF(0),
+ OPC_VL2RE64_V = 0x2007007 | V_UNIT_STRIDE_WHOLE_REG | V_NF(1),
+ OPC_VL4RE64_V = 0x2007007 | V_UNIT_STRIDE_WHOLE_REG | V_NF(3),
+ OPC_VL8RE64_V = 0x2007007 | V_UNIT_STRIDE_WHOLE_REG | V_NF(7),
+
+ OPC_VS1R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(0),
+ OPC_VS2R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(1),
+ OPC_VS4R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(3),
+ OPC_VS8R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(7),
} RISCVInsn;
/*
@@ -646,6 +670,20 @@ static void tcg_target_set_vec_config(TCGContext *s, TCGType type,
}
}
+static int riscv_set_vec_config_vl(TCGContext *s, TCGType type)
+{
+ int prev_vsew = s->riscv_host_vtype < 0 ? MO_8 :
+ ((s->riscv_host_vtype >> 3) & 0x7);
+ tcg_target_set_vec_config(s, type, prev_vsew);
+ return prev_vsew;
+}
+
+static void riscv_set_vec_config_vl_vece(TCGContext *s, TCGType type,
+ unsigned vece)
+{
+ tcg_target_set_vec_config(s, type, vece);
+}
+
/*
* TCG intrinsics
*/
@@ -811,31 +849,52 @@ static void tcg_out_extrl_i64_i32(TCGContext *s, TCGReg ret, TCGReg arg)
tcg_out_ext32s(s, ret, arg);
}
-static void tcg_out_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
- TCGReg addr, intptr_t offset)
+static intptr_t split_offset_scalar(TCGContext *s, TCGReg *addr,
+ intptr_t offset)
{
intptr_t imm12 = sextreg(offset, 0, 12);
if (offset != imm12) {
intptr_t diff = tcg_pcrel_diff(s, (void *)offset);
- if (addr == TCG_REG_ZERO && diff == (int32_t)diff) {
+ if (*addr == TCG_REG_ZERO && diff == (int32_t)diff) {
imm12 = sextreg(diff, 0, 12);
tcg_out_opc_upper(s, OPC_AUIPC, TCG_REG_TMP2, diff - imm12);
} else {
tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP2, offset - imm12);
- if (addr != TCG_REG_ZERO) {
- tcg_out_opc_reg(s, OPC_ADD, TCG_REG_TMP2, TCG_REG_TMP2, addr);
+ if (*addr != TCG_REG_ZERO) {
+ tcg_out_opc_reg(s, OPC_ADD, TCG_REG_TMP2, TCG_REG_TMP2, *addr);
}
}
- addr = TCG_REG_TMP2;
+ *addr = TCG_REG_TMP2;
+ }
+ return imm12;
+}
+
+static void split_offset_vector(TCGContext *s, TCGReg *addr, intptr_t offset)
+{
+ if (offset != 0) {
+ if (offset == sextreg(offset, 0, 12)) {
+ tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, *addr, offset);
+ } else {
+ tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP0, offset);
+ tcg_out_opc_reg(s, OPC_ADD, TCG_REG_TMP0, TCG_REG_TMP0, *addr);
+ }
+ *addr = TCG_REG_TMP0;
}
+}
+
+static void tcg_out_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
+ TCGReg addr, intptr_t offset)
+{
+ intptr_t imm12;
switch (opc) {
case OPC_SB:
case OPC_SH:
case OPC_SW:
case OPC_SD:
+ imm12 = split_offset_scalar(s, &addr, offset);
tcg_out_opc_store(s, opc, addr, data, imm12);
break;
case OPC_LB:
@@ -845,8 +904,31 @@ static void tcg_out_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
case OPC_LW:
case OPC_LWU:
case OPC_LD:
+ imm12 = split_offset_scalar(s, &addr, offset);
tcg_out_opc_imm(s, opc, data, addr, imm12);
break;
+ case OPC_VSE8_V:
+ case OPC_VSE16_V:
+ case OPC_VSE32_V:
+ case OPC_VSE64_V:
+ case OPC_VS1R_V:
+ case OPC_VS2R_V:
+ case OPC_VS4R_V:
+ case OPC_VS8R_V:
+ split_offset_vector(s, &addr, offset);
+ tcg_out_opc_ldst_vec(s, opc, data, addr, true);
+ break;
+ case OPC_VLE8_V:
+ case OPC_VLE16_V:
+ case OPC_VLE32_V:
+ case OPC_VLE64_V:
+ case OPC_VL1RE64_V:
+ case OPC_VL2RE64_V:
+ case OPC_VL4RE64_V:
+ case OPC_VL8RE64_V:
+ split_offset_vector(s, &addr, offset);
+ tcg_out_opc_ldst_vec(s, opc, data, addr, true);
+ break;
default:
g_assert_not_reached();
}
@@ -855,14 +937,101 @@ static void tcg_out_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
TCGReg arg1, intptr_t arg2)
{
- RISCVInsn insn = type == TCG_TYPE_I32 ? OPC_LW : OPC_LD;
+ RISCVInsn insn;
+
+ if (type < TCG_TYPE_V64) {
+ insn = (type == TCG_TYPE_I32) ? OPC_LW : OPC_LD;
+ } else {
+ int nf = get_vec_type_bytes(type) / riscv_vlenb;
+
+ switch (nf) {
+ case 1:
+ insn = OPC_VL1RE64_V;
+ break;
+ case 2:
+ insn = OPC_VL2RE64_V;
+ break;
+ case 4:
+ insn = OPC_VL4RE64_V;
+ break;
+ case 8:
+ insn = OPC_VL8RE64_V;
+ break;
+ default:
+ {
+ int prev_vsew = riscv_set_vec_config_vl(s, type);
+
+ switch (prev_vsew) {
+ case MO_8:
+ insn = OPC_VLE8_V;
+ break;
+ case MO_16:
+ insn = OPC_VLE16_V;
+ break;
+ case MO_32:
+ insn = OPC_VLE32_V;
+ break;
+ case MO_64:
+ insn = OPC_VLE64_V;
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ }
+ break;
+ }
+ }
tcg_out_ldst(s, insn, arg, arg1, arg2);
}
static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
TCGReg arg1, intptr_t arg2)
{
- RISCVInsn insn = type == TCG_TYPE_I32 ? OPC_SW : OPC_SD;
+ RISCVInsn insn;
+
+ if (type < TCG_TYPE_V64) {
+ insn = (type == TCG_TYPE_I32) ? OPC_SW : OPC_SD;
+ } else {
+ int nf = get_vec_type_bytes(type) / riscv_vlenb;
+
+ switch (nf) {
+ case 1:
+ insn = OPC_VS1R_V;
+ break;
+ case 2:
+ insn = OPC_VS2R_V;
+ break;
+ case 4:
+ insn = OPC_VS4R_V;
+ break;
+ case 8:
+ insn = OPC_VS8R_V;
+ break;
+ default:
+ {
+ int prev_vsew = riscv_set_vec_config_vl(s, type);
+
+ switch (prev_vsew) {
+ case MO_8:
+ insn = OPC_VSE8_V;
+ break;
+ case MO_16:
+ insn = OPC_VSE16_V;
+ break;
+ case MO_32:
+ insn = OPC_VSE32_V;
+ break;
+ case MO_64:
+ insn = OPC_VSE64_V;
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ }
+ break;
+ }
+ }
tcg_out_ldst(s, insn, arg, arg1, arg2);
}
@@ -2057,7 +2226,20 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
const TCGArg args[TCG_MAX_OP_ARGS],
const int const_args[TCG_MAX_OP_ARGS])
{
+ TCGType type = vecl + TCG_TYPE_V64;
+ TCGArg a0, a1, a2;
+
+ a0 = args[0];
+ a1 = args[1];
+ a2 = args[2];
+
switch (opc) {
+ case INDEX_op_ld_vec:
+ tcg_out_ld(s, type, a0, a1, a2);
+ break;
+ case INDEX_op_st_vec:
+ tcg_out_st(s, type, a0, a1, a2);
+ break;
case INDEX_op_mov_vec: /* Always emitted via tcg_out_mov. */
case INDEX_op_dup_vec: /* Always emitted via tcg_out_dup_vec. */
default:
@@ -2221,6 +2403,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_qemu_st_a64_i64:
return C_O0_I2(rZ, r);
+ case INDEX_op_st_vec:
+ return C_O0_I2(v, r);
+ case INDEX_op_ld_vec:
+ return C_O1_I1(v, r);
default:
g_assert_not_reached();
}
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH v3 05/14] tcg/riscv: Implement vector load/store
2024-09-04 14:27 ` [PATCH v3 05/14] tcg/riscv: Implement vector load/store LIU Zhiwei
@ 2024-09-05 6:39 ` Richard Henderson
2024-09-10 3:04 ` LIU Zhiwei
0 siblings, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2024-09-05 6:39 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 9/4/24 07:27, LIU Zhiwei wrote:
> @@ -811,31 +849,52 @@ static void tcg_out_extrl_i64_i32(TCGContext *s, TCGReg ret, TCGReg arg)
> tcg_out_ext32s(s, ret, arg);
> }
>
> -static void tcg_out_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
> - TCGReg addr, intptr_t offset)
> +static intptr_t split_offset_scalar(TCGContext *s, TCGReg *addr,
> + intptr_t offset)
> {
> intptr_t imm12 = sextreg(offset, 0, 12);
>
> if (offset != imm12) {
> intptr_t diff = tcg_pcrel_diff(s, (void *)offset);
>
> - if (addr == TCG_REG_ZERO && diff == (int32_t)diff) {
> + if (*addr == TCG_REG_ZERO && diff == (int32_t)diff) {
> imm12 = sextreg(diff, 0, 12);
> tcg_out_opc_upper(s, OPC_AUIPC, TCG_REG_TMP2, diff - imm12);
> } else {
> tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP2, offset - imm12);
> - if (addr != TCG_REG_ZERO) {
> - tcg_out_opc_reg(s, OPC_ADD, TCG_REG_TMP2, TCG_REG_TMP2, addr);
> + if (*addr != TCG_REG_ZERO) {
> + tcg_out_opc_reg(s, OPC_ADD, TCG_REG_TMP2, TCG_REG_TMP2, *addr);
> }
> }
> - addr = TCG_REG_TMP2;
> + *addr = TCG_REG_TMP2;
> + }
> + return imm12;
> +}
> +
> +static void split_offset_vector(TCGContext *s, TCGReg *addr, intptr_t offset)
> +{
> + if (offset != 0) {
> + if (offset == sextreg(offset, 0, 12)) {
> + tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, *addr, offset);
> + } else {
> + tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP0, offset);
> + tcg_out_opc_reg(s, OPC_ADD, TCG_REG_TMP0, TCG_REG_TMP0, *addr);
> + }
> + *addr = TCG_REG_TMP0;
> }
> +}
> +
> +static void tcg_out_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
> + TCGReg addr, intptr_t offset)
> +{
> + intptr_t imm12;
>
> switch (opc) {
> case OPC_SB:
> case OPC_SH:
> case OPC_SW:
> case OPC_SD:
> + imm12 = split_offset_scalar(s, &addr, offset);
> tcg_out_opc_store(s, opc, addr, data, imm12);
> break;
> case OPC_LB:
> @@ -845,8 +904,31 @@ static void tcg_out_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
> case OPC_LW:
> case OPC_LWU:
> case OPC_LD:
> + imm12 = split_offset_scalar(s, &addr, offset);
> tcg_out_opc_imm(s, opc, data, addr, imm12);
> break;
> + case OPC_VSE8_V:
> + case OPC_VSE16_V:
> + case OPC_VSE32_V:
> + case OPC_VSE64_V:
> + case OPC_VS1R_V:
> + case OPC_VS2R_V:
> + case OPC_VS4R_V:
> + case OPC_VS8R_V:
> + split_offset_vector(s, &addr, offset);
> + tcg_out_opc_ldst_vec(s, opc, data, addr, true);
> + break;
> + case OPC_VLE8_V:
> + case OPC_VLE16_V:
> + case OPC_VLE32_V:
> + case OPC_VLE64_V:
> + case OPC_VL1RE64_V:
> + case OPC_VL2RE64_V:
> + case OPC_VL4RE64_V:
> + case OPC_VL8RE64_V:
> + split_offset_vector(s, &addr, offset);
> + tcg_out_opc_ldst_vec(s, opc, data, addr, true);
> + break;
> default:
> g_assert_not_reached();
> }
This is more complicated than it needs to be, calling a combined function, then using a
switch to separate, then calling separate functions. Calling separate functions in the
first place is simpler. E.g.
static void tcg_out_vec_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
TCGReg addr, intptr_t offset)
{
tcg_debug_assert(data >= TCG_REG_V0);
tcg_debug_assert(addr < TCG_REG_V0);
if (offset) {
tcg_debug_assert(addr != TCG_REG_ZERO);
if (offset == sextreg(offset, 0, 12)) {
tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, addr, offset);
} else {
tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP0, offset);
tcg_out_opc_reg(s, OPC_ADD, TCG_REG_TMP0, TCG_REG_TMP0, addr);
}
addr = TCG_REG_TMP0;
}
tcg_out32(s, opc | ((data & 0x1f) << 7) | (addr << 15) | (1 << 25));
}
> static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
> TCGReg arg1, intptr_t arg2)
> {
> - RISCVInsn insn = type == TCG_TYPE_I32 ? OPC_LW : OPC_LD;
> + RISCVInsn insn;
> +
> + if (type < TCG_TYPE_V64) {
> + insn = (type == TCG_TYPE_I32) ? OPC_LW : OPC_LD;
> + } else {
> + int nf = get_vec_type_bytes(type) / riscv_vlenb;
> +
> + switch (nf) {
> + case 1:
> + insn = OPC_VL1RE64_V;
> + break;
> + case 2:
> + insn = OPC_VL2RE64_V;
> + break;
> + case 4:
> + insn = OPC_VL4RE64_V;
> + break;
> + case 8:
> + insn = OPC_VL8RE64_V;
> + break;
> + default:
> + {
> + int prev_vsew = riscv_set_vec_config_vl(s, type);
> +
> + switch (prev_vsew) {
> + case MO_8:
> + insn = OPC_VLE8_V;
> + break;
> + case MO_16:
> + insn = OPC_VLE16_V;
> + break;
> + case MO_32:
> + insn = OPC_VLE32_V;
> + break;
> + case MO_64:
> + insn = OPC_VLE64_V;
> + break;
> + default:
> + g_assert_not_reached();
> + }
> + }
> + break;
This can be simplified:
switch (type) {
case TCG_TYPE_I32:
tcg_out_ldst(s, OPC_LW, data, base, offset);
break;
case TCG_TYPE_I64:
tcg_out_ldst(s, OPC_LD, data, base, offset);
break;
case TCG_TYPE_V64:
case TCG_TYPE_V128:
case TCG_TYPE_V256:
if (type >= riscv_lg2_vlenb) {
static const RISCVInsn whole_reg_ld[] = {
OPC_VL1RE64_V, OPC_VL2RE64_V, OPC_VL4RE64_V, OPC_VL8RE64_V
};
unsigned idx = type - riscv_lg2_vlenb;
insn = whole_reg_ld[idx];
} else {
static const RISCVInsn unit_stride_ld[] = {
OPC_VLE8_V, OPC_VLE16_V, OPC_VLE32_V, OPC_VLE64_V
};
MemOp prev_vsew = set_vtype_len(s, type);
insn = unit_stride_ld[prev_vsew];
}
tcg_out_vec_ldst(s, insn, data, base, offset);
break;
default:
g_assert_not_reached();
}
and similar for store.
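For completeness, the store side would mirror that (a sketch under the same
assumed set_vtype_len helper):

    switch (type) {
    case TCG_TYPE_I32:
        tcg_out_ldst(s, OPC_SW, data, base, offset);
        break;
    case TCG_TYPE_I64:
        tcg_out_ldst(s, OPC_SD, data, base, offset);
        break;
    case TCG_TYPE_V64:
    case TCG_TYPE_V128:
    case TCG_TYPE_V256:
        if (type >= riscv_lg2_vlenb) {
            static const RISCVInsn whole_reg_st[] = {
                OPC_VS1R_V, OPC_VS2R_V, OPC_VS4R_V, OPC_VS8R_V
            };
            insn = whole_reg_st[type - riscv_lg2_vlenb];
        } else {
            static const RISCVInsn unit_stride_st[] = {
                OPC_VSE8_V, OPC_VSE16_V, OPC_VSE32_V, OPC_VSE64_V
            };
            insn = unit_stride_st[set_vtype_len(s, type)];
        }
        tcg_out_vec_ldst(s, insn, data, base, offset);
        break;
    default:
        g_assert_not_reached();
    }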
r~
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v3 05/14] tcg/riscv: Implement vector load/store
2024-09-05 6:39 ` Richard Henderson
@ 2024-09-10 3:04 ` LIU Zhiwei
0 siblings, 0 replies; 34+ messages in thread
From: LIU Zhiwei @ 2024-09-10 3:04 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 2024/9/5 14:39, Richard Henderson wrote:
> On 9/4/24 07:27, LIU Zhiwei wrote:
>> @@ -811,31 +849,52 @@ static void tcg_out_extrl_i64_i32(TCGContext
>> *s, TCGReg ret, TCGReg arg)
>> tcg_out_ext32s(s, ret, arg);
>> }
>> -static void tcg_out_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
>> - TCGReg addr, intptr_t offset)
>> +static intptr_t split_offset_scalar(TCGContext *s, TCGReg *addr,
>> + intptr_t offset)
>> {
>> intptr_t imm12 = sextreg(offset, 0, 12);
>> if (offset != imm12) {
>> intptr_t diff = tcg_pcrel_diff(s, (void *)offset);
>> - if (addr == TCG_REG_ZERO && diff == (int32_t)diff) {
>> + if (*addr == TCG_REG_ZERO && diff == (int32_t)diff) {
>> imm12 = sextreg(diff, 0, 12);
>> tcg_out_opc_upper(s, OPC_AUIPC, TCG_REG_TMP2, diff -
>> imm12);
>> } else {
>> tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP2, offset -
>> imm12);
>> - if (addr != TCG_REG_ZERO) {
>> - tcg_out_opc_reg(s, OPC_ADD, TCG_REG_TMP2,
>> TCG_REG_TMP2, addr);
>> + if (*addr != TCG_REG_ZERO) {
>> + tcg_out_opc_reg(s, OPC_ADD, TCG_REG_TMP2,
>> TCG_REG_TMP2, *addr);
>> }
>> }
>> - addr = TCG_REG_TMP2;
>> + *addr = TCG_REG_TMP2;
>> + }
>> + return imm12;
>> +}
>> +
>> +static void split_offset_vector(TCGContext *s, TCGReg *addr,
>> intptr_t offset)
>> +{
>> + if (offset != 0) {
>> + if (offset == sextreg(offset, 0, 12)) {
>> + tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, *addr, offset);
>> + } else {
>> + tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP0, offset);
>> + tcg_out_opc_reg(s, OPC_ADD, TCG_REG_TMP0, TCG_REG_TMP0,
>> *addr);
>> + }
>> + *addr = TCG_REG_TMP0;
>> }
>> +}
>> +
>> +static void tcg_out_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
>> + TCGReg addr, intptr_t offset)
>> +{
>> + intptr_t imm12;
>> switch (opc) {
>> case OPC_SB:
>> case OPC_SH:
>> case OPC_SW:
>> case OPC_SD:
>> + imm12 = split_offset_scalar(s, &addr, offset);
>> tcg_out_opc_store(s, opc, addr, data, imm12);
>> break;
>> case OPC_LB:
>> @@ -845,8 +904,31 @@ static void tcg_out_ldst(TCGContext *s,
>> RISCVInsn opc, TCGReg data,
>> case OPC_LW:
>> case OPC_LWU:
>> case OPC_LD:
>> + imm12 = split_offset_scalar(s, &addr, offset);
>> tcg_out_opc_imm(s, opc, data, addr, imm12);
>> break;
>> + case OPC_VSE8_V:
>> + case OPC_VSE16_V:
>> + case OPC_VSE32_V:
>> + case OPC_VSE64_V:
>> + case OPC_VS1R_V:
>> + case OPC_VS2R_V:
>> + case OPC_VS4R_V:
>> + case OPC_VS8R_V:
>> + split_offset_vector(s, &addr, offset);
>> + tcg_out_opc_ldst_vec(s, opc, data, addr, true);
>> + break;
>> + case OPC_VLE8_V:
>> + case OPC_VLE16_V:
>> + case OPC_VLE32_V:
>> + case OPC_VLE64_V:
>> + case OPC_VL1RE64_V:
>> + case OPC_VL2RE64_V:
>> + case OPC_VL4RE64_V:
>> + case OPC_VL8RE64_V:
>> + split_offset_vector(s, &addr, offset);
>> + tcg_out_opc_ldst_vec(s, opc, data, addr, true);
>> + break;
>> default:
>> g_assert_not_reached();
>> }
>
> This is more complicated than it needs to be, calling a combined
> function, then using a switch to separate, then calling separate
> functions. Calling separate functions in the first place is simpler.
> E.g.
>
> static void tcg_out_vec_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
> TCGReg addr, intptr_t offset)
> {
> tcg_debug_assert(data >= TCG_REG_V0);
> tcg_debug_assert(addr < TCG_REG_V0);
>
> if (offset) {
> tcg_debug_assert(addr != TCG_REG_ZERO);
> if (offset == sextreg(offset, 0, 12)) {
> tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, addr, offset);
> } else {
> tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP0, offset);
> tcg_out_opc_reg(s, OPC_ADD, TCG_REG_TMP0, TCG_REG_TMP0,
> addr);
> }
> addr = TCG_REG_TMP0;
> }
>
> tcg_out32(s, opc | ((data & 0x1f) << 7) | (addr << 15) | (1 << 25));
> }
>
>> static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
>> TCGReg arg1, intptr_t arg2)
>> {
>> - RISCVInsn insn = type == TCG_TYPE_I32 ? OPC_LW : OPC_LD;
>> + RISCVInsn insn;
>> +
>> + if (type < TCG_TYPE_V64) {
>> + insn = (type == TCG_TYPE_I32) ? OPC_LW : OPC_LD;
>> + } else {
>> + int nf = get_vec_type_bytes(type) / riscv_vlenb;
>> +
>> + switch (nf) {
>> + case 1:
>> + insn = OPC_VL1RE64_V;
>> + break;
>> + case 2:
>> + insn = OPC_VL2RE64_V;
>> + break;
>> + case 4:
>> + insn = OPC_VL4RE64_V;
>> + break;
>> + case 8:
>> + insn = OPC_VL8RE64_V;
>> + break;
>> + default:
>> + {
>> + int prev_vsew = riscv_set_vec_config_vl(s, type);
>> +
>> + switch (prev_vsew) {
>> + case MO_8:
>> + insn = OPC_VLE8_V;
>> + break;
>> + case MO_16:
>> + insn = OPC_VLE16_V;
>> + break;
>> + case MO_32:
>> + insn = OPC_VLE32_V;
>> + break;
>> + case MO_64:
>> + insn = OPC_VLE64_V;
>> + break;
>> + default:
>> + g_assert_not_reached();
>> + }
>> + }
>> + break;
>
> This can be simplified:
>
> switch (type) {
> case TCG_TYPE_I32:
> tcg_out_ldst(s, OPC_LW, data, base, offset);
> break;
> case TCG_TYPE_I64:
> tcg_out_ldst(s, OPC_LD, data, base, offset);
> break;
> case TCG_TYPE_V64:
> case TCG_TYPE_V128:
> case TCG_TYPE_V256:
> if (type >= riscv_lg2_vlenb) {
> static const RISCVInsn whole_reg_ld[] = {
> OPC_VL1RE64_V, OPC_VL2RE64_V, OPC_VL4RE64_V,
> OPC_VL8RE64_V
> };
> unsigned idx = type - riscv_lg2_vlenb;
> insn = whole_reg_ld[idx];
> } else {
> static const RISCVInsn unit_stride_ld[] = {
> OPC_VLE8_V, OPC_VLE16_V, OPC_VLE32_V, OPC_VLE64_V
> };
> MemOp prev_vsew = set_vtype_len(s, type);
> insn = unit_stride_ld[prev_vsew];
> }
> tcg_out_vec_ldst(s, insn, data, base, offset);
> break;
> default:
> g_assert_not_reached();
> }
>
> and similar for store.
Great. We will do it that way.
Zhiwei
>
>
> r~
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v3 06/14] tcg/riscv: Implement vector mov/dup{m/i}
2024-09-04 14:27 [PATCH v3 00/14] Add support for vector LIU Zhiwei
` (4 preceding siblings ...)
2024-09-04 14:27 ` [PATCH v3 05/14] tcg/riscv: Implement vector load/store LIU Zhiwei
@ 2024-09-04 14:27 ` LIU Zhiwei
2024-09-05 6:56 ` Richard Henderson
2024-09-04 14:27 ` [PATCH v3 07/14] tcg/riscv: Add support for basic vector opcodes LIU Zhiwei
` (8 subsequent siblings)
14 siblings, 1 reply; 34+ messages in thread
From: LIU Zhiwei @ 2024-09-04 14:27 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/riscv/tcg-target.c.inc | 53 ++++++++++++++++++++++++++++++++++++++
1 file changed, 53 insertions(+)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 4b1079fc6f..ddb0c8190c 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -309,6 +309,12 @@ typedef enum {
OPC_VS2R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(1),
OPC_VS4R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(3),
OPC_VS8R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(7),
+
+ OPC_VMV_V_V = 0x5e000057 | V_OPIVV,
+ OPC_VMV_V_I = 0x5e000057 | V_OPIVI,
+ OPC_VMV_V_X = 0x5e000057 | V_OPIVX,
+
+ OPC_VMVNR_V = 0x9e000057 | V_OPIVI,
} RISCVInsn;
/*
@@ -698,6 +704,21 @@ static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
case TCG_TYPE_I64:
tcg_out_opc_imm(s, OPC_ADDI, ret, arg, 0);
break;
+ case TCG_TYPE_V64:
+ case TCG_TYPE_V128:
+ case TCG_TYPE_V256:
+ {
+ int nf = get_vec_type_bytes(type) / riscv_vlenb;
+
+ if (nf != 0) {
+ tcg_debug_assert(is_power_of_2(nf) && nf <= 8);
+ tcg_out_opc_vi(s, OPC_VMVNR_V, ret, arg, nf - 1, true);
+ } else {
+ riscv_set_vec_config_vl(s, type);
+ tcg_out_opc_vv(s, OPC_VMV_V_V, ret, TCG_REG_V0, arg, true);
+ }
+ }
+ break;
default:
g_assert_not_reached();
}
@@ -1106,6 +1127,33 @@ static void tcg_out_addsub2(TCGContext *s,
}
}
+static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
+ TCGReg dst, TCGReg src)
+{
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vx(s, OPC_VMV_V_X, dst, TCG_REG_V0, src, true);
+ return true;
+}
+
+static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
+ TCGReg dst, TCGReg base, intptr_t offset)
+{
+ tcg_out_ld(s, TCG_TYPE_REG, TCG_REG_TMP0, base, offset);
+ return tcg_out_dup_vec(s, type, vece, dst, TCG_REG_TMP0);
+}
+
+static void tcg_out_dupi_vec(TCGContext *s, TCGType type, unsigned vece,
+ TCGReg dst, int64_t arg)
+{
+ if (arg < 16 && arg >= -16) {
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vi(s, OPC_VMV_V_I, dst, TCG_REG_V0, arg, true);
+ return;
+ }
+ tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP0, arg);
+ tcg_out_dup_vec(s, type, vece, dst, TCG_REG_TMP0);
+}
+
static const struct {
RISCVInsn op;
bool swap;
@@ -2234,6 +2282,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
a2 = args[2];
switch (opc) {
+ case INDEX_op_dupm_vec:
+ tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
+ break;
case INDEX_op_ld_vec:
tcg_out_ld(s, type, a0, a1, a2);
break;
@@ -2405,6 +2456,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_st_vec:
return C_O0_I2(v, r);
+ case INDEX_op_dup_vec:
+ case INDEX_op_dupm_vec:
case INDEX_op_ld_vec:
return C_O1_I1(v, r);
default:
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH v3 06/14] tcg/riscv: Implement vector mov/dup{m/i}
2024-09-04 14:27 ` [PATCH v3 06/14] tcg/riscv: Implement vector mov/dup{m/i} LIU Zhiwei
@ 2024-09-05 6:56 ` Richard Henderson
2024-09-10 1:13 ` LIU Zhiwei
0 siblings, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2024-09-05 6:56 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 9/4/24 07:27, LIU Zhiwei wrote:
> @@ -698,6 +704,21 @@ static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
> case TCG_TYPE_I64:
> tcg_out_opc_imm(s, OPC_ADDI, ret, arg, 0);
> break;
> + case TCG_TYPE_V64:
> + case TCG_TYPE_V128:
> + case TCG_TYPE_V256:
> + {
> + int nf = get_vec_type_bytes(type) / riscv_vlenb;
> +
> + if (nf != 0) {
> + tcg_debug_assert(is_power_of_2(nf) && nf <= 8);
> + tcg_out_opc_vi(s, OPC_VMVNR_V, ret, arg, nf - 1, true);
> + } else {
> + riscv_set_vec_config_vl(s, type);
> + tcg_out_opc_vv(s, OPC_VMV_V_V, ret, TCG_REG_V0, arg, true);
> + }
> + }
> + break;
Perhaps
int lmul = type - riscv_lg2_vlenb;
int nf = 1 << MIN(lmul, 0);
tcg_out_opc_vi(s, OPC_VMVNR_V, ret, arg, nf - 1);
Is there a reason to prefer vmv.v.v over vmvnr.v?
Seems like we can always move one vector reg...
> +static void tcg_out_dupi_vec(TCGContext *s, TCGType type, unsigned vece,
> + TCGReg dst, int64_t arg)
> +{
> + if (arg < 16 && arg >= -16) {
> + riscv_set_vec_config_vl_vece(s, type, vece);
> + tcg_out_opc_vi(s, OPC_VMV_V_I, dst, TCG_REG_V0, arg, true);
> + return;
> + }
> + tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP0, arg);
> + tcg_out_dup_vec(s, type, vece, dst, TCG_REG_TMP0);
> +}
I'll note that 0 and -1 do not require an SEW change. I don't know how often that will
come up, since in my testing with aarch64, we usually needed to swap to TCG_TYPE_V256
anyway.
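If it does come up, the fast path could look like this sketch (reusing
riscv_set_vec_config_vl from patch 05, which keeps the current SEW):

    static void tcg_out_dupi_vec(TCGContext *s, TCGType type, unsigned vece,
                                 TCGReg dst, int64_t arg)
    {
        /* 0 and -1 have the same bit pattern at every SEW. */
        if (arg == 0 || arg == -1) {
            riscv_set_vec_config_vl(s, type);
            tcg_out_opc_vi(s, OPC_VMV_V_I, dst, TCG_REG_V0, arg, true);
            return;
        }
        if (arg >= -16 && arg < 16) {
            riscv_set_vec_config_vl_vece(s, type, vece);
            tcg_out_opc_vi(s, OPC_VMV_V_I, dst, TCG_REG_V0, arg, true);
            return;
        }
        tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP0, arg);
        tcg_out_dup_vec(s, type, vece, dst, TCG_REG_TMP0);
    }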
r~
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v3 06/14] tcg/riscv: Implement vector mov/dup{m/i}
2024-09-05 6:56 ` Richard Henderson
@ 2024-09-10 1:13 ` LIU Zhiwei
0 siblings, 0 replies; 34+ messages in thread
From: LIU Zhiwei @ 2024-09-10 1:13 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 2024/9/5 14:56, Richard Henderson wrote:
> On 9/4/24 07:27, LIU Zhiwei wrote:
>> @@ -698,6 +704,21 @@ static bool tcg_out_mov(TCGContext *s, TCGType
>> type, TCGReg ret, TCGReg arg)
>> case TCG_TYPE_I64:
>> tcg_out_opc_imm(s, OPC_ADDI, ret, arg, 0);
>> break;
>> + case TCG_TYPE_V64:
>> + case TCG_TYPE_V128:
>> + case TCG_TYPE_V256:
>> + {
>> + int nf = get_vec_type_bytes(type) / riscv_vlenb;
>> +
>> + if (nf != 0) {
>> + tcg_debug_assert(is_power_of_2(nf) && nf <= 8);
>> + tcg_out_opc_vi(s, OPC_VMVNR_V, ret, arg, nf - 1, true);
>> + } else {
>> + riscv_set_vec_config_vl(s, type);
>> + tcg_out_opc_vv(s, OPC_VMV_V_V, ret, TCG_REG_V0, arg,
>> true);
>> + }
>> + }
>> + break;
>
> Perhaps
>
> int lmul = type - riscv_lg2_vlenb;
> int nf = 1 << MIN(lmul, 0);
> tcg_out_opc_vi(s, OPC_VMVNR_V, ret, arg, nf - 1);
>
> Is there a reason to prefer vmv.v.v over vmvnr.v?
I think it's a trade-off. On some CPUs, the instruction will be split
internally, so the smaller the fractional lmul, the fewer micro ops are
executed.
That's the benefit of using vmv.v.v. But here we also need a vsetivli.
On some CPUs, it can be fused with the next instruction.
> Seems like we can always move one vector reg...
OK. I will do it that way.
>
>> +static void tcg_out_dupi_vec(TCGContext *s, TCGType type, unsigned
>> vece,
>> + TCGReg dst, int64_t arg)
>> +{
>> + if (arg < 16 && arg >= -16) {
>> + riscv_set_vec_config_vl_vece(s, type, vece);
>> + tcg_out_opc_vi(s, OPC_VMV_V_I, dst, TCG_REG_V0, arg, true);
>> + return;
>> + }
>> + tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP0, arg);
>> + tcg_out_dup_vec(s, type, vece, dst, TCG_REG_TMP0);
>> +}
>
> I'll note that 0 and -1 do not require an SEW change. I don't know how
> often that will come up
In our tests on OpenCV, it comes up at a rate of 99.7%. Thus we will
optimize this in the next version.
Thanks,
Zhiwei
> , since in my testing with aarch64, we usually needed to swap to
> TCG_TYPE_V256 anyway.
>
>
> r~
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v3 07/14] tcg/riscv: Add support for basic vector opcodes
2024-09-04 14:27 [PATCH v3 00/14] Add support for vector LIU Zhiwei
` (5 preceding siblings ...)
2024-09-04 14:27 ` [PATCH v3 06/14] tcg/riscv: Implement vector mov/dup{m/i} LIU Zhiwei
@ 2024-09-04 14:27 ` LIU Zhiwei
2024-09-05 6:57 ` Richard Henderson
2024-09-04 14:27 ` [PATCH v3 08/14] tcg/riscv: Implement vector cmp ops LIU Zhiwei
` (7 subsequent siblings)
14 siblings, 1 reply; 34+ messages in thread
From: LIU Zhiwei @ 2024-09-04 14:27 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
tcg/riscv/tcg-target-con-set.h | 1 +
tcg/riscv/tcg-target.c.inc | 46 ++++++++++++++++++++++++++++++++++
tcg/riscv/tcg-target.h | 2 +-
3 files changed, 48 insertions(+), 1 deletion(-)
diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
index d73a62b0f2..d4504122a2 100644
--- a/tcg/riscv/tcg-target-con-set.h
+++ b/tcg/riscv/tcg-target-con-set.h
@@ -23,3 +23,4 @@ C_O1_I4(r, r, rI, rM, rM)
C_O2_I4(r, r, rZ, rZ, rM, rM)
C_O0_I2(v, r)
C_O1_I1(v, r)
+C_O1_I1(v, v)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index ddb0c8190c..c89d1a5dc9 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -310,6 +310,13 @@ typedef enum {
OPC_VS4R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(3),
OPC_VS8R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(7),
+ OPC_VADD_VV = 0x57 | V_OPIVV,
+ OPC_VSUB_VV = 0x8000057 | V_OPIVV,
+ OPC_VAND_VV = 0x24000057 | V_OPIVV,
+ OPC_VOR_VV = 0x28000057 | V_OPIVV,
+ OPC_VXOR_VV = 0x2c000057 | V_OPIVV,
+ OPC_VXOR_VI = 0x2c000057 | V_OPIVI,
+
OPC_VMV_V_V = 0x5e000057 | V_OPIVV,
OPC_VMV_V_I = 0x5e000057 | V_OPIVI,
OPC_VMV_V_X = 0x5e000057 | V_OPIVX,
@@ -2291,6 +2298,30 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
case INDEX_op_st_vec:
tcg_out_st(s, type, a0, a1, a2);
break;
+ case INDEX_op_add_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VADD_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_sub_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VSUB_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_and_vec:
+ riscv_set_vec_config_vl(s, type);
+ tcg_out_opc_vv(s, OPC_VAND_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_or_vec:
+ riscv_set_vec_config_vl(s, type);
+ tcg_out_opc_vv(s, OPC_VOR_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_xor_vec:
+ riscv_set_vec_config_vl(s, type);
+ tcg_out_opc_vv(s, OPC_VXOR_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_not_vec:
+ riscv_set_vec_config_vl(s, type);
+ tcg_out_opc_vi(s, OPC_VXOR_VI, a0, a1, -1, true);
+ break;
case INDEX_op_mov_vec: /* Always emitted via tcg_out_mov. */
case INDEX_op_dup_vec: /* Always emitted via tcg_out_dup_vec. */
default:
@@ -2310,6 +2341,13 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
{
switch (opc) {
+ case INDEX_op_add_vec:
+ case INDEX_op_sub_vec:
+ case INDEX_op_and_vec:
+ case INDEX_op_or_vec:
+ case INDEX_op_xor_vec:
+ case INDEX_op_not_vec:
+ return 1;
default:
return 0;
}
@@ -2460,6 +2498,14 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_dupm_vec:
case INDEX_op_ld_vec:
return C_O1_I1(v, r);
+ case INDEX_op_not_vec:
+ return C_O1_I1(v, v);
+ case INDEX_op_add_vec:
+ case INDEX_op_sub_vec:
+ case INDEX_op_and_vec:
+ case INDEX_op_or_vec:
+ case INDEX_op_xor_vec:
+ return C_O1_I2(v, v, v);
default:
g_assert_not_reached();
}
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 12a7a37aaa..acb8dfdf16 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -151,7 +151,7 @@ typedef enum {
#define TCG_TARGET_HAS_nand_vec 0
#define TCG_TARGET_HAS_nor_vec 0
#define TCG_TARGET_HAS_eqv_vec 0
-#define TCG_TARGET_HAS_not_vec 0
+#define TCG_TARGET_HAS_not_vec 1
#define TCG_TARGET_HAS_neg_vec 0
#define TCG_TARGET_HAS_abs_vec 0
#define TCG_TARGET_HAS_roti_vec 0
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH v3 07/14] tcg/riscv: Add support for basic vector opcodes
2024-09-04 14:27 ` [PATCH v3 07/14] tcg/riscv: Add support for basic vector opcodes LIU Zhiwei
@ 2024-09-05 6:57 ` Richard Henderson
0 siblings, 0 replies; 34+ messages in thread
From: Richard Henderson @ 2024-09-05 6:57 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 9/4/24 07:27, LIU Zhiwei wrote:
> From: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>
> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
> Reviewed-by: Liu Zhiwei<zhiwei_liu@linux.alibaba.com>
> ---
> tcg/riscv/tcg-target-con-set.h | 1 +
> tcg/riscv/tcg-target.c.inc | 46 ++++++++++++++++++++++++++++++++++
> tcg/riscv/tcg-target.h | 2 +-
> 3 files changed, 48 insertions(+), 1 deletion(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v3 08/14] tcg/riscv: Implement vector cmp ops
2024-09-04 14:27 [PATCH v3 00/14] Add support for vector LIU Zhiwei
` (6 preceding siblings ...)
2024-09-04 14:27 ` [PATCH v3 07/14] tcg/riscv: Add support for basic vector opcodes LIU Zhiwei
@ 2024-09-04 14:27 ` LIU Zhiwei
2024-09-05 7:12 ` Richard Henderson
2024-09-04 14:27 ` [PATCH v3 09/14] tcg/riscv: Implement vector neg ops LIU Zhiwei
` (6 subsequent siblings)
14 siblings, 1 reply; 34+ messages in thread
From: LIU Zhiwei @ 2024-09-04 14:27 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
1. Address immediate value constraints in RISC-V Vector Extension 1.0 for
comparison instructions.
2. Extend comparison results from mask registers to SEW-width elements,
following recommendations in The RISC-V SPEC Volume I (Version 20240411).
This aligns with TCG's cmp_vec behavior by expanding compare results to
full element width: all 1s for true, all 0s for false.
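For example, an EQ compare at e32 then expands to a sequence like this
(illustrative register numbers):

    vmseq.vv   v0, v2, v4       # compare into the v0 mask register
    vmv.v.i    v8, 0            # start from all-0s elements
    vmerge.vim v8, v8, -1, v0   # where the mask is set, write all-1s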
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
tcg/riscv/tcg-target-con-set.h | 3 +
tcg/riscv/tcg-target-con-str.h | 2 +
tcg/riscv/tcg-target.c.inc | 252 +++++++++++++++++++++++++++------
tcg/riscv/tcg-target.h | 2 +-
tcg/tcg-internal.h | 2 +
tcg/tcg-op-vec.c | 2 +-
6 files changed, 214 insertions(+), 49 deletions(-)
diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
index d4504122a2..cc06102ccf 100644
--- a/tcg/riscv/tcg-target-con-set.h
+++ b/tcg/riscv/tcg-target-con-set.h
@@ -24,3 +24,6 @@ C_O2_I4(r, r, rZ, rZ, rM, rM)
C_O0_I2(v, r)
C_O1_I1(v, r)
C_O1_I1(v, v)
+C_O1_I2(v, v, v)
+C_O1_I2(v, v, vL)
+C_O1_I4(v, v, vL, vK, vK)
diff --git a/tcg/riscv/tcg-target-con-str.h b/tcg/riscv/tcg-target-con-str.h
index b2b3211bcb..089efe96ca 100644
--- a/tcg/riscv/tcg-target-con-str.h
+++ b/tcg/riscv/tcg-target-con-str.h
@@ -17,6 +17,8 @@ REGS('v', ALL_VECTOR_REGS)
*/
CONST('I', TCG_CT_CONST_S12)
CONST('J', TCG_CT_CONST_J12)
+CONST('K', TCG_CT_CONST_S5)
+CONST('L', TCG_CT_CONST_CMP_VI)
CONST('N', TCG_CT_CONST_N12)
CONST('M', TCG_CT_CONST_M12)
CONST('Z', TCG_CT_CONST_ZERO)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index c89d1a5dc9..37909e56fb 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -106,11 +106,13 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
return TCG_REG_A0 + slot;
}
-#define TCG_CT_CONST_ZERO 0x100
-#define TCG_CT_CONST_S12 0x200
-#define TCG_CT_CONST_N12 0x400
-#define TCG_CT_CONST_M12 0x800
-#define TCG_CT_CONST_J12 0x1000
+#define TCG_CT_CONST_ZERO 0x100
+#define TCG_CT_CONST_S12 0x200
+#define TCG_CT_CONST_N12 0x400
+#define TCG_CT_CONST_M12 0x800
+#define TCG_CT_CONST_J12 0x1000
+#define TCG_CT_CONST_S5 0x2000
+#define TCG_CT_CONST_CMP_VI 0x4000
#define ALL_GENERAL_REGS MAKE_64BIT_MASK(0, 32)
#define ALL_VECTOR_REGS MAKE_64BIT_MASK(32, 32)
@@ -119,48 +121,6 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
#define sextreg sextract64
-/* test if a constant matches the constraint */
-static bool tcg_target_const_match(int64_t val, int ct,
- TCGType type, TCGCond cond, int vece)
-{
- if (ct & TCG_CT_CONST) {
- return 1;
- }
- if ((ct & TCG_CT_CONST_ZERO) && val == 0) {
- return 1;
- }
- /*
- * Sign extended from 12 bits: [-0x800, 0x7ff].
- * Used for most arithmetic, as this is the isa field.
- */
- if ((ct & TCG_CT_CONST_S12) && val >= -0x800 && val <= 0x7ff) {
- return 1;
- }
- /*
- * Sign extended from 12 bits, negated: [-0x7ff, 0x800].
- * Used for subtraction, where a constant must be handled by ADDI.
- */
- if ((ct & TCG_CT_CONST_N12) && val >= -0x7ff && val <= 0x800) {
- return 1;
- }
- /*
- * Sign extended from 12 bits, +/- matching: [-0x7ff, 0x7ff].
- * Used by addsub2 and movcond, which may need the negative value,
- * and requires the modified constant to be representable.
- */
- if ((ct & TCG_CT_CONST_M12) && val >= -0x7ff && val <= 0x7ff) {
- return 1;
- }
- /*
- * Inverse of sign extended from 12 bits: ~[-0x800, 0x7ff].
- * Used to map ANDN back to ANDI, etc.
- */
- if ((ct & TCG_CT_CONST_J12) && ~val >= -0x800 && ~val <= 0x7ff) {
- return 1;
- }
- return 0;
-}
-
/*
* RISC-V Base ISA opcodes (IM)
*/
@@ -310,6 +270,9 @@ typedef enum {
OPC_VS4R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(3),
OPC_VS8R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(7),
+ OPC_VMERGE_VIM = 0x5c000057 | V_OPIVI,
+ OPC_VMERGE_VVM = 0x5c000057 | V_OPIVV,
+
OPC_VADD_VV = 0x57 | V_OPIVV,
OPC_VSUB_VV = 0x8000057 | V_OPIVV,
OPC_VAND_VV = 0x24000057 | V_OPIVV,
@@ -317,6 +280,29 @@ typedef enum {
OPC_VXOR_VV = 0x2c000057 | V_OPIVV,
OPC_VXOR_VI = 0x2c000057 | V_OPIVI,
+ OPC_VMSEQ_VV = 0x60000057 | V_OPIVV,
+ OPC_VMSEQ_VI = 0x60000057 | V_OPIVI,
+ OPC_VMSEQ_VX = 0x60000057 | V_OPIVX,
+ OPC_VMSNE_VV = 0x64000057 | V_OPIVV,
+ OPC_VMSNE_VI = 0x64000057 | V_OPIVI,
+ OPC_VMSNE_VX = 0x64000057 | V_OPIVX,
+
+ OPC_VMSLTU_VV = 0x68000057 | V_OPIVV,
+ OPC_VMSLTU_VX = 0x68000057 | V_OPIVX,
+ OPC_VMSLT_VV = 0x6c000057 | V_OPIVV,
+ OPC_VMSLT_VX = 0x6c000057 | V_OPIVX,
+ OPC_VMSLEU_VV = 0x70000057 | V_OPIVV,
+ OPC_VMSLEU_VX = 0x70000057 | V_OPIVX,
+ OPC_VMSLE_VV = 0x74000057 | V_OPIVV,
+ OPC_VMSLE_VX = 0x74000057 | V_OPIVX,
+
+ OPC_VMSLEU_VI = 0x70000057 | V_OPIVI,
+ OPC_VMSLE_VI = 0x74000057 | V_OPIVI,
+ OPC_VMSGTU_VI = 0x78000057 | V_OPIVI,
+ OPC_VMSGTU_VX = 0x78000057 | V_OPIVX,
+ OPC_VMSGT_VI = 0x7c000057 | V_OPIVI,
+ OPC_VMSGT_VX = 0x7c000057 | V_OPIVX,
+
OPC_VMV_V_V = 0x5e000057 | V_OPIVV,
OPC_VMV_V_I = 0x5e000057 | V_OPIVI,
OPC_VMV_V_X = 0x5e000057 | V_OPIVX,
@@ -324,6 +310,97 @@ typedef enum {
OPC_VMVNR_V = 0x9e000057 | V_OPIVI,
} RISCVInsn;
+static const struct {
+ RISCVInsn op;
+ bool swap;
+} tcg_cmpcond_to_rvv_vv[] = {
+ [TCG_COND_EQ] = { OPC_VMSEQ_VV, false },
+ [TCG_COND_NE] = { OPC_VMSNE_VV, false },
+ [TCG_COND_LT] = { OPC_VMSLT_VV, false },
+ [TCG_COND_GE] = { OPC_VMSLE_VV, true },
+ [TCG_COND_GT] = { OPC_VMSLT_VV, true },
+ [TCG_COND_LE] = { OPC_VMSLE_VV, false },
+ [TCG_COND_LTU] = { OPC_VMSLTU_VV, false },
+ [TCG_COND_GEU] = { OPC_VMSLEU_VV, true },
+ [TCG_COND_GTU] = { OPC_VMSLTU_VV, true },
+ [TCG_COND_LEU] = { OPC_VMSLEU_VV, false }
+};
+
+static const struct {
+ RISCVInsn op;
+ int min;
+ int max;
+ bool adjust;
+} tcg_cmpcond_to_rvv_vi[] = {
+ [TCG_COND_EQ] = { OPC_VMSEQ_VI, -16, 15, false },
+ [TCG_COND_NE] = { OPC_VMSNE_VI, -16, 15, false },
+ [TCG_COND_GT] = { OPC_VMSGT_VI, -16, 15, false },
+ [TCG_COND_LE] = { OPC_VMSLE_VI, -16, 15, false },
+ [TCG_COND_LT] = { OPC_VMSLE_VI, -15, 16, true },
+ [TCG_COND_GE] = { OPC_VMSGT_VI, -15, 16, true },
+ [TCG_COND_LEU] = { OPC_VMSLEU_VI, 0, 15, false },
+ [TCG_COND_GTU] = { OPC_VMSGTU_VI, 0, 15, false },
+ [TCG_COND_LTU] = { OPC_VMSLEU_VI, 1, 16, true },
+ [TCG_COND_GEU] = { OPC_VMSGTU_VI, 1, 16, true },
+};
+
+/* test if a constant matches the constraint */
+static bool tcg_target_const_match(int64_t val, int ct,
+ TCGType type, TCGCond cond, int vece)
+{
+ if (ct & TCG_CT_CONST) {
+ return 1;
+ }
+ if ((ct & TCG_CT_CONST_ZERO) && val == 0) {
+ return 1;
+ }
+ /*
+ * Sign extended from 12 bits: [-0x800, 0x7ff].
+ * Used for most arithmetic, as this is the isa field.
+ */
+ if ((ct & TCG_CT_CONST_S12) && val >= -0x800 && val <= 0x7ff) {
+ return 1;
+ }
+ /*
+ * Sign extended from 12 bits, negated: [-0x7ff, 0x800].
+ * Used for subtraction, where a constant must be handled by ADDI.
+ */
+ if ((ct & TCG_CT_CONST_N12) && val >= -0x7ff && val <= 0x800) {
+ return 1;
+ }
+ /*
+ * Sign extended from 12 bits, +/- matching: [-0x7ff, 0x7ff].
+ * Used by addsub2 and movcond, which may need the negative value,
+ * and requires the modified constant to be representable.
+ */
+ if ((ct & TCG_CT_CONST_M12) && val >= -0x7ff && val <= 0x7ff) {
+ return 1;
+ }
+ /*
+ * Inverse of sign extended from 12 bits: ~[-0x800, 0x7ff].
+ * Used to map ANDN back to ANDI, etc.
+ */
+ if ((ct & TCG_CT_CONST_J12) && ~val >= -0x800 && ~val <= 0x7ff) {
+ return 1;
+ }
+ /*
+ * Sign extended from 5 bits: [-0x10, 0x0f].
+ * Used for vector-immediate.
+ */
+ if ((ct & TCG_CT_CONST_S5) && val >= -0x10 && val <= 0x0f) {
+ return 1;
+ }
+ /*
+ * Used for vector compare OPIVI instructions.
+ */
+ if ((ct & TCG_CT_CONST_CMP_VI) &&
+ val >= tcg_cmpcond_to_rvv_vi[cond].min &&
+ val <= tcg_cmpcond_to_rvv_vi[cond].max) {
+ return true;
+ }
+ return 0;
+}
+
/*
* RISC-V immediate and instruction encoders (excludes 16-bit RVC)
*/
@@ -592,6 +669,18 @@ static void tcg_out_opc_vi(TCGContext *s, RISCVInsn opc, TCGReg vd,
tcg_out32(s, encode_vi(opc, vd, imm, vs2, vm));
}
+static void tcg_out_opc_vim_mask(TCGContext *s, RISCVInsn opc, TCGReg vd,
+ TCGReg vs2, int32_t imm)
+{
+ tcg_out32(s, encode_vi(opc, vd, imm, vs2, false));
+}
+
+static void tcg_out_opc_vvm_mask(TCGContext *s, RISCVInsn opc, TCGReg vd,
+ TCGReg vs2, TCGReg vs1)
+{
+ tcg_out32(s, encode_v(opc, vd, vs1, vs2, false));
+}
+
/*
* Only unit-stride addressing implemented; may extend in future.
*/
@@ -2322,6 +2411,51 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
riscv_set_vec_config_vl(s, type);
tcg_out_opc_vi(s, OPC_VXOR_VI, a0, a1, -1, true);
break;
+ case INDEX_op_cmpsel_vec:
+ TCGArg a3, a4;
+ int c3, c4;
+ TCGCond cond;
+
+ a3 = args[3];
+ a4 = args[4];
+ c3 = const_args[3];
+ c4 = const_args[4];
+ cond = args[5];
+ riscv_set_vec_config_vl_vece(s, type, vece);
+
+ /* Use only vmerge_vim if possible, by inverting the test. */
+ if (c4 && !c3) {
+ cond = tcg_invert_cond(cond);
+ a3 = a4;
+ a4 = args[3];
+ c3 = true;
+ c4 = false;
+ }
+
+ /* Perform the comparison into V0 mask. */
+ if (const_args[2]) {
+ tcg_out_opc_vi(s, tcg_cmpcond_to_rvv_vi[cond].op,
+ TCG_REG_V0, a1,
+ a2 - tcg_cmpcond_to_rvv_vi[cond].adjust, true);
+ } else if (tcg_cmpcond_to_rvv_vv[cond].swap) {
+ tcg_out_opc_vv(s, tcg_cmpcond_to_rvv_vv[cond].op,
+ TCG_REG_V0, a2, a1, true);
+ } else {
+ tcg_out_opc_vv(s, tcg_cmpcond_to_rvv_vv[cond].op,
+ TCG_REG_V0, a1, a2, true);
+ }
+ if (c3) {
+ if (c4) {
+ tcg_out_opc_vi(s, OPC_VMV_V_I, a0, TCG_REG_V0, a4, true);
+ a4 = a0;
+ }
+ /* vd[i] = v0.mask[i] ? imm : vs2[i] */
+ tcg_out_opc_vim_mask(s, OPC_VMERGE_VIM, a0, a4, a3);
+ } else {
+ /* vd[i] = v0.mask[i] ? vs1[i] : vs2[i] */
+ tcg_out_opc_vvm_mask(s, OPC_VMERGE_VVM, a0, a4, a3);
+ }
+ break;
case INDEX_op_mov_vec: /* Always emitted via tcg_out_mov. */
case INDEX_op_dup_vec: /* Always emitted via tcg_out_dup_vec. */
default:
@@ -2332,10 +2466,27 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
TCGArg a0, ...)
{
+ va_list va;
+ TCGv_vec v0, v1;
+ TCGArg a2, a3;
+
+ va_start(va, a0);
+ v0 = temp_tcgv_vec(arg_temp(a0));
+ v1 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg)));
+ a2 = va_arg(va, TCGArg);
+
switch (opc) {
+ case INDEX_op_cmp_vec:
+ a3 = va_arg(va, TCGArg);
+ vec_gen_6(INDEX_op_cmpsel_vec, type, vece, tcgv_vec_arg(v0),
+ tcgv_vec_arg(v1), a2,
+ tcgv_i64_arg(tcg_constant_i64(-1)),
+ tcgv_i64_arg(tcg_constant_i64(0)), a3);
+ break;
default:
g_assert_not_reached();
}
+ va_end(va);
}
int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
@@ -2347,7 +2498,10 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
case INDEX_op_or_vec:
case INDEX_op_xor_vec:
case INDEX_op_not_vec:
+ case INDEX_op_cmpsel_vec:
return 1;
+ case INDEX_op_cmp_vec:
+ return -1;
default:
return 0;
}
@@ -2506,6 +2660,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_or_vec:
case INDEX_op_xor_vec:
return C_O1_I2(v, v, v);
+ case INDEX_op_cmp_vec:
+ return C_O1_I2(v, v, vL);
+ case INDEX_op_cmpsel_vec:
+ return C_O1_I4(v, v, vL, vK, vK);
default:
g_assert_not_reached();
}
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index acb8dfdf16..94034504b2 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -164,7 +164,7 @@ typedef enum {
#define TCG_TARGET_HAS_sat_vec 0
#define TCG_TARGET_HAS_minmax_vec 0
#define TCG_TARGET_HAS_bitsel_vec 0
-#define TCG_TARGET_HAS_cmpsel_vec 0
+#define TCG_TARGET_HAS_cmpsel_vec 1
#define TCG_TARGET_HAS_tst_vec 0
diff --git a/tcg/tcg-internal.h b/tcg/tcg-internal.h
index 9b0d982f65..512128e8a7 100644
--- a/tcg/tcg-internal.h
+++ b/tcg/tcg-internal.h
@@ -102,5 +102,7 @@ void tcg_gen_op6(TCGOpcode, TCGArg, TCGArg, TCGArg, TCGArg, TCGArg, TCGArg);
void vec_gen_2(TCGOpcode, TCGType, unsigned, TCGArg, TCGArg);
void vec_gen_3(TCGOpcode, TCGType, unsigned, TCGArg, TCGArg, TCGArg);
void vec_gen_4(TCGOpcode, TCGType, unsigned, TCGArg, TCGArg, TCGArg, TCGArg);
+void vec_gen_6(TCGOpcode opc, TCGType type, unsigned vece, TCGArg r,
+ TCGArg a, TCGArg b, TCGArg c, TCGArg d, TCGArg e);
#endif /* TCG_INTERNAL_H */
diff --git a/tcg/tcg-op-vec.c b/tcg/tcg-op-vec.c
index 84af210bc0..851322878c 100644
--- a/tcg/tcg-op-vec.c
+++ b/tcg/tcg-op-vec.c
@@ -172,7 +172,7 @@ void vec_gen_4(TCGOpcode opc, TCGType type, unsigned vece,
op->args[3] = c;
}
-static void vec_gen_6(TCGOpcode opc, TCGType type, unsigned vece, TCGArg r,
+void vec_gen_6(TCGOpcode opc, TCGType type, unsigned vece, TCGArg r,
TCGArg a, TCGArg b, TCGArg c, TCGArg d, TCGArg e)
{
TCGOp *op = tcg_emit_op(opc, 6);
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH v3 08/14] tcg/riscv: Implement vector cmp ops
2024-09-04 14:27 ` [PATCH v3 08/14] tcg/riscv: Implement vector cmp ops LIU Zhiwei
@ 2024-09-05 7:12 ` Richard Henderson
2024-09-10 1:17 ` LIU Zhiwei
0 siblings, 1 reply; 34+ messages in thread
From: Richard Henderson @ 2024-09-05 7:12 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 9/4/24 07:27, LIU Zhiwei wrote:
> @@ -2322,6 +2411,51 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
> riscv_set_vec_config_vl(s, type);
> tcg_out_opc_vi(s, OPC_VXOR_VI, a0, a1, -1, true);
> break;
> + case INDEX_op_cmpsel_vec:
> + TCGArg a3, a4;
> + int c3, c4;
> + TCGCond cond;
While I suppose this compiles, it's not great to have new variables added randomly within
a switch. At minimum, add { } around the block, but consider breaking out a separate
tcg_out_cmpsel function, akin to tcg_out_movcond et al.
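A skeleton for that might look like this (sketch only: it reuses the tables
and emitters from this patch, and the argument layout is a guess):

    static void tcg_out_cmpsel(TCGContext *s, TCGType type, unsigned vece,
                               TCGCond cond, TCGReg ret, TCGReg cmp1,
                               TCGArg cmp2, bool c_cmp2,
                               TCGArg val1, bool c_val1,
                               TCGArg val2, bool c_val2)
    {
        riscv_set_vec_config_vl_vece(s, type, vece);

        /* Use only vmerge_vim if possible, by inverting the test. */
        if (c_val2 && !c_val1) {
            TCGArg tmp = val1;

            cond = tcg_invert_cond(cond);
            val1 = val2;
            val2 = tmp;
            c_val1 = true;
            c_val2 = false;
        }

        /* Perform the comparison into the V0 mask. */
        if (c_cmp2) {
            tcg_out_opc_vi(s, tcg_cmpcond_to_rvv_vi[cond].op, TCG_REG_V0,
                           cmp1, cmp2 - tcg_cmpcond_to_rvv_vi[cond].adjust,
                           true);
        } else if (tcg_cmpcond_to_rvv_vv[cond].swap) {
            tcg_out_opc_vv(s, tcg_cmpcond_to_rvv_vv[cond].op,
                           TCG_REG_V0, (TCGReg)cmp2, cmp1, true);
        } else {
            tcg_out_opc_vv(s, tcg_cmpcond_to_rvv_vv[cond].op,
                           TCG_REG_V0, cmp1, (TCGReg)cmp2, true);
        }

        if (c_val1) {
            if (c_val2) {
                /* Materialize the else-value, then merge the immediate. */
                tcg_out_opc_vi(s, OPC_VMV_V_I, ret, TCG_REG_V0, val2, true);
                val2 = ret;
            }
            /* vd[i] = v0.mask[i] ? imm : vs2[i] */
            tcg_out_opc_vim_mask(s, OPC_VMERGE_VIM, ret, (TCGReg)val2, val1);
        } else {
            /* vd[i] = v0.mask[i] ? vs1[i] : vs2[i] */
            tcg_out_opc_vvm_mask(s, OPC_VMERGE_VVM, ret, (TCGReg)val2,
                                 (TCGReg)val1);
        }
    }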
> @@ -2332,10 +2466,27 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
> void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
> TCGArg a0, ...)
> {
> + va_list va;
> + TCGv_vec v0, v1;
> + TCGArg a2, a3;
> +
> + va_start(va, a0);
> + v0 = temp_tcgv_vec(arg_temp(a0));
> + v1 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg)));
> + a2 = va_arg(va, TCGArg);
> +
> switch (opc) {
> + case INDEX_op_cmp_vec:
> + a3 = va_arg(va, TCGArg);
> + vec_gen_6(INDEX_op_cmpsel_vec, type, vece, tcgv_vec_arg(v0),
> + tcgv_vec_arg(v1), a2,
> + tcgv_i64_arg(tcg_constant_i64(-1)),
> + tcgv_i64_arg(tcg_constant_i64(0)), a3);
> + break;
> default:
> g_assert_not_reached();
> }
> + va_end(va);
> }
Better to use "TCGArg a0, a1". Converting through arg_temp + temp_tcgv_vec to v0/v1 and
then undoing that with tcgv_vec_arg is confusing.
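I.e., something along these lines (sketch):

    void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
                           TCGArg a0, ...)
    {
        va_list va;
        TCGArg a1, a2, a3;

        va_start(va, a0);
        a1 = va_arg(va, TCGArg);
        a2 = va_arg(va, TCGArg);

        switch (opc) {
        case INDEX_op_cmp_vec:
            a3 = va_arg(va, TCGArg);
            vec_gen_6(INDEX_op_cmpsel_vec, type, vece, a0, a1, a2,
                      tcgv_i64_arg(tcg_constant_i64(-1)),
                      tcgv_i64_arg(tcg_constant_i64(0)), a3);
            break;
        default:
            g_assert_not_reached();
        }
        va_end(va);
    }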
r~
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v3 08/14] tcg/riscv: Implement vector cmp ops
2024-09-05 7:12 ` Richard Henderson
@ 2024-09-10 1:17 ` LIU Zhiwei
0 siblings, 0 replies; 34+ messages in thread
From: LIU Zhiwei @ 2024-09-10 1:17 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 2024/9/5 15:12, Richard Henderson wrote:
> On 9/4/24 07:27, LIU Zhiwei wrote:
>> @@ -2322,6 +2411,51 @@ static void tcg_out_vec_op(TCGContext *s,
>> TCGOpcode opc,
>> riscv_set_vec_config_vl(s, type);
>> tcg_out_opc_vi(s, OPC_VXOR_VI, a0, a1, -1, true);
>> break;
>> + case INDEX_op_cmpsel_vec:
>> + TCGArg a3, a4;
>> + int c3, c4;
>> + TCGCond cond;
>
> While I suppose this compiles, it's not great to have new variables
> added randomly within a switch. At minimum, add { } around the block,
> but consider breaking out a separate tcg_out_cmpsel function, akin to
> tcg_out_movcond et al.
OK.
>
>> @@ -2332,10 +2466,27 @@ static void tcg_out_vec_op(TCGContext *s,
>> TCGOpcode opc,
>> void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
>> TCGArg a0, ...)
>> {
>> + va_list va;
>> + TCGv_vec v0, v1;
>> + TCGArg a2, a3;
>> +
>> + va_start(va, a0);
>> + v0 = temp_tcgv_vec(arg_temp(a0));
>> + v1 = temp_tcgv_vec(arg_temp(va_arg(va, TCGArg)));
>> + a2 = va_arg(va, TCGArg);
>> +
>> switch (opc) {
>> + case INDEX_op_cmp_vec:
>> + a3 = va_arg(va, TCGArg);
>> + vec_gen_6(INDEX_op_cmpsel_vec, type, vece, tcgv_vec_arg(v0),
>> + tcgv_vec_arg(v1), a2,
>> + tcgv_i64_arg(tcg_constant_i64(-1)),
>> + tcgv_i64_arg(tcg_constant_i64(0)), a3);
>> + break;
>> default:
>> g_assert_not_reached();
>> }
>> + va_end(va);
>> }
>
> Better to use "TCGArg a0, a1". Converting through arg_temp +
> temp_tcgv_vec to v0/v1 and then undoing that with tcgv_vec_arg is
> confusing.
OK.
Thanks,
Zhiwei
>
>
> r~
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v3 09/14] tcg/riscv: Implement vector neg ops
2024-09-04 14:27 [PATCH v3 00/14] Add support for vector LIU Zhiwei
` (7 preceding siblings ...)
2024-09-04 14:27 ` [PATCH v3 08/14] tcg/riscv: Implement vector cmp ops LIU Zhiwei
@ 2024-09-04 14:27 ` LIU Zhiwei
2024-09-04 14:27 ` [PATCH v3 10/14] tcg/riscv: Implement vector sat/mul ops LIU Zhiwei
` (5 subsequent siblings)
14 siblings, 0 replies; 34+ messages in thread
From: LIU Zhiwei @ 2024-09-04 14:27 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/riscv/tcg-target.c.inc | 7 +++++++
tcg/riscv/tcg-target.h | 2 +-
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 37909e56fb..60a5a21ff5 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -280,6 +280,7 @@ typedef enum {
OPC_VXOR_VV = 0x2c000057 | V_OPIVV,
OPC_VXOR_VI = 0x2c000057 | V_OPIVI,
+ OPC_VRSUB_VI = 0xc000057 | V_OPIVI,
OPC_VMSEQ_VV = 0x60000057 | V_OPIVV,
OPC_VMSEQ_VI = 0x60000057 | V_OPIVI,
OPC_VMSEQ_VX = 0x60000057 | V_OPIVX,
@@ -2411,6 +2412,10 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
riscv_set_vec_config_vl(s, type);
tcg_out_opc_vi(s, OPC_VXOR_VI, a0, a1, -1, true);
break;
+ case INDEX_op_neg_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vi(s, OPC_VRSUB_VI, a0, a1, 0, true);
+ break;
case INDEX_op_cmpsel_vec:
TCGArg a3, a4;
int c3, c4;
@@ -2498,6 +2503,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
case INDEX_op_or_vec:
case INDEX_op_xor_vec:
case INDEX_op_not_vec:
+ case INDEX_op_neg_vec:
case INDEX_op_cmpsel_vec:
return 1;
case INDEX_op_cmp_vec:
@@ -2652,6 +2658,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_dupm_vec:
case INDEX_op_ld_vec:
return C_O1_I1(v, r);
+ case INDEX_op_neg_vec:
case INDEX_op_not_vec:
return C_O1_I1(v, v);
case INDEX_op_add_vec:
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 94034504b2..ae10381e02 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -152,7 +152,7 @@ typedef enum {
#define TCG_TARGET_HAS_nor_vec 0
#define TCG_TARGET_HAS_eqv_vec 0
#define TCG_TARGET_HAS_not_vec 1
-#define TCG_TARGET_HAS_neg_vec 0
+#define TCG_TARGET_HAS_neg_vec 1
#define TCG_TARGET_HAS_abs_vec 0
#define TCG_TARGET_HAS_roti_vec 0
#define TCG_TARGET_HAS_rots_vec 0
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v3 10/14] tcg/riscv: Implement vector sat/mul ops
2024-09-04 14:27 [PATCH v3 00/14] Add support for vector LIU Zhiwei
` (8 preceding siblings ...)
2024-09-04 14:27 ` [PATCH v3 09/14] tcg/riscv: Implement vector neg ops LIU Zhiwei
@ 2024-09-04 14:27 ` LIU Zhiwei
2024-09-04 14:27 ` [PATCH v3 11/14] tcg/riscv: Implement vector min/max ops LIU Zhiwei
` (4 subsequent siblings)
14 siblings, 0 replies; 34+ messages in thread
From: LIU Zhiwei @ 2024-09-04 14:27 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/riscv/tcg-target.c.inc | 36 ++++++++++++++++++++++++++++++++++++
tcg/riscv/tcg-target.h | 4 ++--
2 files changed, 38 insertions(+), 2 deletions(-)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 60a5a21ff5..c31f92731c 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -281,6 +281,12 @@ typedef enum {
OPC_VXOR_VI = 0x2c000057 | V_OPIVI,
OPC_VRSUB_VI = 0xc000057 | V_OPIVI,
+ OPC_VMUL_VV = 0x94000057 | V_OPMVV,
+ OPC_VSADD_VV = 0x84000057 | V_OPIVV,
+ OPC_VSSUB_VV = 0x8c000057 | V_OPIVV,
+ OPC_VSADDU_VV = 0x80000057 | V_OPIVV,
+ OPC_VSSUBU_VV = 0x88000057 | V_OPIVV,
+
OPC_VMSEQ_VV = 0x60000057 | V_OPIVV,
OPC_VMSEQ_VI = 0x60000057 | V_OPIVI,
OPC_VMSEQ_VX = 0x60000057 | V_OPIVX,
@@ -2416,6 +2422,26 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
riscv_set_vec_config_vl_vece(s, type, vece);
tcg_out_opc_vi(s, OPC_VRSUB_VI, a0, a1, 0, true);
break;
+ case INDEX_op_mul_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VMUL_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_ssadd_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VSADD_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_sssub_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VSSUB_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_usadd_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VSADDU_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_ussub_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VSSUBU_VV, a0, a1, a2, true);
+ break;
case INDEX_op_cmpsel_vec:
TCGArg a3, a4;
int c3, c4;
@@ -2504,6 +2530,11 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
case INDEX_op_xor_vec:
case INDEX_op_not_vec:
case INDEX_op_neg_vec:
+ case INDEX_op_mul_vec:
+ case INDEX_op_ssadd_vec:
+ case INDEX_op_sssub_vec:
+ case INDEX_op_usadd_vec:
+ case INDEX_op_ussub_vec:
case INDEX_op_cmpsel_vec:
return 1;
case INDEX_op_cmp_vec:
@@ -2666,6 +2697,11 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_and_vec:
case INDEX_op_or_vec:
case INDEX_op_xor_vec:
+ case INDEX_op_mul_vec:
+ case INDEX_op_ssadd_vec:
+ case INDEX_op_sssub_vec:
+ case INDEX_op_usadd_vec:
+ case INDEX_op_ussub_vec:
return C_O1_I2(v, v, v);
case INDEX_op_cmp_vec:
return C_O1_I2(v, v, vL);
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index ae10381e02..1d4d8878ce 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -160,8 +160,8 @@ typedef enum {
#define TCG_TARGET_HAS_shi_vec 0
#define TCG_TARGET_HAS_shs_vec 0
#define TCG_TARGET_HAS_shv_vec 0
-#define TCG_TARGET_HAS_mul_vec 0
-#define TCG_TARGET_HAS_sat_vec 0
+#define TCG_TARGET_HAS_mul_vec 1
+#define TCG_TARGET_HAS_sat_vec 1
#define TCG_TARGET_HAS_minmax_vec 0
#define TCG_TARGET_HAS_bitsel_vec 0
#define TCG_TARGET_HAS_cmpsel_vec 1
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
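The saturating opcodes map one-to-one onto RVV's fixed-point add/sub
instructions. As a rough C model of what vsadd.vv does per element, assuming
32-bit lanes (a sketch of the semantics only; the real instruction also sets
the vxsat flag on saturation, which this omits):
    #include <stdint.h>
    /* Per-element behavior of vsadd.vv for SEW=32: signed saturating add. */
    static int32_t ssadd32_sketch(int32_t a, int32_t b)
    {
        int64_t r = (int64_t)a + b;
        if (r > INT32_MAX) {
            return INT32_MAX;
        }
        if (r < INT32_MIN) {
            return INT32_MIN;
        }
        return (int32_t)r;
    }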
* [PATCH v3 11/14] tcg/riscv: Implement vector min/max ops
2024-09-04 14:27 [PATCH v3 00/14] Add support for vector LIU Zhiwei
` (9 preceding siblings ...)
2024-09-04 14:27 ` [PATCH v3 10/14] tcg/riscv: Implement vector sat/mul ops LIU Zhiwei
@ 2024-09-04 14:27 ` LIU Zhiwei
2024-09-04 14:27 ` [PATCH v3 12/14] tcg/riscv: Implement vector shs/v ops LIU Zhiwei
` (3 subsequent siblings)
14 siblings, 0 replies; 34+ messages in thread
From: LIU Zhiwei @ 2024-09-04 14:27 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Implement signed and unsigned vector min/max using
vmin.vv/vminu.vv/vmax.vv/vmaxu.vv.
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/riscv/tcg-target.c.inc | 29 +++++++++++++++++++++++++++++
tcg/riscv/tcg-target.h | 2 +-
2 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index c31f92731c..507f659fd6 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -287,6 +287,11 @@ typedef enum {
OPC_VSADDU_VV = 0x80000057 | V_OPIVV,
OPC_VSSUBU_VV = 0x88000057 | V_OPIVV,
+ OPC_VMAX_VV = 0x1c000057 | V_OPIVV,
+ OPC_VMAXU_VV = 0x18000057 | V_OPIVV,
+ OPC_VMIN_VV = 0x14000057 | V_OPIVV,
+ OPC_VMINU_VV = 0x10000057 | V_OPIVV,
+
OPC_VMSEQ_VV = 0x60000057 | V_OPIVV,
OPC_VMSEQ_VI = 0x60000057 | V_OPIVI,
OPC_VMSEQ_VX = 0x60000057 | V_OPIVX,
@@ -2442,6 +2447,22 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
riscv_set_vec_config_vl_vece(s, type, vece);
tcg_out_opc_vv(s, OPC_VSSUBU_VV, a0, a1, a2, true);
break;
+ case INDEX_op_smax_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VMAX_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_smin_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VMIN_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_umax_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VMAXU_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_umin_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VMINU_VV, a0, a1, a2, true);
+ break;
case INDEX_op_cmpsel_vec:
TCGArg a3, a4;
int c3, c4;
@@ -2535,6 +2556,10 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
case INDEX_op_sssub_vec:
case INDEX_op_usadd_vec:
case INDEX_op_ussub_vec:
+ case INDEX_op_smax_vec:
+ case INDEX_op_smin_vec:
+ case INDEX_op_umax_vec:
+ case INDEX_op_umin_vec:
case INDEX_op_cmpsel_vec:
return 1;
case INDEX_op_cmp_vec:
@@ -2702,6 +2727,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_sssub_vec:
case INDEX_op_usadd_vec:
case INDEX_op_ussub_vec:
+ case INDEX_op_smax_vec:
+ case INDEX_op_smin_vec:
+ case INDEX_op_umax_vec:
+ case INDEX_op_umin_vec:
return C_O1_I2(v, v, v);
case INDEX_op_cmp_vec:
return C_O1_I2(v, v, vL);
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 1d4d8878ce..7005099810 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -162,7 +162,7 @@ typedef enum {
#define TCG_TARGET_HAS_shv_vec 0
#define TCG_TARGET_HAS_mul_vec 1
#define TCG_TARGET_HAS_sat_vec 1
-#define TCG_TARGET_HAS_minmax_vec 0
+#define TCG_TARGET_HAS_minmax_vec 1
#define TCG_TARGET_HAS_bitsel_vec 0
#define TCG_TARGET_HAS_cmpsel_vec 1
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
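With this in place, frontends that go through the generic gvec API get a
single vmax.vv/vmin.vv rather than a scalar expansion. An illustration of the
call shape (dofs/aofs/bofs are placeholder env offsets, shown only to
demonstrate usage):
    /* Element-wise signed max over 16-byte vectors at the given offsets. */
    tcg_gen_gvec_smax(MO_32, dofs, aofs, bofs, 16, 16);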
* [PATCH v3 12/14] tcg/riscv: Implement vector shs/v ops
2024-09-04 14:27 [PATCH v3 00/14] Add support for vector LIU Zhiwei
` (10 preceding siblings ...)
2024-09-04 14:27 ` [PATCH v3 11/14] tcg/riscv: Implement vector min/max ops LIU Zhiwei
@ 2024-09-04 14:27 ` LIU Zhiwei
2024-09-04 14:27 ` [PATCH v3 13/14] tcg/riscv: Implement vector roti/v/x shi ops LIU Zhiwei
` (2 subsequent siblings)
14 siblings, 0 replies; 34+ messages in thread
From: LIU Zhiwei @ 2024-09-04 14:27 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Implement vector shifts by a scalar count held in a GPR
(vsll.vx/vsrl.vx/vsra.vx) and by per-element counts
(vsll.vv/vsrl.vv/vsra.vv).
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/riscv/tcg-target.c.inc | 44 ++++++++++++++++++++++++++++++++++++++
tcg/riscv/tcg-target.h | 4 ++--
2 files changed, 46 insertions(+), 2 deletions(-)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 507f659fd6..2dc6befe09 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -315,6 +315,13 @@ typedef enum {
OPC_VMSGT_VI = 0x7c000057 | V_OPIVI,
OPC_VMSGT_VX = 0x7c000057 | V_OPIVX,
+ OPC_VSLL_VV = 0x94000057 | V_OPIVV,
+ OPC_VSLL_VX = 0x94000057 | V_OPIVX,
+ OPC_VSRL_VV = 0xa0000057 | V_OPIVV,
+ OPC_VSRL_VX = 0xa0000057 | V_OPIVX,
+ OPC_VSRA_VV = 0xa4000057 | V_OPIVV,
+ OPC_VSRA_VX = 0xa4000057 | V_OPIVX,
+
OPC_VMV_V_V = 0x5e000057 | V_OPIVV,
OPC_VMV_V_I = 0x5e000057 | V_OPIVI,
OPC_VMV_V_X = 0x5e000057 | V_OPIVX,
@@ -2463,6 +2470,30 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
riscv_set_vec_config_vl_vece(s, type, vece);
tcg_out_opc_vv(s, OPC_VMINU_VV, a0, a1, a2, true);
break;
+ case INDEX_op_shls_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vx(s, OPC_VSLL_VX, a0, a1, a2, true);
+ break;
+ case INDEX_op_shrs_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vx(s, OPC_VSRL_VX, a0, a1, a2, true);
+ break;
+ case INDEX_op_sars_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vx(s, OPC_VSRA_VX, a0, a1, a2, true);
+ break;
+ case INDEX_op_shlv_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VSLL_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_shrv_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VSRL_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_sarv_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VSRA_VV, a0, a1, a2, true);
+ break;
case INDEX_op_cmpsel_vec:
TCGArg a3, a4;
int c3, c4;
@@ -2560,6 +2591,12 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
case INDEX_op_smin_vec:
case INDEX_op_umax_vec:
case INDEX_op_umin_vec:
+ case INDEX_op_shls_vec:
+ case INDEX_op_shrs_vec:
+ case INDEX_op_sars_vec:
+ case INDEX_op_shlv_vec:
+ case INDEX_op_shrv_vec:
+ case INDEX_op_sarv_vec:
case INDEX_op_cmpsel_vec:
return 1;
case INDEX_op_cmp_vec:
@@ -2731,7 +2768,14 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_smin_vec:
case INDEX_op_umax_vec:
case INDEX_op_umin_vec:
+ case INDEX_op_shlv_vec:
+ case INDEX_op_shrv_vec:
+ case INDEX_op_sarv_vec:
return C_O1_I2(v, v, v);
+ case INDEX_op_shls_vec:
+ case INDEX_op_shrs_vec:
+ case INDEX_op_sars_vec:
+ return C_O1_I2(v, v, r);
case INDEX_op_cmp_vec:
return C_O1_I2(v, v, vL);
case INDEX_op_cmpsel_vec:
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 7005099810..3bd8b811ef 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -158,8 +158,8 @@ typedef enum {
#define TCG_TARGET_HAS_rots_vec 0
#define TCG_TARGET_HAS_rotv_vec 0
#define TCG_TARGET_HAS_shi_vec 0
-#define TCG_TARGET_HAS_shs_vec 0
-#define TCG_TARGET_HAS_shv_vec 0
+#define TCG_TARGET_HAS_shs_vec 1
+#define TCG_TARGET_HAS_shv_vec 1
#define TCG_TARGET_HAS_mul_vec 1
#define TCG_TARGET_HAS_sat_vec 1
#define TCG_TARGET_HAS_minmax_vec 1
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
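The two constraint sets reflect the two shift flavors: shls/shrs/sars take one
scalar count in a GPR (hence C_O1_I2(v, v, r) and the .vx encodings), while
shlv/shrv/sarv take an independent count per lane from a vector register. A C
sketch of the per-element semantics for 32-bit lanes (a model, not the
generated code; RVV uses only the low log2(SEW) bits of each count):
    #include <stddef.h>
    #include <stdint.h>
    /* vsll.vx: one count for all lanes. */
    static void shls_sketch(uint32_t *d, const uint32_t *a, uint32_t n,
                            size_t vl)
    {
        for (size_t i = 0; i < vl; i++) {
            d[i] = a[i] << (n & 31);
        }
    }
    /* vsll.vv: an independent count per lane. */
    static void shlv_sketch(uint32_t *d, const uint32_t *a, const uint32_t *b,
                            size_t vl)
    {
        for (size_t i = 0; i < vl; i++) {
            d[i] = a[i] << (b[i] & 31);
        }
    }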
* [PATCH v3 13/14] tcg/riscv: Implement vector roti/v/x shi ops
2024-09-04 14:27 [PATCH v3 00/14] Add support for vector LIU Zhiwei
` (11 preceding siblings ...)
2024-09-04 14:27 ` [PATCH v3 12/14] tcg/riscv: Implement vector shs/v ops LIU Zhiwei
@ 2024-09-04 14:27 ` LIU Zhiwei
2024-09-04 14:27 ` [PATCH v3 14/14] tcg/riscv: Enable native vector support for TCG host LIU Zhiwei
2024-09-05 23:46 ` [PATCH v3 00/14] Add support for vector Alistair Francis
14 siblings, 0 replies; 34+ messages in thread
From: LIU Zhiwei @ 2024-09-04 14:27 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Implement shift-by-immediate using the .vi encodings, falling back to
the .vx forms when the count does not fit in the 5-bit immediate, and
expand the rotate ops in terms of shifts.
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
tcg/riscv/tcg-target-con-set.h | 1 +
tcg/riscv/tcg-target.c.inc | 86 +++++++++++++++++++++++++++++++++-
tcg/riscv/tcg-target.h | 8 ++--
3 files changed, 90 insertions(+), 5 deletions(-)
diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
index cc06102ccf..53649f750c 100644
--- a/tcg/riscv/tcg-target-con-set.h
+++ b/tcg/riscv/tcg-target-con-set.h
@@ -25,5 +25,6 @@ C_O0_I2(v, r)
C_O1_I1(v, r)
C_O1_I1(v, v)
C_O1_I2(v, v, v)
+C_O1_I2(v, v, r)
C_O1_I2(v, v, vL)
C_O1_I4(v, v, vL, vK, vK)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 2dc6befe09..c09055d514 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -316,10 +316,13 @@ typedef enum {
OPC_VMSGT_VX = 0x7c000057 | V_OPIVX,
OPC_VSLL_VV = 0x94000057 | V_OPIVV,
+ OPC_VSLL_VI = 0x94000057 | V_OPIVI,
OPC_VSLL_VX = 0x94000057 | V_OPIVX,
OPC_VSRL_VV = 0xa0000057 | V_OPIVV,
+ OPC_VSRL_VI = 0xa0000057 | V_OPIVI,
OPC_VSRL_VX = 0xa0000057 | V_OPIVX,
OPC_VSRA_VV = 0xa4000057 | V_OPIVV,
+ OPC_VSRA_VI = 0xa4000057 | V_OPIVI,
OPC_VSRA_VX = 0xa4000057 | V_OPIVX,
OPC_VMV_V_V = 0x5e000057 | V_OPIVV,
@@ -2494,6 +2497,33 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
riscv_set_vec_config_vl_vece(s, type, vece);
tcg_out_opc_vv(s, OPC_VSRA_VV, a0, a1, a2, true);
break;
+ case INDEX_op_shli_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ if (a2 > 31) {
+ tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, TCG_REG_ZERO, a2);
+ tcg_out_opc_vx(s, OPC_VSLL_VX, a0, a1, TCG_REG_TMP0, true);
+ } else {
+ tcg_out_opc_vi(s, OPC_VSLL_VI, a0, a1, a2, true);
+ }
+ break;
+ case INDEX_op_shri_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ if (a2 > 31) {
+ tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, TCG_REG_ZERO, a2);
+ tcg_out_opc_vx(s, OPC_VSRL_VX, a0, a1, TCG_REG_TMP0, true);
+ } else {
+ tcg_out_opc_vi(s, OPC_VSRL_VI, a0, a1, a2, true);
+ }
+ break;
+ case INDEX_op_sari_vec:
+ riscv_set_vec_config_vl_vece(s, type, vece);
+ if (a2 > 31) {
+ tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, TCG_REG_ZERO, a2);
+ tcg_out_opc_vx(s, OPC_VSRA_VX, a0, a1, TCG_REG_TMP0, true);
+ } else {
+ tcg_out_opc_vi(s, OPC_VSRA_VI, a0, a1, a2, true);
+ }
+ break;
case INDEX_op_cmpsel_vec:
TCGArg a3, a4;
int c3, c4;
@@ -2550,7 +2580,8 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
TCGArg a0, ...)
{
va_list va;
- TCGv_vec v0, v1;
+ TCGv_vec v0, v1, v2, t1;
+ TCGv_i32 t2;
TCGArg a2, a3;
va_start(va, a0);
@@ -2566,6 +2597,45 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
tcgv_i64_arg(tcg_constant_i64(-1)),
tcgv_i64_arg(tcg_constant_i64(0)), a3);
break;
+ case INDEX_op_rotli_vec:
+ t1 = tcg_temp_new_vec(type);
+ tcg_gen_shli_vec(vece, t1, v1, a2);
+ tcg_gen_shri_vec(vece, v0, v1, (8 << vece) - a2);
+ tcg_gen_or_vec(vece, v0, v0, t1);
+ tcg_temp_free_vec(t1);
+ break;
+ case INDEX_op_rotls_vec:
+ t1 = tcg_temp_new_vec(type);
+ t2 = tcg_temp_new_i32();
+ tcg_gen_neg_i32(t2, temp_tcgv_i32(arg_temp(a2)));
+ tcg_gen_shrs_vec(vece, v0, v1, t2);
+ tcg_gen_shls_vec(vece, t1, v1, temp_tcgv_i32(arg_temp(a2)));
+ tcg_gen_or_vec(vece, v0, v0, t1);
+ tcg_temp_free_vec(t1);
+ tcg_temp_free_i32(t2);
+ break;
+ case INDEX_op_rotlv_vec:
+ v2 = temp_tcgv_vec(arg_temp(a2));
+ t1 = tcg_temp_new_vec(type);
+ tcg_gen_neg_vec(vece, t1, v2);
+ vec_gen_3(INDEX_op_shrv_vec, type, vece, tcgv_vec_arg(t1),
+ tcgv_vec_arg(v1), tcgv_vec_arg(t1));
+ vec_gen_3(INDEX_op_shlv_vec, type, vece, tcgv_vec_arg(v0),
+ tcgv_vec_arg(v1), tcgv_vec_arg(v2));
+ tcg_gen_or_vec(vece, v0, v0, t1);
+ tcg_temp_free_vec(t1);
+ break;
+ case INDEX_op_rotrv_vec:
+ v2 = temp_tcgv_vec(arg_temp(a2));
+ t1 = tcg_temp_new_vec(type);
+ tcg_gen_neg_vec(vece, t1, v2);
+ vec_gen_3(INDEX_op_shlv_vec, type, vece, tcgv_vec_arg(t1),
+ tcgv_vec_arg(v1), tcgv_vec_arg(t1));
+ vec_gen_3(INDEX_op_shrv_vec, type, vece, tcgv_vec_arg(v0),
+ tcgv_vec_arg(v1), tcgv_vec_arg(v2));
+ tcg_gen_or_vec(vece, v0, v0, t1);
+ tcg_temp_free_vec(t1);
+ break;
default:
g_assert_not_reached();
}
@@ -2597,8 +2667,15 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
case INDEX_op_shlv_vec:
case INDEX_op_shrv_vec:
case INDEX_op_sarv_vec:
+ case INDEX_op_shri_vec:
+ case INDEX_op_shli_vec:
+ case INDEX_op_sari_vec:
case INDEX_op_cmpsel_vec:
return 1;
+ case INDEX_op_rotls_vec:
+ case INDEX_op_rotlv_vec:
+ case INDEX_op_rotrv_vec:
+ case INDEX_op_rotli_vec:
case INDEX_op_cmp_vec:
return -1;
default:
@@ -2753,6 +2830,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
return C_O1_I1(v, r);
case INDEX_op_neg_vec:
case INDEX_op_not_vec:
+ case INDEX_op_rotli_vec:
+ case INDEX_op_shli_vec:
+ case INDEX_op_shri_vec:
+ case INDEX_op_sari_vec:
return C_O1_I1(v, v);
case INDEX_op_add_vec:
case INDEX_op_sub_vec:
@@ -2771,10 +2852,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_shlv_vec:
case INDEX_op_shrv_vec:
case INDEX_op_sarv_vec:
+ case INDEX_op_rotlv_vec:
+ case INDEX_op_rotrv_vec:
return C_O1_I2(v, v, v);
case INDEX_op_shls_vec:
case INDEX_op_shrs_vec:
case INDEX_op_sars_vec:
+ case INDEX_op_rotls_vec:
return C_O1_I2(v, v, r);
case INDEX_op_cmp_vec:
return C_O1_I2(v, v, vL);
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 3bd8b811ef..082942d858 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -154,10 +154,10 @@ typedef enum {
#define TCG_TARGET_HAS_not_vec 1
#define TCG_TARGET_HAS_neg_vec 1
#define TCG_TARGET_HAS_abs_vec 0
-#define TCG_TARGET_HAS_roti_vec 0
-#define TCG_TARGET_HAS_rots_vec 0
-#define TCG_TARGET_HAS_rotv_vec 0
-#define TCG_TARGET_HAS_shi_vec 0
+#define TCG_TARGET_HAS_roti_vec -1
+#define TCG_TARGET_HAS_rots_vec -1
+#define TCG_TARGET_HAS_rotv_vec -1
+#define TCG_TARGET_HAS_shi_vec 1
#define TCG_TARGET_HAS_shs_vec 1
#define TCG_TARGET_HAS_shv_vec 1
#define TCG_TARGET_HAS_mul_vec 1
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
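The INDEX_op_rotli_vec expansion above combines a left shift, a complementary
right shift, and an OR; (8 << vece) is the lane width in bits. Modeled in C
for 32-bit lanes (a sketch assuming 0 < n < 32, which is what the expander is
given):
    #include <stdint.h>
    /* rotli: (x << n) | (x >> (lane_bits - n)). */
    static uint32_t rotli32_sketch(uint32_t x, unsigned n)
    {
        return (x << n) | (x >> (32 - n));
    }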
* [PATCH v3 14/14] tcg/riscv: Enable native vector support for TCG host
2024-09-04 14:27 [PATCH v3 00/14] Add support for vector LIU Zhiwei
` (12 preceding siblings ...)
2024-09-04 14:27 ` [PATCH v3 13/14] tcg/riscv: Implement vector roti/v/x shi ops LIU Zhiwei
@ 2024-09-04 14:27 ` LIU Zhiwei
2024-09-05 23:46 ` [PATCH v3 00/14] Add support for vector Alistair Francis
14 siblings, 0 replies; 34+ messages in thread
From: LIU Zhiwei @ 2024-09-04 14:27 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Advertise the 64/128/256-bit vector types when the host provides at
least the Zve64x extension, as probed in cpuinfo.
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/riscv/tcg-target.h | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 082942d858..099b7aa705 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -143,9 +143,11 @@ typedef enum {
#define TCG_TARGET_HAS_tst 0
/* vector instructions */
-#define TCG_TARGET_HAS_v64 0
-#define TCG_TARGET_HAS_v128 0
-#define TCG_TARGET_HAS_v256 0
+#define have_rvv (cpuinfo & CPUINFO_ZVE64X)
+
+#define TCG_TARGET_HAS_v64 have_rvv
+#define TCG_TARGET_HAS_v128 have_rvv
+#define TCG_TARGET_HAS_v256 have_rvv
#define TCG_TARGET_HAS_andc_vec 0
#define TCG_TARGET_HAS_orc_vec 0
#define TCG_TARGET_HAS_nand_vec 0
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
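With this change the vector availability macros stop being compile-time
constants: TCG_TARGET_HAS_v64/v128/v256 now expand to a runtime test of the
probed cpuinfo word. An illustrative expansion of what such a check reduces
to (not new code, just the macro spelled out):
    /* TCG_TARGET_HAS_v128 after this patch: */
    if (cpuinfo & CPUINFO_ZVE64X) {
        /* host implements at least Zve64x: vector types are usable */
    }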
* Re: [PATCH v3 00/14] Add support for vector
2024-09-04 14:27 [PATCH v3 00/14] Add support for vector LIU Zhiwei
` (13 preceding siblings ...)
2024-09-04 14:27 ` [PATCH v3 14/14] tcg/riscv: Enable native vector support for TCG host LIU Zhiwei
@ 2024-09-05 23:46 ` Alistair Francis
2024-09-10 3:08 ` LIU Zhiwei
14 siblings, 1 reply; 34+ messages in thread
From: Alistair Francis @ 2024-09-05 23:46 UTC (permalink / raw)
To: LIU Zhiwei
Cc: qemu-devel, qemu-riscv, palmer, alistair.francis, dbarboza,
liwei1518, bmeng.cn, richard.henderson, TANG Tiancheng
On Thu, Sep 5, 2024 at 12:29 AM LIU Zhiwei <zhiwei_liu@linux.alibaba.com> wrote:
>
> From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Can you please mention RISC-V in the cover letter title? Otherwise
it's not obvious that this is RISC-V specific.
Alistair
>
> This patch set introduces support for the RISC-V vector extension
> in TCG backend for RISC-V targets.
>
> v3:
> 1. Use the .insn form in cpuinfo probing.
>
> 2. Use reserved_regs to constrain the register group index instead of using a constraint.
>
> 3. Avoid using macros to implement functions whenever possible.
>
> 4. Rename vtypei to vtype.
>
> 5. Move the __thread prev_vtype variable to TCGContext.
>
> 6. Support fractional LMUL setting, but since MF2 has a minimum ELEN of 32,
> restrict fractional LMUL to cases where SEW < 64.
>
> 7. Handle vector load/store imm12 split in a different function.
>
> 8. Remove compare vx and implement INDEX_op_cmpsel_vec for INDEX_op_cmp_vec in a more concise way.
>
> 9. Move the implementation of shi_vec from tcg_expand_vec_op to tcg_out_vec_op.
>
> 10. Address some formatting issues.
>
> v2:
> https://lists.gnu.org/archive/html/qemu-riscv/2024-08/msg00679.html
>
> v1:
> https://lists.gnu.org/archive/html/qemu-riscv/2024-08/msg00205.html
>
> Swung0x48 (1):
> tcg/riscv: Add basic support for vector
>
> TANG Tiancheng (13):
> tcg/op-gvec: Fix iteration step in 32-bit operation
> util: Add RISC-V vector extension probe in cpuinfo
> tcg/riscv: Add riscv vset{i}vli support
> tcg/riscv: Implement vector load/store
> tcg/riscv: Implement vector mov/dup{m/i}
> tcg/riscv: Add support for basic vector opcodes
> tcg/riscv: Implement vector cmp ops
> tcg/riscv: Implement vector neg ops
> tcg/riscv: Implement vector sat/mul ops
> tcg/riscv: Implement vector min/max ops
> tcg/riscv: Implement vector shs/v ops
> tcg/riscv: Implement vector roti/v/x shi ops
> tcg/riscv: Enable native vector support for TCG host
>
> host/include/riscv/host/cpuinfo.h | 3 +
> include/tcg/tcg.h | 3 +
> tcg/riscv/tcg-target-con-set.h | 7 +
> tcg/riscv/tcg-target-con-str.h | 3 +
> tcg/riscv/tcg-target.c.inc | 1047 ++++++++++++++++++++++++++---
> tcg/riscv/tcg-target.h | 80 ++-
> tcg/riscv/tcg-target.opc.h | 12 +
> tcg/tcg-internal.h | 2 +
> tcg/tcg-op-gvec.c | 2 +-
> tcg/tcg-op-vec.c | 2 +-
> util/cpuinfo-riscv.c | 26 +-
> 11 files changed, 1062 insertions(+), 125 deletions(-)
> create mode 100644 tcg/riscv/tcg-target.opc.h
>
> --
> 2.43.0
>
>
^ permalink raw reply [flat|nested] 34+ messages in thread
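Item 5 in the quoted changelog (moving prev_vtype into TCGContext) describes
caching the last vtype programmed with vset{i}vli, so consecutive vector ops
sharing one SEW/LMUL configuration pay for a single vsetvli. A sketch of that
idea; the helper name here is hypothetical, not the series' actual identifier:
    /* Re-program vtype only when it differs from the cached value. */
    if (s->prev_vtype != vtype) {
        s->prev_vtype = vtype;
        tcg_out_vsetvli(s, vtype);  /* helper name assumed for illustration */
    }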
* Re: [PATCH v3 00/14] Add support for vector
2024-09-05 23:46 ` [PATCH v3 00/14] Add support for vector Alistair Francis
@ 2024-09-10 3:08 ` LIU Zhiwei
0 siblings, 0 replies; 34+ messages in thread
From: LIU Zhiwei @ 2024-09-10 3:08 UTC (permalink / raw)
To: Alistair Francis
Cc: qemu-devel, qemu-riscv, palmer, alistair.francis, dbarboza,
liwei1518, bmeng.cn, richard.henderson, TANG Tiancheng
On 2024/9/6 7:46, Alistair Francis wrote:
> On Thu, Sep 5, 2024 at 12:29 AM LIU Zhiwei <zhiwei_liu@linux.alibaba.com> wrote:
>> From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
> Can you please mention RISC-V in the cover letter title? Otherwise
> it's not obvious that this is RISC-V specific.
Sorry, I missed it.
Thanks,
Zhiwei
>
> Alistair
>
>> This patch set introduces support for the RISC-V vector extension
>> in TCG backend for RISC-V targets.
>>
>> v3:
>> 1. Use the .insn form in cpuinfo probing.
>>
>> 2. Use reserved_regs to constrain the register group index instead of using a constraint.
>>
>> 3. Avoid using macros to implement functions whenever possible.
>>
>> 4. Rename vtypei to vtype.
>>
>> 5. Move the __thread prev_vtype variable to TCGContext.
>>
>> 6. Support fractional LMUL setting, but since MF2 has a minimum ELEN of 32,
>> restrict fractional LMUL to cases where SEW < 64.
>>
>> 7. Handle vector load/store imm12 split in a different function.
>>
>> 8. Remove the compare vx forms and implement INDEX_op_cmpsel_vec, using it to expand INDEX_op_cmp_vec in a more concise way.
>>
>> 9. Move the implementation of shi_vec from tcg_expand_vec_op to tcg_out_vec_op.
>>
>> 10. Address some formatting issues.
>>
>> v2:
>> https://lists.gnu.org/archive/html/qemu-riscv/2024-08/msg00679.html
>>
>> v1:
>> https://lists.gnu.org/archive/html/qemu-riscv/2024-08/msg00205.html
>>
>> Swung0x48 (1):
>> tcg/riscv: Add basic support for vector
>>
>> TANG Tiancheng (13):
>> tcg/op-gvec: Fix iteration step in 32-bit operation
>> util: Add RISC-V vector extension probe in cpuinfo
>> tcg/riscv: Add riscv vset{i}vli support
>> tcg/riscv: Implement vector load/store
>> tcg/riscv: Implement vector mov/dup{m/i}
>> tcg/riscv: Add support for basic vector opcodes
>> tcg/riscv: Implement vector cmp ops
>> tcg/riscv: Implement vector neg ops
>> tcg/riscv: Implement vector sat/mul ops
>> tcg/riscv: Implement vector min/max ops
>> tcg/riscv: Implement vector shs/v ops
>> tcg/riscv: Implement vector roti/v/x shi ops
>> tcg/riscv: Enable native vector support for TCG host
>>
>> host/include/riscv/host/cpuinfo.h | 3 +
>> include/tcg/tcg.h | 3 +
>> tcg/riscv/tcg-target-con-set.h | 7 +
>> tcg/riscv/tcg-target-con-str.h | 3 +
>> tcg/riscv/tcg-target.c.inc | 1047 ++++++++++++++++++++++++++---
>> tcg/riscv/tcg-target.h | 80 ++-
>> tcg/riscv/tcg-target.opc.h | 12 +
>> tcg/tcg-internal.h | 2 +
>> tcg/tcg-op-gvec.c | 2 +-
>> tcg/tcg-op-vec.c | 2 +-
>> util/cpuinfo-riscv.c | 26 +-
>> 11 files changed, 1062 insertions(+), 125 deletions(-)
>> create mode 100644 tcg/riscv/tcg-target.opc.h
>>
>> --
>> 2.43.0
>>
>>
^ permalink raw reply [flat|nested] 34+ messages in thread