* [PATCH v4 00/12] tcg/riscv: Add support for vector
@ 2024-09-11 13:26 LIU Zhiwei
From: LIU Zhiwei @ 2024-09-11 13:26 UTC
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
This patch set introduces support for the RISC-V vector extension
in the TCG backend for RISC-V targets.
v4:
1. Move the implementation of roti/s/v_vec from tcg_expand_vec_op to
tcg_out_vec_op, not just shi_vec.
2. Put shi and shs/v in the same patch.
3. Put load/store and vset in the same patch.
4. Change riscv_vlenb to riscv_lg2_vlenb and simplify the probe.
5. Provide stubs for the required functions and merge the functions'
usage and definitions into one patch.
6. Replace riscv_host_vtype with riscv_cur_vsew and riscv_cur_type,
and improve the setting of vtype.
7. Call separate functions (tcg_out_vec_ldst and tcg_out_ldst)
in tcg_out_ld and tcg_out_st.
8. Optimize dupi_vec for cases where arg = 0 and arg = -1.
9. Use tcg_out_cmpsel instead of the switch statement.
10. Ensure that every single patch can compile.
11. Remove "tcg/op-gvec: Fix iteration step in 32-bit operation" as
it has been incorporated into "tcg: Improve support for cmpsel_vec"
(https://lists.gnu.org/archive/html/qemu-devel/2024-09/msg01281.html).
This patch set depends on that patch set.
v3:
https://lists.gnu.org/archive/html/qemu-riscv/2024-09/msg00060.html
v2:
https://lists.gnu.org/archive/html/qemu-riscv/2024-08/msg00679.html
v1:
https://lists.gnu.org/archive/html/qemu-riscv/2024-08/msg00205.html
Swung0x48 (1):
tcg/riscv: Add basic support for vector
TANG Tiancheng (11):
util: Add RISC-V vector extension probe in cpuinfo
tcg/riscv: Add vset{i}vli and ld/st vec ops
tcg/riscv: Implement vector mov/dup{m/i}
tcg/riscv: Add support for basic vector opcodes
tcg/riscv: Implement vector cmp/cmpsel ops
tcg/riscv: Implement vector neg ops
tcg/riscv: Implement vector sat/mul ops
tcg/riscv: Implement vector min/max ops
tcg/riscv: Implement vector shi/s/v ops
tcg/riscv: Implement vector roti/v/x ops
tcg/riscv: Enable native vector support for TCG host
host/include/riscv/host/cpuinfo.h | 2 +
include/tcg/tcg.h | 7 +
tcg/riscv/tcg-target-con-set.h | 7 +
tcg/riscv/tcg-target-con-str.h | 3 +
tcg/riscv/tcg-target.c.inc | 950 +++++++++++++++++++++++++++---
tcg/riscv/tcg-target.h | 80 +--
tcg/riscv/tcg-target.opc.h | 12 +
util/cpuinfo-riscv.c | 24 +-
8 files changed, 966 insertions(+), 119 deletions(-)
create mode 100644 tcg/riscv/tcg-target.opc.h
--
2.43.0
* [PATCH v4 01/12] util: Add RISC-V vector extension probe in cpuinfo
From: LIU Zhiwei @ 2024-09-11 13:26 UTC
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Add support for probing RISC-V vector extension availability in
the backend. This information will be used when deciding whether
to use vector instructions in code generation.
Cache lg2(vlenb) for the backend. Storing lg2(vlenb) rather than
vlenb lets us replace every division by vlenb with a subtraction.
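The following standalone sketch (not code from this patch) illustrates the
point: once sizes are kept as log2 values, the ratio between a vector type's
size and VLEN in bytes becomes a plain subtraction. The helper names lg2_u32
and lmul_lg2 are hypothetical, and the example assumes that a 64-bit vector
has a log2 byte size of 3, mirroring how the series compares TCG_TYPE_V64
against riscv_lg2_vlenb.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: log2 of a power of two, by counting trailing zeros. */
static unsigned lg2_u32(uint32_t pow2)
{
    assert(pow2 != 0 && (pow2 & (pow2 - 1)) == 0);
    unsigned n = 0;
    while ((pow2 & 1) == 0) {
        pow2 >>= 1;
        n++;
    }
    return n;
}

/*
 * Hypothetical helper: log2 of (vector size in bytes / VLEN in bytes).
 * A negative result means the vector is smaller than one register,
 * i.e. a fractional ratio.
 */
static int lmul_lg2(unsigned lg2_type_bytes, unsigned lg2_vlenb)
{
    return (int)lg2_type_bytes - (int)lg2_vlenb;
}
```

For instance, a 256-bit vector (log2 size 5) on a host with vlenb = 16 gives
5 - 4 = 1, that is, a ratio of 2, without any division at run time.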
As long as the compiler doesn't support RISCV_HWPROBE_EXT_ZVE64X,
we use RISCV_HWPROBE_IMA_V instead.
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
host/include/riscv/host/cpuinfo.h | 2 ++
util/cpuinfo-riscv.c | 24 ++++++++++++++++++++++--
2 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/host/include/riscv/host/cpuinfo.h b/host/include/riscv/host/cpuinfo.h
index 2b00660e36..cdc784e7b6 100644
--- a/host/include/riscv/host/cpuinfo.h
+++ b/host/include/riscv/host/cpuinfo.h
@@ -10,9 +10,11 @@
#define CPUINFO_ZBA (1u << 1)
#define CPUINFO_ZBB (1u << 2)
#define CPUINFO_ZICOND (1u << 3)
+#define CPUINFO_ZVE64X (1u << 4)
/* Initialized with a constructor. */
extern unsigned cpuinfo;
+extern unsigned riscv_lg2_vlenb;
/*
* We cannot rely on constructor ordering, so other constructors must
diff --git a/util/cpuinfo-riscv.c b/util/cpuinfo-riscv.c
index 497ce12680..bab782745b 100644
--- a/util/cpuinfo-riscv.c
+++ b/util/cpuinfo-riscv.c
@@ -4,6 +4,7 @@
*/
#include "qemu/osdep.h"
+#include "qemu/host-utils.h"
#include "host/cpuinfo.h"
#ifdef CONFIG_ASM_HWPROBE_H
@@ -12,6 +13,7 @@
#endif
unsigned cpuinfo;
+unsigned riscv_lg2_vlenb;
static volatile sig_atomic_t got_sigill;
static void sigill_handler(int signo, siginfo_t *si, void *data)
@@ -33,7 +35,7 @@ static void sigill_handler(int signo, siginfo_t *si, void *data)
/* Called both as constructor and (possibly) via other constructors. */
unsigned __attribute__((constructor)) cpuinfo_init(void)
{
- unsigned left = CPUINFO_ZBA | CPUINFO_ZBB | CPUINFO_ZICOND;
+ unsigned left = CPUINFO_ZBA | CPUINFO_ZBB | CPUINFO_ZICOND | CPUINFO_ZVE64X;
unsigned info = cpuinfo;
if (info) {
@@ -49,6 +51,9 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
#endif
#if defined(__riscv_arch_test) && defined(__riscv_zicond)
info |= CPUINFO_ZICOND;
+#endif
+#if defined(__riscv_arch_test) && defined(__riscv_zve64x)
+ info |= CPUINFO_ZVE64X;
#endif
left &= ~info;
@@ -64,7 +69,8 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
&& pair.key >= 0) {
info |= pair.value & RISCV_HWPROBE_EXT_ZBA ? CPUINFO_ZBA : 0;
info |= pair.value & RISCV_HWPROBE_EXT_ZBB ? CPUINFO_ZBB : 0;
- left &= ~(CPUINFO_ZBA | CPUINFO_ZBB);
+ info |= pair.value & RISCV_HWPROBE_IMA_V ? CPUINFO_ZVE64X : 0;
+ left &= ~(CPUINFO_ZBA | CPUINFO_ZBB | CPUINFO_ZVE64X);
#ifdef RISCV_HWPROBE_EXT_ZICOND
info |= pair.value & RISCV_HWPROBE_EXT_ZICOND ? CPUINFO_ZICOND : 0;
left &= ~CPUINFO_ZICOND;
@@ -112,6 +118,20 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
assert(left == 0);
}
+ if (info & CPUINFO_ZVE64X) {
+ /*
+ * We are guaranteed by RVV-1.0 that VLEN is a power of 2.
+ * We are guaranteed by Zve64x that VLEN >= 64, and that
+ * EEW of {8,16,32,64} are supported.
+ *
+ * Cache VLEN in a convenient form.
+ */
+ unsigned long vlenb;
+ /* Read csr "vlenb" with "csrr %0, vlenb" : "=r"(vlenb) */
+ asm volatile(".insn i 0x73, 0x2, %0, zero, -990" : "=r"(vlenb));
+ riscv_lg2_vlenb = ctz32(vlenb);
+ }
+
info |= CPUINFO_ALWAYS;
cpuinfo = info;
return info;
--
2.43.0
* [PATCH v4 02/12] tcg/riscv: Add basic support for vector
From: LIU Zhiwei @ 2024-09-11 13:26 UTC
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, Swung0x48,
TANG Tiancheng
From: Swung0x48 <swung0x48@outlook.com>
The RISC-V vector instruction set utilizes the LMUL field to group
multiple registers, enabling variable-length vector registers. This
implementation uses only the first register number of each group while
reserving the other register numbers within the group.
In TCG, each VEC_IR can have 3 types (TCG_TYPE_V64/128/256), and the
host runtime needs to adjust LMUL based on the type to use different
register groups.
This presents challenges for TCG's register allocation. Currently, we
avoid modifying the register allocation part of TCG and only expose the
minimum number of vector registers.
For example, when the host vlen is 64 bits and the type is TCG_TYPE_V256,
LMUL is 4 and 4 vector registers form one register group. That gives at
most 8 register groups, but V0 is reserved as the mask register, so at
most 7 register groups are usable. Types smaller than TCG_TYPE_V256 are
then limited to the same 7 registers, because TCG cannot yet constrain
the register set dynamically by type. Likewise, when the host vlen is
128 bits and the type is TCG_TYPE_V256, at most 15 registers are usable.
There is not much pressure on vector register allocation in TCG now, so
using 7 registers is feasible and will not have a major impact on code
generation.
This patch:
1. Reserves vector register 0 for use as a mask register.
2. When using register groups, reserves the additional registers within
each group.
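The group masks used by this patch (ALL_DVECTOR_REG_GROUPS and
ALL_QVECTOR_REG_GROUPS) can be derived mechanically. The sketch below is
standalone and illustrative only; group_leader_mask and usable_groups are
hypothetical names, and the only assumption taken from the patch is that
vector registers occupy bits 32..63 of the register set, as ALL_VECTOR_REGS
does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: bitmask holding the first register of every
 * group, for groups of group_size registers (1, 2, 4 or 8).
 */
static uint64_t group_leader_mask(unsigned group_size)
{
    assert(group_size == 1 || group_size == 2 ||
           group_size == 4 || group_size == 8);
    uint64_t mask = 0;
    for (unsigned v = 0; v < 32; v += group_size) {
        mask |= UINT64_C(1) << (32 + v);
    }
    return mask;
}

/* Hypothetical helper: groups left once the group starting at v0 is reserved. */
static unsigned usable_groups(unsigned group_size)
{
    assert(group_size != 0 && group_size <= 32);
    return 32 / group_size - 1;
}
```

With group_size 4 this yields the 7 usable groups of the vlen = 64 example,
and with group_size 2 the 15 usable registers of the vlen = 128 case.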
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Co-authored-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
tcg/riscv/tcg-target-con-str.h | 1 +
tcg/riscv/tcg-target.c.inc | 126 ++++++++++++++++++++++++---------
tcg/riscv/tcg-target.h | 78 +++++++++++---------
tcg/riscv/tcg-target.opc.h | 12 ++++
4 files changed, 151 insertions(+), 66 deletions(-)
create mode 100644 tcg/riscv/tcg-target.opc.h
diff --git a/tcg/riscv/tcg-target-con-str.h b/tcg/riscv/tcg-target-con-str.h
index d5c419dff1..b2b3211bcb 100644
--- a/tcg/riscv/tcg-target-con-str.h
+++ b/tcg/riscv/tcg-target-con-str.h
@@ -9,6 +9,7 @@
* REGS(letter, register_mask)
*/
REGS('r', ALL_GENERAL_REGS)
+REGS('v', ALL_VECTOR_REGS)
/*
* Define constraint letters for constants:
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index d334857226..966d1ad981 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -32,38 +32,14 @@
#ifdef CONFIG_DEBUG_TCG
static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
- "zero",
- "ra",
- "sp",
- "gp",
- "tp",
- "t0",
- "t1",
- "t2",
- "s0",
- "s1",
- "a0",
- "a1",
- "a2",
- "a3",
- "a4",
- "a5",
- "a6",
- "a7",
- "s2",
- "s3",
- "s4",
- "s5",
- "s6",
- "s7",
- "s8",
- "s9",
- "s10",
- "s11",
- "t3",
- "t4",
- "t5",
- "t6"
+ "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
+ "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5",
+ "a6", "a7", "s2", "s3", "s4", "s5", "s6", "s7",
+ "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6",
+ "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7",
+ "v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15",
+ "v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23",
+ "v24", "v25", "v26", "v27", "v28", "v29", "v30", "v31",
};
#endif
@@ -100,6 +76,16 @@ static const int tcg_target_reg_alloc_order[] = {
TCG_REG_A5,
TCG_REG_A6,
TCG_REG_A7,
+
+ /* Vector registers; TCG_REG_V0 is reserved for the mask. */
+ TCG_REG_V1, TCG_REG_V2, TCG_REG_V3, TCG_REG_V4,
+ TCG_REG_V5, TCG_REG_V6, TCG_REG_V7, TCG_REG_V8,
+ TCG_REG_V9, TCG_REG_V10, TCG_REG_V11, TCG_REG_V12,
+ TCG_REG_V13, TCG_REG_V14, TCG_REG_V15, TCG_REG_V16,
+ TCG_REG_V17, TCG_REG_V18, TCG_REG_V19, TCG_REG_V20,
+ TCG_REG_V21, TCG_REG_V22, TCG_REG_V23, TCG_REG_V24,
+ TCG_REG_V25, TCG_REG_V26, TCG_REG_V27, TCG_REG_V28,
+ TCG_REG_V29, TCG_REG_V30, TCG_REG_V31,
};
static const int tcg_target_call_iarg_regs[] = {
@@ -127,6 +113,9 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
#define TCG_CT_CONST_J12 0x1000
#define ALL_GENERAL_REGS MAKE_64BIT_MASK(0, 32)
+#define ALL_VECTOR_REGS MAKE_64BIT_MASK(32, 32)
+#define ALL_DVECTOR_REG_GROUPS 0x5555555500000000
+#define ALL_QVECTOR_REG_GROUPS 0x1111111100000000
#define sextreg sextract64
@@ -766,6 +755,23 @@ static void tcg_out_addsub2(TCGContext *s,
}
}
+static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
+ TCGReg dst, TCGReg src)
+{
+ return false;
+}
+
+static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
+ TCGReg dst, TCGReg base, intptr_t offset)
+{
+ return false;
+}
+
+static void tcg_out_dupi_vec(TCGContext *s, TCGType type, unsigned vece,
+ TCGReg dst, int64_t arg)
+{
+}
+
static const struct {
RISCVInsn op;
bool swap;
@@ -1881,6 +1887,36 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
}
}
+static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
+ unsigned vecl, unsigned vece,
+ const TCGArg args[TCG_MAX_OP_ARGS],
+ const int const_args[TCG_MAX_OP_ARGS])
+{
+ switch (opc) {
+ case INDEX_op_mov_vec: /* Always emitted via tcg_out_mov. */
+ case INDEX_op_dup_vec: /* Always emitted via tcg_out_dup_vec. */
+ default:
+ g_assert_not_reached();
+ }
+}
+
+void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
+ TCGArg a0, ...)
+{
+ switch (opc) {
+ default:
+ g_assert_not_reached();
+ }
+}
+
+int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
+{
+ switch (opc) {
+ default:
+ return 0;
+ }
+}
+
static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
{
switch (op) {
@@ -2100,6 +2136,30 @@ static void tcg_target_init(TCGContext *s)
{
tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffff;
tcg_target_available_regs[TCG_TYPE_I64] = 0xffffffff;
+ s->reserved_regs = 0;
+
+ switch (riscv_lg2_vlenb) {
+ case TCG_TYPE_V64:
+ tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
+ tcg_target_available_regs[TCG_TYPE_V128] = ALL_DVECTOR_REG_GROUPS;
+ tcg_target_available_regs[TCG_TYPE_V256] = ALL_QVECTOR_REG_GROUPS;
+ s->reserved_regs |= (~ALL_QVECTOR_REG_GROUPS & ALL_VECTOR_REGS);
+ break;
+ case TCG_TYPE_V128:
+ tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
+ tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
+ tcg_target_available_regs[TCG_TYPE_V256] = ALL_DVECTOR_REG_GROUPS;
+ s->reserved_regs |= (~ALL_DVECTOR_REG_GROUPS & ALL_VECTOR_REGS);
+ break;
+ default:
+ /* Guaranteed by Zve64x. */
+ tcg_debug_assert(riscv_lg2_vlenb >= TCG_TYPE_V256);
+
+ tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
+ tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
+ tcg_target_available_regs[TCG_TYPE_V256] = ALL_VECTOR_REGS;
+ break;
+ }
tcg_target_call_clobber_regs = -1u;
tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S0);
@@ -2115,7 +2175,6 @@ static void tcg_target_init(TCGContext *s)
tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S10);
tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S11);
- s->reserved_regs = 0;
tcg_regset_set_reg(s->reserved_regs, TCG_REG_ZERO);
tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP0);
tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP1);
@@ -2123,6 +2182,7 @@ static void tcg_target_init(TCGContext *s)
tcg_regset_set_reg(s->reserved_regs, TCG_REG_SP);
tcg_regset_set_reg(s->reserved_regs, TCG_REG_GP);
tcg_regset_set_reg(s->reserved_regs, TCG_REG_TP);
+ tcg_regset_set_reg(s->reserved_regs, TCG_REG_V0);
}
typedef struct {
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 1a347eaf6e..12a7a37aaa 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -28,42 +28,28 @@
#include "host/cpuinfo.h"
#define TCG_TARGET_INSN_UNIT_SIZE 4
-#define TCG_TARGET_NB_REGS 32
+#define TCG_TARGET_NB_REGS 64
#define MAX_CODE_GEN_BUFFER_SIZE ((size_t)-1)
typedef enum {
- TCG_REG_ZERO,
- TCG_REG_RA,
- TCG_REG_SP,
- TCG_REG_GP,
- TCG_REG_TP,
- TCG_REG_T0,
- TCG_REG_T1,
- TCG_REG_T2,
- TCG_REG_S0,
- TCG_REG_S1,
- TCG_REG_A0,
- TCG_REG_A1,
- TCG_REG_A2,
- TCG_REG_A3,
- TCG_REG_A4,
- TCG_REG_A5,
- TCG_REG_A6,
- TCG_REG_A7,
- TCG_REG_S2,
- TCG_REG_S3,
- TCG_REG_S4,
- TCG_REG_S5,
- TCG_REG_S6,
- TCG_REG_S7,
- TCG_REG_S8,
- TCG_REG_S9,
- TCG_REG_S10,
- TCG_REG_S11,
- TCG_REG_T3,
- TCG_REG_T4,
- TCG_REG_T5,
- TCG_REG_T6,
+ TCG_REG_ZERO, TCG_REG_RA, TCG_REG_SP, TCG_REG_GP,
+ TCG_REG_TP, TCG_REG_T0, TCG_REG_T1, TCG_REG_T2,
+ TCG_REG_S0, TCG_REG_S1, TCG_REG_A0, TCG_REG_A1,
+ TCG_REG_A2, TCG_REG_A3, TCG_REG_A4, TCG_REG_A5,
+ TCG_REG_A6, TCG_REG_A7, TCG_REG_S2, TCG_REG_S3,
+ TCG_REG_S4, TCG_REG_S5, TCG_REG_S6, TCG_REG_S7,
+ TCG_REG_S8, TCG_REG_S9, TCG_REG_S10, TCG_REG_S11,
+ TCG_REG_T3, TCG_REG_T4, TCG_REG_T5, TCG_REG_T6,
+
+ /* RISC-V V Extension registers */
+ TCG_REG_V0, TCG_REG_V1, TCG_REG_V2, TCG_REG_V3,
+ TCG_REG_V4, TCG_REG_V5, TCG_REG_V6, TCG_REG_V7,
+ TCG_REG_V8, TCG_REG_V9, TCG_REG_V10, TCG_REG_V11,
+ TCG_REG_V12, TCG_REG_V13, TCG_REG_V14, TCG_REG_V15,
+ TCG_REG_V16, TCG_REG_V17, TCG_REG_V18, TCG_REG_V19,
+ TCG_REG_V20, TCG_REG_V21, TCG_REG_V22, TCG_REG_V23,
+ TCG_REG_V24, TCG_REG_V25, TCG_REG_V26, TCG_REG_V27,
+ TCG_REG_V28, TCG_REG_V29, TCG_REG_V30, TCG_REG_V31,
/* aliases */
TCG_AREG0 = TCG_REG_S0,
@@ -156,6 +142,32 @@ typedef enum {
#define TCG_TARGET_HAS_tst 0
+/* vector instructions */
+#define TCG_TARGET_HAS_v64 0
+#define TCG_TARGET_HAS_v128 0
+#define TCG_TARGET_HAS_v256 0
+#define TCG_TARGET_HAS_andc_vec 0
+#define TCG_TARGET_HAS_orc_vec 0
+#define TCG_TARGET_HAS_nand_vec 0
+#define TCG_TARGET_HAS_nor_vec 0
+#define TCG_TARGET_HAS_eqv_vec 0
+#define TCG_TARGET_HAS_not_vec 0
+#define TCG_TARGET_HAS_neg_vec 0
+#define TCG_TARGET_HAS_abs_vec 0
+#define TCG_TARGET_HAS_roti_vec 0
+#define TCG_TARGET_HAS_rots_vec 0
+#define TCG_TARGET_HAS_rotv_vec 0
+#define TCG_TARGET_HAS_shi_vec 0
+#define TCG_TARGET_HAS_shs_vec 0
+#define TCG_TARGET_HAS_shv_vec 0
+#define TCG_TARGET_HAS_mul_vec 0
+#define TCG_TARGET_HAS_sat_vec 0
+#define TCG_TARGET_HAS_minmax_vec 0
+#define TCG_TARGET_HAS_bitsel_vec 0
+#define TCG_TARGET_HAS_cmpsel_vec 0
+
+#define TCG_TARGET_HAS_tst_vec 0
+
#define TCG_TARGET_DEFAULT_MO (0)
#define TCG_TARGET_NEED_LDST_LABELS
diff --git a/tcg/riscv/tcg-target.opc.h b/tcg/riscv/tcg-target.opc.h
new file mode 100644
index 0000000000..b80b39e1e5
--- /dev/null
+++ b/tcg/riscv/tcg-target.opc.h
@@ -0,0 +1,12 @@
+/*
+ * Copyright (c) C-SKY Microsystems Co., Ltd.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version.
+ *
+ * See the COPYING file in the top-level directory for details.
+ *
+ * Target-specific opcodes for host vector expansion. These will be
+ * emitted by tcg_expand_vec_op. For those familiar with GCC internals,
+ * consider these to be UNSPEC with names.
+ */
--
2.43.0
* [PATCH v4 03/12] tcg/riscv: Add vset{i}vli and ld/st vec ops
From: LIU Zhiwei @ 2024-09-11 13:26 UTC
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
In RISC-V, vector operations require vtype and vl to be set up first
with a vset{i}vl{i} instruction.
This instruction:
1. Sets the vector length (vl), counted in elements
2. Configures the vtype register, which includes:
   SEW (Selected Element Width)
   LMUL (vector register group multiplier)
   Other vector operation parameters
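As a rough illustration of the fields listed above, the standalone sketch
below packs a vtype value with the same bit layout that encode_vtype uses in
this patch (vlmul in bits 2:0, vsew in bits 5:3, vta in bit 6, vma in bit 7)
and computes the requested length the way set_vtype does, as the type size
shifted right by vsew. The names vtype_bits and avl_elements are
hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pack the low eight bits of vtype. */
static uint32_t vtype_bits(bool vta, bool vma, unsigned vsew, unsigned vlmul)
{
    assert(vsew < 8 && vlmul < 8);
    return (uint32_t)vma << 7 | (uint32_t)vta << 6 | vsew << 3 | vlmul;
}

/*
 * Hypothetical helper: requested length for a vector of type_bytes
 * bytes, where each element is (1 << vsew) bytes wide.
 */
static unsigned avl_elements(unsigned type_bytes, unsigned vsew)
{
    assert(vsew < 4);
    return type_bytes >> vsew;
}
```

For a 256-bit vector with 64-bit elements, avl_elements(32, 3) is 4, which is
small enough for the immediate form vsetivli that set_vtype prefers when the
value is below 32.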
This configuration is crucial for defining subsequent vector
operation behavior. To optimize performance, the configuration
process is managed dynamically:
1. Reconfiguration using vset{i}vl{i} is necessary when SEW
or TCGType changes.
2. The vset instruction can be omitted when configuration
remains unchanged.
This optimization is only effective within a single TB.
Each TB requires reconfiguration at its start, as the current
state cannot be obtained from hardware.
We save the current TCGType and SEW in TCGContext, so that each
translation thread tracks its own state under multi-threaded TCG.
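The caching rule described above can be sketched in isolation as follows.
This is a simplified model, not the patch's code: VsetCache, cache_tb_start
and cache_request are hypothetical names, the two cached fields stand in for
riscv_cur_type and riscv_cur_vsew, and the counter only exists to make the
behaviour observable.

```c
#include <assert.h>
#include <stdbool.h>

#define TYPE_INVALID (-1)   /* stands in for TCG_TYPE_COUNT at TB start */

typedef struct {
    int cur_type;           /* models TCGContext.riscv_cur_type */
    int cur_vsew;           /* models TCGContext.riscv_cur_vsew */
    unsigned vset_emitted;  /* number of vset instructions "emitted" */
} VsetCache;

/* At the start of every TB the hardware state is unknown. */
static void cache_tb_start(VsetCache *c)
{
    c->cur_type = TYPE_INVALID;
}

/* Returns true when a vset{i}vl{i} would have to be emitted. */
static bool cache_request(VsetCache *c, int type, int vsew)
{
    assert(type != TYPE_INVALID);
    if (type == c->cur_type && vsew == c->cur_vsew) {
        return false;       /* configuration unchanged, skip the vset */
    }
    c->cur_type = type;
    c->cur_vsew = vsew;
    c->vset_emitted++;
    return true;
}
```

Because the cache lives in a per-context structure rather than in a global,
two translation threads never observe each other's cached configuration.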
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: Weiwei Li <liwei1518@gmail.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
include/tcg/tcg.h | 7 +
tcg/riscv/tcg-target-con-set.h | 2 +
tcg/riscv/tcg-target.c.inc | 269 ++++++++++++++++++++++++++++++++-
3 files changed, 274 insertions(+), 4 deletions(-)
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 21d5884741..93aa9c30ee 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -544,6 +544,12 @@ struct TCGContext {
struct qemu_plugin_insn *plugin_insn;
#endif
+ /* For host-specific values. */
+#ifdef __riscv
+ MemOp riscv_cur_vsew;
+ TCGType riscv_cur_type;
+#endif
+
GHashTable *const_table[TCG_TYPE_COUNT];
TCGTempSet free_temps[TCG_TYPE_COUNT];
TCGTemp temps[TCG_MAX_TEMPS]; /* globals first, temps after */
@@ -566,6 +572,7 @@ struct TCGContext {
/* Exit to translator on overflow. */
sigjmp_buf jmp_trans;
+
};
static inline bool temp_readonly(TCGTemp *ts)
diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
index aac5ceee2b..d73a62b0f2 100644
--- a/tcg/riscv/tcg-target-con-set.h
+++ b/tcg/riscv/tcg-target-con-set.h
@@ -21,3 +21,5 @@ C_O1_I2(r, rZ, rZ)
C_N1_I2(r, r, rM)
C_O1_I4(r, r, rI, rM, rM)
C_O2_I4(r, r, rZ, rZ, rM, rM)
+C_O0_I2(v, r)
+C_O1_I1(v, r)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 966d1ad981..47f4e35237 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -165,6 +165,31 @@ static bool tcg_target_const_match(int64_t val, int ct,
* RISC-V Base ISA opcodes (IM)
*/
+#define V_OPIVV (0x0 << 12)
+#define V_OPFVV (0x1 << 12)
+#define V_OPMVV (0x2 << 12)
+#define V_OPIVI (0x3 << 12)
+#define V_OPIVX (0x4 << 12)
+#define V_OPFVF (0x5 << 12)
+#define V_OPMVX (0x6 << 12)
+#define V_OPCFG (0x7 << 12)
+
+/* NF <= 7 && NF >= 0 */
+#define V_NF(x) ((x) << 29)
+#define V_UNIT_STRIDE (0x0 << 20)
+#define V_UNIT_STRIDE_WHOLE_REG (0x8 << 20)
+
+typedef enum {
+ VLMUL_M1 = 0, /* LMUL=1 */
+ VLMUL_M2, /* LMUL=2 */
+ VLMUL_M4, /* LMUL=4 */
+ VLMUL_M8, /* LMUL=8 */
+ VLMUL_RESERVED,
+ VLMUL_MF8, /* LMUL=1/8 */
+ VLMUL_MF4, /* LMUL=1/4 */
+ VLMUL_MF2, /* LMUL=1/2 */
+} RISCVVlmul;
+
typedef enum {
OPC_ADD = 0x33,
OPC_ADDI = 0x13,
@@ -260,6 +285,30 @@ typedef enum {
/* Zicond: integer conditional operations */
OPC_CZERO_EQZ = 0x0e005033,
OPC_CZERO_NEZ = 0x0e007033,
+
+ /* V: Vector extension 1.0 */
+ OPC_VSETVLI = 0x57 | V_OPCFG,
+ OPC_VSETIVLI = 0xc0000057 | V_OPCFG,
+ OPC_VSETVL = 0x80000057 | V_OPCFG,
+
+ OPC_VLE8_V = 0x7 | V_UNIT_STRIDE,
+ OPC_VLE16_V = 0x5007 | V_UNIT_STRIDE,
+ OPC_VLE32_V = 0x6007 | V_UNIT_STRIDE,
+ OPC_VLE64_V = 0x7007 | V_UNIT_STRIDE,
+ OPC_VSE8_V = 0x27 | V_UNIT_STRIDE,
+ OPC_VSE16_V = 0x5027 | V_UNIT_STRIDE,
+ OPC_VSE32_V = 0x6027 | V_UNIT_STRIDE,
+ OPC_VSE64_V = 0x7027 | V_UNIT_STRIDE,
+
+ OPC_VL1RE64_V = 0x2007007 | V_UNIT_STRIDE_WHOLE_REG | V_NF(0),
+ OPC_VL2RE64_V = 0x2007007 | V_UNIT_STRIDE_WHOLE_REG | V_NF(1),
+ OPC_VL4RE64_V = 0x2007007 | V_UNIT_STRIDE_WHOLE_REG | V_NF(3),
+ OPC_VL8RE64_V = 0x2007007 | V_UNIT_STRIDE_WHOLE_REG | V_NF(7),
+
+ OPC_VS1R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(0),
+ OPC_VS2R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(1),
+ OPC_VS4R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(3),
+ OPC_VS8R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(7),
} RISCVInsn;
/*
@@ -352,6 +401,35 @@ static int32_t encode_uj(RISCVInsn opc, TCGReg rd, uint32_t imm)
return opc | (rd & 0x1f) << 7 | encode_ujimm20(imm);
}
+/* Type-OPIVV/OPMVV/OPIVX/OPMVX, Vector load and store */
+
+static int32_t encode_v(RISCVInsn opc, TCGReg d, TCGReg s1,
+ TCGReg s2, bool vm)
+{
+ return opc | (d & 0x1f) << 7 | (s1 & 0x1f) << 15 |
+ (s2 & 0x1f) << 20 | (vm << 25);
+}
+
+/* Vector vtype */
+
+static uint32_t encode_vtype(bool vta, bool vma,
+ MemOp vsew, RISCVVlmul vlmul)
+{
+ return vma << 7 | vta << 6 | vsew << 3 | vlmul;
+}
+
+static int32_t encode_vset(RISCVInsn opc, TCGReg rd,
+ TCGArg rs1, uint32_t vtype)
+{
+ return opc | (rd & 0x1f) << 7 | (rs1 & 0x1f) << 15 | (vtype & 0x7ff) << 20;
+}
+
+static int32_t encode_vseti(RISCVInsn opc, TCGReg rd,
+ uint32_t uimm, uint32_t vtype)
+{
+ return opc | (rd & 0x1f) << 7 | (uimm & 0x1f) << 15 | (vtype & 0x3ff) << 20;
+}
+
/*
* RISC-V instruction emitters
*/
@@ -464,6 +542,88 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
}
}
+/*
+ * RISC-V vector instruction emitters
+ */
+
+/*
+ * Only unit-stride addressing implemented; may extend in future.
+ */
+static void tcg_out_opc_ldst_vec(TCGContext *s, RISCVInsn opc, TCGReg data,
+ TCGReg rs1, bool vm)
+{
+ tcg_out32(s, encode_v(opc, data, rs1, 0, vm));
+}
+
+static bool lmul_check(int lmul, MemOp vsew)
+{
+ /*
+ * For a given supported fractional LMUL setting, implementations must
+ * support SEW settings between SEW_MIN and LMUL * ELEN, inclusive.
+ * So if ELEN = 64, LMUL = 1/2, then SEW will support e8, e16, e32,
+ * but e64 may not be supported.
+ */
+ if (lmul < 0) {
+ return (8 << vsew) <= (64 / (1 << (-lmul)));
+ } else {
+ return true;
+ }
+}
+
+static void set_vtype(TCGContext *s, TCGType type, MemOp vsew)
+{
+ unsigned vtype, insn, avl;
+ int lmul;
+ RISCVVlmul vlmul;
+ bool lmul_eq_avl;
+
+ s->riscv_cur_type = type;
+ s->riscv_cur_vsew = vsew;
+
+ /* Match riscv_lg2_vlenb to TCG_TYPE_V64. */
+ QEMU_BUILD_BUG_ON(TCG_TYPE_V64 != 3);
+
+ lmul = type - riscv_lg2_vlenb;
+ if (lmul < -3) {
+ /* Host VLEN >= 1024 bits. */
+ vlmul = VLMUL_M1;
+ lmul_eq_avl = false;
+ } else if (lmul < 3) {
+ /* 1/8, 1/4, 1/2, 1, 2, 4 */
+ if (lmul_check(lmul, vsew)) {
+ vlmul = lmul & 7;
+ } else {
+ vlmul = VLMUL_M1;
+ }
+ lmul_eq_avl = true;
+ } else {
+ /* Guaranteed by Zve64x. */
+ g_assert_not_reached();
+ }
+
+ avl = tcg_type_size(type) >> vsew;
+ vtype = encode_vtype(true, true, vsew, vlmul);
+
+ if (avl < 32) {
+ insn = encode_vseti(OPC_VSETIVLI, TCG_REG_ZERO, avl, vtype);
+ } else if (lmul_eq_avl) {
+ /* rd != 0 and rs1 == 0 uses vlmax */
+ insn = encode_vset(OPC_VSETVLI, TCG_REG_TMP0, TCG_REG_ZERO, vtype);
+ } else {
+ tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, TCG_REG_ZERO, avl);
+ insn = encode_vset(OPC_VSETVLI, TCG_REG_ZERO, TCG_REG_TMP0, vtype);
+ }
+ tcg_out32(s, insn);
+}
+
+static MemOp set_vtype_len(TCGContext *s, TCGType type)
+{
+ if (type != s->riscv_cur_type) {
+ set_vtype(s, type, MO_64);
+ }
+ return s->riscv_cur_vsew;
+}
+
/*
* TCG intrinsics
*/
@@ -670,18 +830,101 @@ static void tcg_out_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
}
}
+static void tcg_out_vec_ldst(TCGContext *s, RISCVInsn opc, TCGReg data,
+ TCGReg addr, intptr_t offset)
+{
+ tcg_debug_assert(data >= TCG_REG_V0);
+ tcg_debug_assert(addr < TCG_REG_V0);
+
+ if (offset) {
+ tcg_debug_assert(addr != TCG_REG_ZERO);
+ if (offset == sextreg(offset, 0, 12)) {
+ tcg_out_opc_imm(s, OPC_ADDI, TCG_REG_TMP0, addr, offset);
+ } else {
+ tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_TMP0, offset);
+ tcg_out_opc_reg(s, OPC_ADD, TCG_REG_TMP0, TCG_REG_TMP0, addr);
+ }
+ addr = TCG_REG_TMP0;
+ }
+ tcg_out_opc_ldst_vec(s, opc, data, addr, true);
+}
+
static void tcg_out_ld(TCGContext *s, TCGType type, TCGReg arg,
TCGReg arg1, intptr_t arg2)
{
- RISCVInsn insn = type == TCG_TYPE_I32 ? OPC_LW : OPC_LD;
- tcg_out_ldst(s, insn, arg, arg1, arg2);
+ RISCVInsn insn;
+
+ switch (type) {
+ case TCG_TYPE_I32:
+ tcg_out_ldst(s, OPC_LW, arg, arg1, arg2);
+ break;
+ case TCG_TYPE_I64:
+ tcg_out_ldst(s, OPC_LD, arg, arg1, arg2);
+ break;
+ case TCG_TYPE_V64:
+ case TCG_TYPE_V128:
+ case TCG_TYPE_V256:
+ if (type >= riscv_lg2_vlenb) {
+ static const RISCVInsn whole_reg_ld[] = {
+ OPC_VL1RE64_V, OPC_VL2RE64_V, OPC_VL4RE64_V, OPC_VL8RE64_V
+ };
+ unsigned idx = type - riscv_lg2_vlenb;
+
+ tcg_debug_assert(idx < ARRAY_SIZE(whole_reg_ld));
+ insn = whole_reg_ld[idx];
+ } else {
+ static const RISCVInsn unit_stride_ld[] = {
+ OPC_VLE8_V, OPC_VLE16_V, OPC_VLE32_V, OPC_VLE64_V
+ };
+ MemOp prev_vsew = set_vtype_len(s, type);
+
+ tcg_debug_assert(prev_vsew < ARRAY_SIZE(unit_stride_ld));
+ insn = unit_stride_ld[prev_vsew];
+ }
+ tcg_out_vec_ldst(s, insn, arg, arg1, arg2);
+ break;
+ default:
+ g_assert_not_reached();
+ }
}
static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
TCGReg arg1, intptr_t arg2)
{
- RISCVInsn insn = type == TCG_TYPE_I32 ? OPC_SW : OPC_SD;
- tcg_out_ldst(s, insn, arg, arg1, arg2);
+ RISCVInsn insn;
+
+ switch (type) {
+ case TCG_TYPE_I32:
+ tcg_out_ldst(s, OPC_SW, arg, arg1, arg2);
+ break;
+ case TCG_TYPE_I64:
+ tcg_out_ldst(s, OPC_SD, arg, arg1, arg2);
+ break;
+ case TCG_TYPE_V64:
+ case TCG_TYPE_V128:
+ case TCG_TYPE_V256:
+ if (type >= riscv_lg2_vlenb) {
+ static const RISCVInsn whole_reg_st[] = {
+ OPC_VS1R_V, OPC_VS2R_V, OPC_VS4R_V, OPC_VS8R_V
+ };
+ unsigned idx = type - riscv_lg2_vlenb;
+
+ tcg_debug_assert(idx < ARRAY_SIZE(whole_reg_st));
+ insn = whole_reg_st[idx];
+ } else {
+ static const RISCVInsn unit_stride_st[] = {
+ OPC_VSE8_V, OPC_VSE16_V, OPC_VSE32_V, OPC_VSE64_V
+ };
+ MemOp prev_vsew = set_vtype_len(s, type);
+
+ tcg_debug_assert(prev_vsew < ARRAY_SIZE(unit_stride_st));
+ insn = unit_stride_st[prev_vsew];
+ }
+ tcg_out_vec_ldst(s, insn, arg, arg1, arg2);
+ break;
+ default:
+ g_assert_not_reached();
+ }
}
static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
@@ -1892,7 +2135,20 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
const TCGArg args[TCG_MAX_OP_ARGS],
const int const_args[TCG_MAX_OP_ARGS])
{
+ TCGType type = vecl + TCG_TYPE_V64;
+ TCGArg a0, a1, a2;
+
+ a0 = args[0];
+ a1 = args[1];
+ a2 = args[2];
+
switch (opc) {
+ case INDEX_op_ld_vec:
+ tcg_out_ld(s, type, a0, a1, a2);
+ break;
+ case INDEX_op_st_vec:
+ tcg_out_st(s, type, a0, a1, a2);
+ break;
case INDEX_op_mov_vec: /* Always emitted via tcg_out_mov. */
case INDEX_op_dup_vec: /* Always emitted via tcg_out_dup_vec. */
default:
@@ -2056,6 +2312,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_qemu_st_a64_i64:
return C_O0_I2(rZ, r);
+ case INDEX_op_st_vec:
+ return C_O0_I2(v, r);
+ case INDEX_op_ld_vec:
+ return C_O1_I1(v, r);
default:
g_assert_not_reached();
}
@@ -2129,6 +2389,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
static void tcg_out_tb_start(TCGContext *s)
{
+ s->riscv_cur_type = TCG_TYPE_COUNT;
/* nothing to do */
}
--
2.43.0
* [PATCH v4 04/12] tcg/riscv: Implement vector mov/dup{m/i}
From: LIU Zhiwei @ 2024-09-11 13:26 UTC
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
tcg/riscv/tcg-target.c.inc | 73 ++++++++++++++++++++++++++++++++++++--
1 file changed, 71 insertions(+), 2 deletions(-)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 47f4e35237..3a745ea3b4 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -309,6 +309,12 @@ typedef enum {
OPC_VS2R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(1),
OPC_VS4R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(3),
OPC_VS8R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(7),
+
+ OPC_VMV_V_V = 0x5e000057 | V_OPIVV,
+ OPC_VMV_V_I = 0x5e000057 | V_OPIVI,
+ OPC_VMV_V_X = 0x5e000057 | V_OPIVX,
+
+ OPC_VMVNR_V = 0x9e000057 | V_OPIVI,
} RISCVInsn;
/*
@@ -401,6 +407,16 @@ static int32_t encode_uj(RISCVInsn opc, TCGReg rd, uint32_t imm)
return opc | (rd & 0x1f) << 7 | encode_ujimm20(imm);
}
+
+/* Type-OPIVI */
+
+static int32_t encode_vi(RISCVInsn opc, TCGReg rd, int32_t imm,
+ TCGReg vs2, bool vm)
+{
+ return opc | (rd & 0x1f) << 7 | (imm & 0x1f) << 15 |
+ (vs2 & 0x1f) << 20 | (vm << 25);
+}
+
/* Type-OPIVV/OPMVV/OPIVX/OPMVX, Vector load and store */
static int32_t encode_v(RISCVInsn opc, TCGReg d, TCGReg s1,
@@ -546,6 +562,24 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
* RISC-V vector instruction emitters
*/
+/*
+ * Vector registers use the same low 5 bits as GPR registers,
+ * and vm=0 (vm = false) means vector masking ENABLED.
+ * With RVV 1.0, vs2 is the first operand, while rs1/imm is the
+ * second operand.
+ */
+static void tcg_out_opc_vx(TCGContext *s, RISCVInsn opc, TCGReg vd,
+ TCGReg vs2, TCGReg rs1, bool vm)
+{
+ tcg_out32(s, encode_v(opc, vd, rs1, vs2, vm));
+}
+
+static void tcg_out_opc_vi(TCGContext *s, RISCVInsn opc, TCGReg vd,
+ TCGReg vs2, int32_t imm, bool vm)
+{
+ tcg_out32(s, encode_vi(opc, vd, imm, vs2, vm));
+}
+
/*
* Only unit-stride addressing implemented; may extend in future.
*/
@@ -624,6 +658,13 @@ static MemOp set_vtype_len(TCGContext *s, TCGType type)
return s->riscv_cur_vsew;
}
+static void set_vtype_len_sew(TCGContext *s, TCGType type, MemOp vsew)
+{
+ if (type != s->riscv_cur_type || vsew != s->riscv_cur_vsew) {
+ set_vtype(s, type, vsew);
+ }
+}
+
/*
* TCG intrinsics
*/
@@ -638,6 +679,15 @@ static bool tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
case TCG_TYPE_I64:
tcg_out_opc_imm(s, OPC_ADDI, ret, arg, 0);
break;
+ case TCG_TYPE_V64:
+ case TCG_TYPE_V128:
+ case TCG_TYPE_V256:
+ {
+ int lmul = type - riscv_lg2_vlenb;
+ int nf = 1 << MAX(lmul, 0);
+ tcg_out_opc_vi(s, OPC_VMVNR_V, ret, arg, nf - 1, true);
+ }
+ break;
default:
g_assert_not_reached();
}
@@ -1001,18 +1051,32 @@ static void tcg_out_addsub2(TCGContext *s,
static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
TCGReg dst, TCGReg src)
{
- return false;
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_opc_vx(s, OPC_VMV_V_X, dst, TCG_REG_V0, src, true);
+ return true;
}
static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
TCGReg dst, TCGReg base, intptr_t offset)
{
- return false;
+ tcg_out_ld(s, TCG_TYPE_REG, TCG_REG_TMP0, base, offset);
+ return tcg_out_dup_vec(s, type, vece, dst, TCG_REG_TMP0);
}
static void tcg_out_dupi_vec(TCGContext *s, TCGType type, unsigned vece,
TCGReg dst, int64_t arg)
{
+ if (arg >= -16 && arg < 16) {
+ if (arg == 0 || arg == -1) {
+ set_vtype_len(s, type);
+ } else {
+ set_vtype_len_sew(s, type, vece);
+ }
+ tcg_out_opc_vi(s, OPC_VMV_V_I, dst, TCG_REG_V0, arg, true);
+ return;
+ }
+ tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_TMP0, arg);
+ tcg_out_dup_vec(s, type, vece, dst, TCG_REG_TMP0);
}
static const struct {
@@ -2143,6 +2207,9 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
a2 = args[2];
switch (opc) {
+ case INDEX_op_dupm_vec:
+ tcg_out_dupm_vec(s, type, vece, a0, a1, a2);
+ break;
case INDEX_op_ld_vec:
tcg_out_ld(s, type, a0, a1, a2);
break;
@@ -2314,6 +2381,8 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_st_vec:
return C_O0_I2(v, r);
+ case INDEX_op_dup_vec:
+ case INDEX_op_dupm_vec:
case INDEX_op_ld_vec:
return C_O1_I1(v, r);
default:
--
2.43.0
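
Editorial sketch, not part of the patch: a standalone illustration of two
details used above, the OPIVI field layout built by encode_vi() and the
register count passed to vmv<nr>r.v in tcg_out_mov(). The *_sketch names are
hypothetical. The funct3 value for OPIVI (0b011) is taken from the RVV 1.0
encoding table and is an assumption here, since V_OPIVI is defined in an
earlier patch. The second helper assumes, as the patch's
"type - riscv_lg2_vlenb" implies, that the vector TCGType values equal log2
of the vector size in bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed funct3 selector for OPIVI (bits 14:12), per the RVV 1.0 spec. */
#define V_OPIVI_SKETCH      (0x3u << 12)
/* vmv.v.i base opcode as in the patch: funct6 = 010111, vm = 1, OP-V. */
#define OPC_VMV_V_I_SKETCH  (0x5e000057u | V_OPIVI_SKETCH)

/*
 * Same field layout as encode_vi() in the patch, on unsigned types:
 * vd at bit 7, 5-bit immediate at bit 15, vs2 at bit 20, vm at bit 25.
 * Register numbers are masked to 5 bits, so a vector register numbered
 * 32 + n lands in the field as n.
 */
static uint32_t encode_vi_sketch(uint32_t opc, unsigned vd, int32_t imm,
                                 unsigned vs2, bool vm)
{
    return opc | (vd & 0x1f) << 7 | ((uint32_t)imm & 0x1f) << 15 |
           (vs2 & 0x1f) << 20 | ((uint32_t)vm << 25);
}

/*
 * Number of whole registers moved for a vector of 2^lg2_type_bytes bytes
 * on a host whose registers hold 2^lg2_vlenb bytes. Fractional LMUL
 * (a negative difference) still moves one full register.
 */
static int whole_reg_count_sketch(int lg2_type_bytes, int lg2_vlenb)
{
    int lmul = lg2_type_bytes - lg2_vlenb;
    return 1 << (lmul > 0 ? lmul : 0);
}
```

With these definitions, "vmv.v.i v1, 5" encodes as 0x5e02b0d7, and a 32-byte
vector on a host with 16-byte registers needs two registers (the patch then
passes nf - 1 = 1 as the immediate).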
* [PATCH v4 05/12] tcg/riscv: Add support for basic vector opcodes
2024-09-11 13:26 [PATCH v4 00/12] tcg/riscv: Add support for vector LIU Zhiwei
` (3 preceding siblings ...)
2024-09-11 13:26 ` [PATCH v4 04/12] tcg/riscv: Implement vector mov/dup{m/i} LIU Zhiwei
@ 2024-09-11 13:26 ` LIU Zhiwei
2024-09-11 13:26 ` [PATCH v4 06/12] tcg/riscv: Implement vector cmp/cmpsel ops LIU Zhiwei
` (6 subsequent siblings)
11 siblings, 0 replies; 36+ messages in thread
From: LIU Zhiwei @ 2024-09-11 13:26 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/riscv/tcg-target-con-set.h | 2 ++
tcg/riscv/tcg-target.c.inc | 52 ++++++++++++++++++++++++++++++++++
tcg/riscv/tcg-target.h | 2 +-
3 files changed, 55 insertions(+), 1 deletion(-)
diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
index d73a62b0f2..4c4bc99355 100644
--- a/tcg/riscv/tcg-target-con-set.h
+++ b/tcg/riscv/tcg-target-con-set.h
@@ -23,3 +23,5 @@ C_O1_I4(r, r, rI, rM, rM)
C_O2_I4(r, r, rZ, rZ, rM, rM)
C_O0_I2(v, r)
C_O1_I1(v, r)
+C_O1_I1(v, v)
+C_O1_I2(v, v, v)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 3a745ea3b4..9b325295b7 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -310,6 +310,13 @@ typedef enum {
OPC_VS4R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(3),
OPC_VS8R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(7),
+ OPC_VADD_VV = 0x57 | V_OPIVV,
+ OPC_VSUB_VV = 0x8000057 | V_OPIVV,
+ OPC_VAND_VV = 0x24000057 | V_OPIVV,
+ OPC_VOR_VV = 0x28000057 | V_OPIVV,
+ OPC_VXOR_VV = 0x2c000057 | V_OPIVV,
+ OPC_VXOR_VI = 0x2c000057 | V_OPIVI,
+
OPC_VMV_V_V = 0x5e000057 | V_OPIVV,
OPC_VMV_V_I = 0x5e000057 | V_OPIVI,
OPC_VMV_V_X = 0x5e000057 | V_OPIVX,
@@ -568,6 +575,12 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
* With RVV 1.0, vs2 is the first operand, while rs1/imm is the
* second operand.
*/
+static void tcg_out_opc_vv(TCGContext *s, RISCVInsn opc, TCGReg vd,
+ TCGReg vs2, TCGReg vs1, bool vm)
+{
+ tcg_out32(s, encode_v(opc, vd, vs1, vs2, vm));
+}
+
static void tcg_out_opc_vx(TCGContext *s, RISCVInsn opc, TCGReg vd,
TCGReg vs2, TCGReg rs1, bool vm)
{
@@ -2216,6 +2229,30 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
case INDEX_op_st_vec:
tcg_out_st(s, type, a0, a1, a2);
break;
+ case INDEX_op_add_vec:
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VADD_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_sub_vec:
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VSUB_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_and_vec:
+ set_vtype_len(s, type);
+ tcg_out_opc_vv(s, OPC_VAND_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_or_vec:
+ set_vtype_len(s, type);
+ tcg_out_opc_vv(s, OPC_VOR_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_xor_vec:
+ set_vtype_len(s, type);
+ tcg_out_opc_vv(s, OPC_VXOR_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_not_vec:
+ set_vtype_len(s, type);
+ tcg_out_opc_vi(s, OPC_VXOR_VI, a0, a1, -1, true);
+ break;
case INDEX_op_mov_vec: /* Always emitted via tcg_out_mov. */
case INDEX_op_dup_vec: /* Always emitted via tcg_out_dup_vec. */
default:
@@ -2235,6 +2272,13 @@ void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
{
switch (opc) {
+ case INDEX_op_add_vec:
+ case INDEX_op_sub_vec:
+ case INDEX_op_and_vec:
+ case INDEX_op_or_vec:
+ case INDEX_op_xor_vec:
+ case INDEX_op_not_vec:
+ return 1;
default:
return 0;
}
@@ -2385,6 +2429,14 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_dupm_vec:
case INDEX_op_ld_vec:
return C_O1_I1(v, r);
+ case INDEX_op_not_vec:
+ return C_O1_I1(v, v);
+ case INDEX_op_add_vec:
+ case INDEX_op_sub_vec:
+ case INDEX_op_and_vec:
+ case INDEX_op_or_vec:
+ case INDEX_op_xor_vec:
+ return C_O1_I2(v, v, v);
default:
g_assert_not_reached();
}
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 12a7a37aaa..acb8dfdf16 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -151,7 +151,7 @@ typedef enum {
#define TCG_TARGET_HAS_nand_vec 0
#define TCG_TARGET_HAS_nor_vec 0
#define TCG_TARGET_HAS_eqv_vec 0
-#define TCG_TARGET_HAS_not_vec 0
+#define TCG_TARGET_HAS_not_vec 1
#define TCG_TARGET_HAS_neg_vec 0
#define TCG_TARGET_HAS_abs_vec 0
#define TCG_TARGET_HAS_roti_vec 0
--
2.43.0
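
Editorial sketch, not part of the patch: one plausible reading of why the
and/or/xor/not cases above call set_vtype_len() while add/sub call
set_vtype_len_sew(). Bitwise operations give the same bits whatever the
element width, so the current SEW can presumably be kept. Addition depends
on where the lane boundaries are. The helper names are hypothetical and the
64-bit value stands in for a vector register.

```c
#include <assert.h>
#include <stdint.h>

/* XOR treats each bit independently, so the lane width does not matter. */
static uint64_t xor_any_sew(uint64_t a, uint64_t b)
{
    return a ^ b;
}

/* Add with eight 8-bit lanes: a carry must not cross a lane boundary. */
static uint64_t add_sew8(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t la = (uint8_t)(a >> (i * 8));
        uint8_t lb = (uint8_t)(b >> (i * 8));
        r |= (uint64_t)(uint8_t)(la + lb) << (i * 8);
    }
    return r;
}

/* Add with a single 64-bit lane: carries propagate across all bytes. */
static uint64_t add_sew64(uint64_t a, uint64_t b)
{
    return a + b;
}
```

For inputs 0xff and 0x01 the two adds differ (0x00 with 8-bit lanes, 0x100
with one 64-bit lane), while the XOR result is 0xfe either way.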
* [PATCH v4 06/12] tcg/riscv: Implement vector cmp/cmpsel ops
2024-09-11 13:26 [PATCH v4 00/12] tcg/riscv: Add support for vector LIU Zhiwei
` (4 preceding siblings ...)
2024-09-11 13:26 ` [PATCH v4 05/12] tcg/riscv: Add support for basic vector opcodes LIU Zhiwei
@ 2024-09-11 13:26 ` LIU Zhiwei
2024-09-11 23:14 ` Richard Henderson
2024-09-11 13:26 ` [PATCH v4 07/12] tcg/riscv: Implement vector neg ops LIU Zhiwei
` (5 subsequent siblings)
11 siblings, 1 reply; 36+ messages in thread
From: LIU Zhiwei @ 2024-09-11 13:26 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
1. Address the immediate value constraints of the RISC-V Vector Extension 1.0
   comparison instructions.
2. Extend comparison results from mask registers to SEW-width elements,
   following the recommendations in the RISC-V specification, Volume I
   (version 20240411).
This aligns with TCG's cmp_vec behavior by expanding compare results to
full element width: all 1s for true, all 0s for false.

Expand cmp with cmpsel.
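
Editorial sketch, not part of the commit message: the two points above for
the signed less-than case, on a single 8-bit lane. RVV has no vmslt.vi, so
the patch's table maps LT to vmsle.vi with the immediate reduced by one,
which is why the accepted range for LT is [-15, 16] rather than [-16, 15].
The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Can "x < imm" be emitted as "x <= imm - 1" with a 5-bit signed
 * immediate? Mirrors the { -15, 16, adjust = true } table row for LT.
 */
static bool lt_imm_fits_simm5(int64_t imm)
{
    int64_t adjusted = imm - 1;
    return adjusted >= -16 && adjusted <= 15;
}

/*
 * One SEW=8 lane of cmp_vec with condition LT: compute the mask bit the
 * way vmsle.vi would, then widen it to a full element as cmpsel with
 * values -1 and 0 does (all 1s for true, all 0s for false).
 */
static uint8_t cmp_lt_lane8(int8_t x, int64_t imm)
{
    bool mask_bit = x <= imm - 1;
    return mask_bit ? 0xff : 0x00;
}
```

For example, "x < 16" becomes "x <= 15", which fits the immediate field,
whereas "x < -16" would need -17 and has to go through the vector-vector
form instead.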
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
tcg/riscv/tcg-target-con-set.h | 2 +
tcg/riscv/tcg-target-con-str.h | 2 +
tcg/riscv/tcg-target.c.inc | 251 +++++++++++++++++++++++++++------
tcg/riscv/tcg-target.h | 2 +-
4 files changed, 209 insertions(+), 48 deletions(-)
diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
index 4c4bc99355..cc06102ccf 100644
--- a/tcg/riscv/tcg-target-con-set.h
+++ b/tcg/riscv/tcg-target-con-set.h
@@ -25,3 +25,5 @@ C_O0_I2(v, r)
C_O1_I1(v, r)
C_O1_I1(v, v)
C_O1_I2(v, v, v)
+C_O1_I2(v, v, vL)
+C_O1_I4(v, v, vL, vK, vK)
diff --git a/tcg/riscv/tcg-target-con-str.h b/tcg/riscv/tcg-target-con-str.h
index b2b3211bcb..089efe96ca 100644
--- a/tcg/riscv/tcg-target-con-str.h
+++ b/tcg/riscv/tcg-target-con-str.h
@@ -17,6 +17,8 @@ REGS('v', ALL_VECTOR_REGS)
*/
CONST('I', TCG_CT_CONST_S12)
CONST('J', TCG_CT_CONST_J12)
+CONST('K', TCG_CT_CONST_S5)
+CONST('L', TCG_CT_CONST_CMP_VI)
CONST('N', TCG_CT_CONST_N12)
CONST('M', TCG_CT_CONST_M12)
CONST('Z', TCG_CT_CONST_ZERO)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 9b325295b7..2ddfb3738a 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -106,11 +106,13 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
return TCG_REG_A0 + slot;
}
-#define TCG_CT_CONST_ZERO 0x100
-#define TCG_CT_CONST_S12 0x200
-#define TCG_CT_CONST_N12 0x400
-#define TCG_CT_CONST_M12 0x800
-#define TCG_CT_CONST_J12 0x1000
+#define TCG_CT_CONST_ZERO 0x100
+#define TCG_CT_CONST_S12 0x200
+#define TCG_CT_CONST_N12 0x400
+#define TCG_CT_CONST_M12 0x800
+#define TCG_CT_CONST_J12 0x1000
+#define TCG_CT_CONST_S5 0x2000
+#define TCG_CT_CONST_CMP_VI 0x4000
#define ALL_GENERAL_REGS MAKE_64BIT_MASK(0, 32)
#define ALL_VECTOR_REGS MAKE_64BIT_MASK(32, 32)
@@ -119,48 +121,6 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
#define sextreg sextract64
-/* test if a constant matches the constraint */
-static bool tcg_target_const_match(int64_t val, int ct,
- TCGType type, TCGCond cond, int vece)
-{
- if (ct & TCG_CT_CONST) {
- return 1;
- }
- if ((ct & TCG_CT_CONST_ZERO) && val == 0) {
- return 1;
- }
- /*
- * Sign extended from 12 bits: [-0x800, 0x7ff].
- * Used for most arithmetic, as this is the isa field.
- */
- if ((ct & TCG_CT_CONST_S12) && val >= -0x800 && val <= 0x7ff) {
- return 1;
- }
- /*
- * Sign extended from 12 bits, negated: [-0x7ff, 0x800].
- * Used for subtraction, where a constant must be handled by ADDI.
- */
- if ((ct & TCG_CT_CONST_N12) && val >= -0x7ff && val <= 0x800) {
- return 1;
- }
- /*
- * Sign extended from 12 bits, +/- matching: [-0x7ff, 0x7ff].
- * Used by addsub2 and movcond, which may need the negative value,
- * and requires the modified constant to be representable.
- */
- if ((ct & TCG_CT_CONST_M12) && val >= -0x7ff && val <= 0x7ff) {
- return 1;
- }
- /*
- * Inverse of sign extended from 12 bits: ~[-0x800, 0x7ff].
- * Used to map ANDN back to ANDI, etc.
- */
- if ((ct & TCG_CT_CONST_J12) && ~val >= -0x800 && ~val <= 0x7ff) {
- return 1;
- }
- return 0;
-}
-
/*
* RISC-V Base ISA opcodes (IM)
*/
@@ -310,6 +270,9 @@ typedef enum {
OPC_VS4R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(3),
OPC_VS8R_V = 0x2000027 | V_UNIT_STRIDE_WHOLE_REG | V_NF(7),
+ OPC_VMERGE_VIM = 0x5c000057 | V_OPIVI,
+ OPC_VMERGE_VVM = 0x5c000057 | V_OPIVV,
+
OPC_VADD_VV = 0x57 | V_OPIVV,
OPC_VSUB_VV = 0x8000057 | V_OPIVV,
OPC_VAND_VV = 0x24000057 | V_OPIVV,
@@ -317,6 +280,29 @@ typedef enum {
OPC_VXOR_VV = 0x2c000057 | V_OPIVV,
OPC_VXOR_VI = 0x2c000057 | V_OPIVI,
+ OPC_VMSEQ_VV = 0x60000057 | V_OPIVV,
+ OPC_VMSEQ_VI = 0x60000057 | V_OPIVI,
+ OPC_VMSEQ_VX = 0x60000057 | V_OPIVX,
+ OPC_VMSNE_VV = 0x64000057 | V_OPIVV,
+ OPC_VMSNE_VI = 0x64000057 | V_OPIVI,
+ OPC_VMSNE_VX = 0x64000057 | V_OPIVX,
+
+ OPC_VMSLTU_VV = 0x68000057 | V_OPIVV,
+ OPC_VMSLTU_VX = 0x68000057 | V_OPIVX,
+ OPC_VMSLT_VV = 0x6c000057 | V_OPIVV,
+ OPC_VMSLT_VX = 0x6c000057 | V_OPIVX,
+ OPC_VMSLEU_VV = 0x70000057 | V_OPIVV,
+ OPC_VMSLEU_VX = 0x70000057 | V_OPIVX,
+ OPC_VMSLE_VV = 0x74000057 | V_OPIVV,
+ OPC_VMSLE_VX = 0x74000057 | V_OPIVX,
+
+ OPC_VMSLEU_VI = 0x70000057 | V_OPIVI,
+ OPC_VMSLE_VI = 0x74000057 | V_OPIVI,
+ OPC_VMSGTU_VI = 0x78000057 | V_OPIVI,
+ OPC_VMSGTU_VX = 0x78000057 | V_OPIVX,
+ OPC_VMSGT_VI = 0x7c000057 | V_OPIVI,
+ OPC_VMSGT_VX = 0x7c000057 | V_OPIVX,
+
OPC_VMV_V_V = 0x5e000057 | V_OPIVV,
OPC_VMV_V_I = 0x5e000057 | V_OPIVI,
OPC_VMV_V_X = 0x5e000057 | V_OPIVX,
@@ -324,6 +310,97 @@ typedef enum {
OPC_VMVNR_V = 0x9e000057 | V_OPIVI,
} RISCVInsn;
+static const struct {
+ RISCVInsn op;
+ bool swap;
+} tcg_cmpcond_to_rvv_vv[] = {
+ [TCG_COND_EQ] = { OPC_VMSEQ_VV, false },
+ [TCG_COND_NE] = { OPC_VMSNE_VV, false },
+ [TCG_COND_LT] = { OPC_VMSLT_VV, false },
+ [TCG_COND_GE] = { OPC_VMSLE_VV, true },
+ [TCG_COND_GT] = { OPC_VMSLT_VV, true },
+ [TCG_COND_LE] = { OPC_VMSLE_VV, false },
+ [TCG_COND_LTU] = { OPC_VMSLTU_VV, false },
+ [TCG_COND_GEU] = { OPC_VMSLEU_VV, true },
+ [TCG_COND_GTU] = { OPC_VMSLTU_VV, true },
+ [TCG_COND_LEU] = { OPC_VMSLEU_VV, false }
+};
+
+static const struct {
+ RISCVInsn op;
+ int min;
+ int max;
+ bool adjust;
+} tcg_cmpcond_to_rvv_vi[] = {
+ [TCG_COND_EQ] = { OPC_VMSEQ_VI, -16, 15, false },
+ [TCG_COND_NE] = { OPC_VMSNE_VI, -16, 15, false },
+ [TCG_COND_GT] = { OPC_VMSGT_VI, -16, 15, false },
+ [TCG_COND_LE] = { OPC_VMSLE_VI, -16, 15, false },
+ [TCG_COND_LT] = { OPC_VMSLE_VI, -15, 16, true },
+ [TCG_COND_GE] = { OPC_VMSGT_VI, -15, 16, true },
+ [TCG_COND_LEU] = { OPC_VMSLEU_VI, 0, 15, false },
+ [TCG_COND_GTU] = { OPC_VMSGTU_VI, 0, 15, false },
+ [TCG_COND_LTU] = { OPC_VMSLEU_VI, 1, 16, true },
+ [TCG_COND_GEU] = { OPC_VMSGTU_VI, 1, 16, true },
+};
+
+/* test if a constant matches the constraint */
+static bool tcg_target_const_match(int64_t val, int ct,
+ TCGType type, TCGCond cond, int vece)
+{
+ if (ct & TCG_CT_CONST) {
+ return 1;
+ }
+ if ((ct & TCG_CT_CONST_ZERO) && val == 0) {
+ return 1;
+ }
+ /*
+ * Sign extended from 12 bits: [-0x800, 0x7ff].
+ * Used for most arithmetic, as this is the isa field.
+ */
+ if ((ct & TCG_CT_CONST_S12) && val >= -0x800 && val <= 0x7ff) {
+ return 1;
+ }
+ /*
+ * Sign extended from 12 bits, negated: [-0x7ff, 0x800].
+ * Used for subtraction, where a constant must be handled by ADDI.
+ */
+ if ((ct & TCG_CT_CONST_N12) && val >= -0x7ff && val <= 0x800) {
+ return 1;
+ }
+ /*
+ * Sign extended from 12 bits, +/- matching: [-0x7ff, 0x7ff].
+ * Used by addsub2 and movcond, which may need the negative value,
+ * and requires the modified constant to be representable.
+ */
+ if ((ct & TCG_CT_CONST_M12) && val >= -0x7ff && val <= 0x7ff) {
+ return 1;
+ }
+ /*
+ * Inverse of sign extended from 12 bits: ~[-0x800, 0x7ff].
+ * Used to map ANDN back to ANDI, etc.
+ */
+ if ((ct & TCG_CT_CONST_J12) && ~val >= -0x800 && ~val <= 0x7ff) {
+ return 1;
+ }
+ /*
+ * Sign extended from 5 bits: [-0x10, 0x0f].
+ * Used for vector-immediate.
+ */
+ if ((ct & TCG_CT_CONST_S5) && val >= -0x10 && val <= 0x0f) {
+ return 1;
+ }
+ /*
+ * Used for vector compare OPIVI instructions.
+ */
+ if ((ct & TCG_CT_CONST_CMP_VI) &&
+ val >= tcg_cmpcond_to_rvv_vi[cond].min &&
+ val <= tcg_cmpcond_to_rvv_vi[cond].max) {
+ return true;
+ }
+ return 0;
+}
+
/*
* RISC-V immediate and instruction encoders (excludes 16-bit RVC)
*/
@@ -593,6 +670,18 @@ static void tcg_out_opc_vi(TCGContext *s, RISCVInsn opc, TCGReg vd,
tcg_out32(s, encode_vi(opc, vd, imm, vs2, vm));
}
+static void tcg_out_opc_vim_mask(TCGContext *s, RISCVInsn opc, TCGReg vd,
+ TCGReg vs2, int32_t imm)
+{
+ tcg_out32(s, encode_vi(opc, vd, imm, vs2, false));
+}
+
+static void tcg_out_opc_vvm_mask(TCGContext *s, RISCVInsn opc, TCGReg vd,
+ TCGReg vs2, TCGReg vs1)
+{
+ tcg_out32(s, encode_v(opc, vd, vs1, vs2, false));
+}
+
/*
* Only unit-stride addressing implemented; may extend in future.
*/
@@ -1430,6 +1519,49 @@ static void tcg_out_cltz(TCGContext *s, TCGType type, RISCVInsn insn,
}
}
+static void tcg_out_cmpsel(TCGContext *s, TCGType type, unsigned vece,
+ TCGCond cond, TCGReg ret,
+ TCGReg cmp1, TCGReg cmp2, bool c_cmp2,
+ TCGReg val1, bool c_val1,
+ TCGReg val2, bool c_val2)
+{
+ set_vtype_len_sew(s, type, vece);
+
+ /* Use only vmerge_vim if possible, by inverting the test. */
+ if (c_val2 && !c_val1) {
+ TCGArg temp = val1;
+ cond = tcg_invert_cond(cond);
+ val1 = val2;
+ val2 = temp;
+ c_val1 = true;
+ c_val2 = false;
+ }
+
+ /* Perform the comparison into V0 mask. */
+ if (c_cmp2) {
+ tcg_out_opc_vi(s, tcg_cmpcond_to_rvv_vi[cond].op,
+ TCG_REG_V0, cmp1,
+ cmp2 - tcg_cmpcond_to_rvv_vi[cond].adjust, true);
+ } else if (tcg_cmpcond_to_rvv_vv[cond].swap) {
+ tcg_out_opc_vv(s, tcg_cmpcond_to_rvv_vv[cond].op,
+ TCG_REG_V0, cmp2, cmp1, true);
+ } else {
+ tcg_out_opc_vv(s, tcg_cmpcond_to_rvv_vv[cond].op,
+ TCG_REG_V0, cmp1, cmp2, true);
+ }
+ if (c_val1) {
+ if (c_val2) {
+ tcg_out_opc_vi(s, OPC_VMV_V_I, ret, TCG_REG_V0, val2, true);
+ val2 = ret;
+ }
+ /* vd[i] = v0.mask[i] ? imm : vs2[i] */
+ tcg_out_opc_vim_mask(s, OPC_VMERGE_VIM, ret, val2, val1);
+ } else {
+ /* vd[i] = v0.mask[i] ? vs1[i] : vs2[i] */
+ tcg_out_opc_vvm_mask(s, OPC_VMERGE_VVM, ret, val2, val1);
+ }
+}
+
static void tcg_out_call_int(TCGContext *s, const tcg_insn_unit *arg, bool tail)
{
TCGReg link = tail ? TCG_REG_ZERO : TCG_REG_RA;
@@ -2253,6 +2385,10 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
set_vtype_len(s, type);
tcg_out_opc_vi(s, OPC_VXOR_VI, a0, a1, -1, true);
break;
+ case INDEX_op_cmpsel_vec:
+ tcg_out_cmpsel(s, type, vece, args[5], a0, a1, a2, const_args[2],
+ args[3], const_args[3], args[4], const_args[4]);
+ break;
case INDEX_op_mov_vec: /* Always emitted via tcg_out_mov. */
case INDEX_op_dup_vec: /* Always emitted via tcg_out_dup_vec. */
default:
@@ -2263,7 +2399,21 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
TCGArg a0, ...)
{
+ va_list va;
+ TCGArg a1, a2, a3;
+
+ va_start(va, a0);
+ a1 = va_arg(va, TCGArg);
+ a2 = va_arg(va, TCGArg);
+
switch (opc) {
+ case INDEX_op_cmp_vec:
+ a3 = va_arg(va, TCGArg);
+ vec_gen_6(INDEX_op_cmpsel_vec, type, vece, a0, a1, a2,
+ tcgv_i64_arg(tcg_constant_i64(-1)),
+ tcgv_i64_arg(tcg_constant_i64(0)), a3);
+ break;
default:
g_assert_not_reached();
}
+ va_end(va);
@@ -2278,7 +2428,10 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
case INDEX_op_or_vec:
case INDEX_op_xor_vec:
case INDEX_op_not_vec:
+ case INDEX_op_cmpsel_vec:
return 1;
+ case INDEX_op_cmp_vec:
+ return -1;
default:
return 0;
}
@@ -2437,6 +2590,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_or_vec:
case INDEX_op_xor_vec:
return C_O1_I2(v, v, v);
+ case INDEX_op_cmp_vec:
+ return C_O1_I2(v, v, vL);
+ case INDEX_op_cmpsel_vec:
+ return C_O1_I4(v, v, vL, vK, vK);
default:
g_assert_not_reached();
}
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index acb8dfdf16..94034504b2 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -164,7 +164,7 @@ typedef enum {
#define TCG_TARGET_HAS_sat_vec 0
#define TCG_TARGET_HAS_minmax_vec 0
#define TCG_TARGET_HAS_bitsel_vec 0
-#define TCG_TARGET_HAS_cmpsel_vec 0
+#define TCG_TARGET_HAS_cmpsel_vec 1
#define TCG_TARGET_HAS_tst_vec 0
--
2.43.0
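
Editorial sketch, not part of the patch: the identity behind the "Use only
vmerge_vim if possible, by inverting the test" step in tcg_out_cmpsel().
vmerge.vim takes its immediate on the mask-true side, so when only the
"false" value is a constant, the patch inverts the condition and swaps the
two values. The types and names below are hypothetical stand-ins, reduced
to two conditions and one 8-bit lane.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-condition stand-in for TCGCond. */
typedef enum { SK_COND_LT, SK_COND_GE } SketchCond;

static SketchCond sk_invert_cond(SketchCond c)
{
    return c == SK_COND_LT ? SK_COND_GE : SK_COND_LT;
}

static bool sk_eval(SketchCond c, int8_t a, int8_t b)
{
    return c == SK_COND_LT ? a < b : a >= b;
}

/* Reference semantics of one cmpsel lane: cond(a, b) ? val1 : val2. */
static int8_t sk_cmpsel(SketchCond c, int8_t a, int8_t b,
                        int8_t val1, int8_t val2)
{
    return sk_eval(c, a, b) ? val1 : val2;
}

/* The same lane after inverting the test and swapping the two values. */
static int8_t sk_cmpsel_inverted(SketchCond c, int8_t a, int8_t b,
                                 int8_t val1, int8_t val2)
{
    return sk_cmpsel(sk_invert_cond(c), a, b, val2, val1);
}
```

Both forms select the same value for every input, so the backend is free
to pick whichever one puts the constant on the mask-true side.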
* [PATCH v4 07/12] tcg/riscv: Implement vector neg ops
2024-09-11 13:26 [PATCH v4 00/12] tcg/riscv: Add support for vector LIU Zhiwei
` (5 preceding siblings ...)
2024-09-11 13:26 ` [PATCH v4 06/12] tcg/riscv: Implement vector cmp/cmpsel ops LIU Zhiwei
@ 2024-09-11 13:26 ` LIU Zhiwei
2024-09-11 13:26 ` [PATCH v4 08/12] tcg/riscv: Implement vector sat/mul ops LIU Zhiwei
` (4 subsequent siblings)
11 siblings, 0 replies; 36+ messages in thread
From: LIU Zhiwei @ 2024-09-11 13:26 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/riscv/tcg-target.c.inc | 7 +++++++
tcg/riscv/tcg-target.h | 2 +-
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 2ddfb3738a..cf86f3729c 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -280,6 +280,7 @@ typedef enum {
OPC_VXOR_VV = 0x2c000057 | V_OPIVV,
OPC_VXOR_VI = 0x2c000057 | V_OPIVI,
+ OPC_VRSUB_VI = 0xc000057 | V_OPIVI,
OPC_VMSEQ_VV = 0x60000057 | V_OPIVV,
OPC_VMSEQ_VI = 0x60000057 | V_OPIVI,
OPC_VMSEQ_VX = 0x60000057 | V_OPIVX,
@@ -2385,6 +2386,10 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
set_vtype_len(s, type);
tcg_out_opc_vi(s, OPC_VXOR_VI, a0, a1, -1, true);
break;
+ case INDEX_op_neg_vec:
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_opc_vi(s, OPC_VRSUB_VI, a0, a1, 0, true);
+ break;
case INDEX_op_cmpsel_vec:
tcg_out_cmpsel(s, type, vece, args[5], a0, a1, a2, const_args[2],
args[3], const_args[3], args[4], const_args[4]);
@@ -2428,6 +2433,7 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
case INDEX_op_or_vec:
case INDEX_op_xor_vec:
case INDEX_op_not_vec:
+ case INDEX_op_neg_vec:
case INDEX_op_cmpsel_vec:
return 1;
case INDEX_op_cmp_vec:
@@ -2582,6 +2588,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_dupm_vec:
case INDEX_op_ld_vec:
return C_O1_I1(v, r);
+ case INDEX_op_neg_vec:
case INDEX_op_not_vec:
return C_O1_I1(v, v);
case INDEX_op_add_vec:
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 94034504b2..ae10381e02 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -152,7 +152,7 @@ typedef enum {
#define TCG_TARGET_HAS_nor_vec 0
#define TCG_TARGET_HAS_eqv_vec 0
#define TCG_TARGET_HAS_not_vec 1
-#define TCG_TARGET_HAS_neg_vec 0
+#define TCG_TARGET_HAS_neg_vec 1
#define TCG_TARGET_HAS_abs_vec 0
#define TCG_TARGET_HAS_roti_vec 0
#define TCG_TARGET_HAS_rots_vec 0
--
2.43.0
* [PATCH v4 08/12] tcg/riscv: Implement vector sat/mul ops
2024-09-11 13:26 [PATCH v4 00/12] tcg/riscv: Add support for vector LIU Zhiwei
` (6 preceding siblings ...)
2024-09-11 13:26 ` [PATCH v4 07/12] tcg/riscv: Implement vector neg ops LIU Zhiwei
@ 2024-09-11 13:26 ` LIU Zhiwei
2024-09-11 13:26 ` [PATCH v4 09/12] tcg/riscv: Implement vector min/max ops LIU Zhiwei
` (3 subsequent siblings)
11 siblings, 0 replies; 36+ messages in thread
From: LIU Zhiwei @ 2024-09-11 13:26 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/riscv/tcg-target.c.inc | 36 ++++++++++++++++++++++++++++++++++++
tcg/riscv/tcg-target.h | 4 ++--
2 files changed, 38 insertions(+), 2 deletions(-)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index cf86f3729c..0a091c12d2 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -281,6 +281,12 @@ typedef enum {
OPC_VXOR_VI = 0x2c000057 | V_OPIVI,
OPC_VRSUB_VI = 0xc000057 | V_OPIVI,
+ OPC_VMUL_VV = 0x94000057 | V_OPMVV,
+ OPC_VSADD_VV = 0x84000057 | V_OPIVV,
+ OPC_VSSUB_VV = 0x8c000057 | V_OPIVV,
+ OPC_VSADDU_VV = 0x80000057 | V_OPIVV,
+ OPC_VSSUBU_VV = 0x88000057 | V_OPIVV,
+
OPC_VMSEQ_VV = 0x60000057 | V_OPIVV,
OPC_VMSEQ_VI = 0x60000057 | V_OPIVI,
OPC_VMSEQ_VX = 0x60000057 | V_OPIVX,
@@ -2390,6 +2396,26 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
set_vtype_len_sew(s, type, vece);
tcg_out_opc_vi(s, OPC_VRSUB_VI, a0, a1, 0, true);
break;
+ case INDEX_op_mul_vec:
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VMUL_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_ssadd_vec:
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VSADD_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_sssub_vec:
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VSSUB_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_usadd_vec:
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VSADDU_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_ussub_vec:
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VSSUBU_VV, a0, a1, a2, true);
+ break;
case INDEX_op_cmpsel_vec:
tcg_out_cmpsel(s, type, vece, args[5], a0, a1, a2, const_args[2],
args[3], const_args[3], args[4], const_args[4]);
@@ -2434,6 +2460,11 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
case INDEX_op_xor_vec:
case INDEX_op_not_vec:
case INDEX_op_neg_vec:
+ case INDEX_op_mul_vec:
+ case INDEX_op_ssadd_vec:
+ case INDEX_op_sssub_vec:
+ case INDEX_op_usadd_vec:
+ case INDEX_op_ussub_vec:
case INDEX_op_cmpsel_vec:
return 1;
case INDEX_op_cmp_vec:
@@ -2596,6 +2627,11 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_and_vec:
case INDEX_op_or_vec:
case INDEX_op_xor_vec:
+ case INDEX_op_mul_vec:
+ case INDEX_op_ssadd_vec:
+ case INDEX_op_sssub_vec:
+ case INDEX_op_usadd_vec:
+ case INDEX_op_ussub_vec:
return C_O1_I2(v, v, v);
case INDEX_op_cmp_vec:
return C_O1_I2(v, v, vL);
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index ae10381e02..1d4d8878ce 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -160,8 +160,8 @@ typedef enum {
#define TCG_TARGET_HAS_shi_vec 0
#define TCG_TARGET_HAS_shs_vec 0
#define TCG_TARGET_HAS_shv_vec 0
-#define TCG_TARGET_HAS_mul_vec 0
-#define TCG_TARGET_HAS_sat_vec 0
+#define TCG_TARGET_HAS_mul_vec 1
+#define TCG_TARGET_HAS_sat_vec 1
#define TCG_TARGET_HAS_minmax_vec 0
#define TCG_TARGET_HAS_bitsel_vec 0
#define TCG_TARGET_HAS_cmpsel_vec 1
--
2.43.0
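
Editorial sketch, not part of the patch: scalar reference semantics for one
8-bit lane of the saturating ops mapped above (ssadd to vsadd.vv, usadd to
vsaddu.vv, ussub to vssubu.vv). This is only meant to show what "saturating"
means per element. The helper names are hypothetical and other element
widths behave analogously.

```c
#include <assert.h>
#include <stdint.h>

/* Signed saturating add: clamp to [INT8_MIN, INT8_MAX]. */
static int8_t ssadd8(int8_t a, int8_t b)
{
    int r = a + b;
    if (r > INT8_MAX) {
        r = INT8_MAX;
    } else if (r < INT8_MIN) {
        r = INT8_MIN;
    }
    return (int8_t)r;
}

/* Unsigned saturating add: clamp to UINT8_MAX. */
static uint8_t usadd8(uint8_t a, uint8_t b)
{
    unsigned r = (unsigned)a + b;
    return r > UINT8_MAX ? UINT8_MAX : (uint8_t)r;
}

/* Unsigned saturating subtract: clamp to 0. */
static uint8_t ussub8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}
```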
* [PATCH v4 09/12] tcg/riscv: Implement vector min/max ops
2024-09-11 13:26 [PATCH v4 00/12] tcg/riscv: Add support for vector LIU Zhiwei
` (7 preceding siblings ...)
2024-09-11 13:26 ` [PATCH v4 08/12] tcg/riscv: Implement vector sat/mul ops LIU Zhiwei
@ 2024-09-11 13:26 ` LIU Zhiwei
2024-09-11 13:26 ` [PATCH v4 10/12] tcg/riscv: Implement vector shi/s/v ops LIU Zhiwei
` (2 subsequent siblings)
11 siblings, 0 replies; 36+ messages in thread
From: LIU Zhiwei @ 2024-09-11 13:26 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/riscv/tcg-target.c.inc | 29 +++++++++++++++++++++++++++++
tcg/riscv/tcg-target.h | 2 +-
2 files changed, 30 insertions(+), 1 deletion(-)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 0a091c12d2..c068d83a97 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -287,6 +287,11 @@ typedef enum {
OPC_VSADDU_VV = 0x80000057 | V_OPIVV,
OPC_VSSUBU_VV = 0x88000057 | V_OPIVV,
+ OPC_VMAX_VV = 0x1c000057 | V_OPIVV,
+ OPC_VMAXU_VV = 0x18000057 | V_OPIVV,
+ OPC_VMIN_VV = 0x14000057 | V_OPIVV,
+ OPC_VMINU_VV = 0x10000057 | V_OPIVV,
+
OPC_VMSEQ_VV = 0x60000057 | V_OPIVV,
OPC_VMSEQ_VI = 0x60000057 | V_OPIVI,
OPC_VMSEQ_VX = 0x60000057 | V_OPIVX,
@@ -2416,6 +2421,22 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
set_vtype_len_sew(s, type, vece);
tcg_out_opc_vv(s, OPC_VSSUBU_VV, a0, a1, a2, true);
break;
+ case INDEX_op_smax_vec:
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VMAX_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_smin_vec:
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VMIN_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_umax_vec:
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VMAXU_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_umin_vec:
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VMINU_VV, a0, a1, a2, true);
+ break;
case INDEX_op_cmpsel_vec:
tcg_out_cmpsel(s, type, vece, args[5], a0, a1, a2, const_args[2],
args[3], const_args[3], args[4], const_args[4]);
@@ -2465,6 +2486,10 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
case INDEX_op_sssub_vec:
case INDEX_op_usadd_vec:
case INDEX_op_ussub_vec:
+ case INDEX_op_smax_vec:
+ case INDEX_op_smin_vec:
+ case INDEX_op_umax_vec:
+ case INDEX_op_umin_vec:
case INDEX_op_cmpsel_vec:
return 1;
case INDEX_op_cmp_vec:
@@ -2632,6 +2657,10 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_sssub_vec:
case INDEX_op_usadd_vec:
case INDEX_op_ussub_vec:
+ case INDEX_op_smax_vec:
+ case INDEX_op_smin_vec:
+ case INDEX_op_umax_vec:
+ case INDEX_op_umin_vec:
return C_O1_I2(v, v, v);
case INDEX_op_cmp_vec:
return C_O1_I2(v, v, vL);
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 1d4d8878ce..7005099810 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -162,7 +162,7 @@ typedef enum {
#define TCG_TARGET_HAS_shv_vec 0
#define TCG_TARGET_HAS_mul_vec 1
#define TCG_TARGET_HAS_sat_vec 1
-#define TCG_TARGET_HAS_minmax_vec 0
+#define TCG_TARGET_HAS_minmax_vec 1
#define TCG_TARGET_HAS_bitsel_vec 0
#define TCG_TARGET_HAS_cmpsel_vec 1
--
2.43.0
* [PATCH v4 10/12] tcg/riscv: Implement vector shi/s/v ops
2024-09-11 13:26 [PATCH v4 00/12] tcg/riscv: Add support for vector LIU Zhiwei
` (8 preceding siblings ...)
2024-09-11 13:26 ` [PATCH v4 09/12] tcg/riscv: Implement vector min/max ops LIU Zhiwei
@ 2024-09-11 13:26 ` LIU Zhiwei
2024-09-11 23:15 ` Richard Henderson
2024-09-11 13:26 ` [PATCH v4 11/12] tcg/riscv: Implement vector roti/v/x ops LIU Zhiwei
2024-09-11 13:26 ` [PATCH v4 12/12] tcg/riscv: Enable native vector support for TCG host LIU Zhiwei
11 siblings, 1 reply; 36+ messages in thread
From: LIU Zhiwei @ 2024-09-11 13:26 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
tcg/riscv/tcg-target-con-set.h | 1 +
tcg/riscv/tcg-target.c.inc | 76 ++++++++++++++++++++++++++++++++++
tcg/riscv/tcg-target.h | 6 +--
3 files changed, 80 insertions(+), 3 deletions(-)
diff --git a/tcg/riscv/tcg-target-con-set.h b/tcg/riscv/tcg-target-con-set.h
index cc06102ccf..f40de70001 100644
--- a/tcg/riscv/tcg-target-con-set.h
+++ b/tcg/riscv/tcg-target-con-set.h
@@ -24,6 +24,7 @@ C_O2_I4(r, r, rZ, rZ, rM, rM)
C_O0_I2(v, r)
C_O1_I1(v, r)
C_O1_I1(v, v)
+C_O1_I2(v, v, r)
C_O1_I2(v, v, v)
C_O1_I2(v, v, vL)
C_O1_I4(v, v, vL, vK, vK)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index c068d83a97..16785ebe8e 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -315,6 +315,16 @@ typedef enum {
OPC_VMSGT_VI = 0x7c000057 | V_OPIVI,
OPC_VMSGT_VX = 0x7c000057 | V_OPIVX,
+ OPC_VSLL_VV = 0x94000057 | V_OPIVV,
+ OPC_VSLL_VI = 0x94000057 | V_OPIVI,
+ OPC_VSLL_VX = 0x94000057 | V_OPIVX,
+ OPC_VSRL_VV = 0xa0000057 | V_OPIVV,
+ OPC_VSRL_VI = 0xa0000057 | V_OPIVI,
+ OPC_VSRL_VX = 0xa0000057 | V_OPIVX,
+ OPC_VSRA_VV = 0xa4000057 | V_OPIVV,
+ OPC_VSRA_VI = 0xa4000057 | V_OPIVI,
+ OPC_VSRA_VX = 0xa4000057 | V_OPIVX,
+
OPC_VMV_V_V = 0x5e000057 | V_OPIVV,
OPC_VMV_V_I = 0x5e000057 | V_OPIVI,
OPC_VMV_V_X = 0x5e000057 | V_OPIVX,
@@ -1574,6 +1584,17 @@ static void tcg_out_cmpsel(TCGContext *s, TCGType type, unsigned vece,
}
}
+static void tcg_out_vshifti(TCGContext *s, RISCVInsn opc_vi, RISCVInsn opc_vx,
+ TCGReg dst, TCGReg src, unsigned imm)
+{
+ if (imm < 32) {
+ tcg_out_opc_vi(s, opc_vi, dst, src, imm, true);
+ } else {
+ tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_TMP0, imm);
+ tcg_out_opc_vx(s, opc_vx, dst, src, TCG_REG_TMP0, true);
+ }
+}
+
static void tcg_out_call_int(TCGContext *s, const tcg_insn_unit *arg, bool tail)
{
TCGReg link = tail ? TCG_REG_ZERO : TCG_REG_RA;
@@ -2437,6 +2458,42 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
set_vtype_len_sew(s, type, vece);
tcg_out_opc_vv(s, OPC_VMINU_VV, a0, a1, a2, true);
break;
+ case INDEX_op_shls_vec:
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_opc_vx(s, OPC_VSLL_VX, a0, a1, a2, true);
+ break;
+ case INDEX_op_shrs_vec:
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_opc_vx(s, OPC_VSRL_VX, a0, a1, a2, true);
+ break;
+ case INDEX_op_sars_vec:
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_opc_vx(s, OPC_VSRA_VX, a0, a1, a2, true);
+ break;
+ case INDEX_op_shlv_vec:
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VSLL_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_shrv_vec:
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VSRL_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_sarv_vec:
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VSRA_VV, a0, a1, a2, true);
+ break;
+ case INDEX_op_shli_vec:
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_vshifti(s, OPC_VSLL_VI, OPC_VSLL_VX, a0, a1, a2);
+ break;
+ case INDEX_op_shri_vec:
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_vshifti(s, OPC_VSRL_VI, OPC_VSRL_VX, a0, a1, a2);
+ break;
+ case INDEX_op_sari_vec:
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_vshifti(s, OPC_VSRA_VI, OPC_VSRA_VX, a0, a1, a2);
+ break;
case INDEX_op_cmpsel_vec:
tcg_out_cmpsel(s, type, vece, args[5], a0, a1, a2, const_args[2],
args[3], const_args[3], args[4], const_args[4]);
@@ -2490,6 +2547,15 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
case INDEX_op_smin_vec:
case INDEX_op_umax_vec:
case INDEX_op_umin_vec:
+ case INDEX_op_shls_vec:
+ case INDEX_op_shrs_vec:
+ case INDEX_op_sars_vec:
+ case INDEX_op_shlv_vec:
+ case INDEX_op_shrv_vec:
+ case INDEX_op_sarv_vec:
+ case INDEX_op_shri_vec:
+ case INDEX_op_shli_vec:
+ case INDEX_op_sari_vec:
case INDEX_op_cmpsel_vec:
return 1;
case INDEX_op_cmp_vec:
@@ -2646,6 +2712,9 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
return C_O1_I1(v, r);
case INDEX_op_neg_vec:
case INDEX_op_not_vec:
+ case INDEX_op_shli_vec:
+ case INDEX_op_shri_vec:
+ case INDEX_op_sari_vec:
return C_O1_I1(v, v);
case INDEX_op_add_vec:
case INDEX_op_sub_vec:
@@ -2661,7 +2730,14 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_smin_vec:
case INDEX_op_umax_vec:
case INDEX_op_umin_vec:
+ case INDEX_op_shlv_vec:
+ case INDEX_op_shrv_vec:
+ case INDEX_op_sarv_vec:
return C_O1_I2(v, v, v);
+ case INDEX_op_shls_vec:
+ case INDEX_op_shrs_vec:
+ case INDEX_op_sars_vec:
+ return C_O1_I2(v, v, r);
case INDEX_op_cmp_vec:
return C_O1_I2(v, v, vL);
case INDEX_op_cmpsel_vec:
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 7005099810..76d30e789b 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -157,9 +157,9 @@ typedef enum {
#define TCG_TARGET_HAS_roti_vec 0
#define TCG_TARGET_HAS_rots_vec 0
#define TCG_TARGET_HAS_rotv_vec 0
-#define TCG_TARGET_HAS_shi_vec 0
-#define TCG_TARGET_HAS_shs_vec 0
-#define TCG_TARGET_HAS_shv_vec 0
+#define TCG_TARGET_HAS_shi_vec 1
+#define TCG_TARGET_HAS_shs_vec 1
+#define TCG_TARGET_HAS_shv_vec 1
#define TCG_TARGET_HAS_mul_vec 1
#define TCG_TARGET_HAS_sat_vec 1
#define TCG_TARGET_HAS_minmax_vec 1
--
2.43.0
* [PATCH v4 11/12] tcg/riscv: Implement vector roti/v/x ops
2024-09-11 13:26 [PATCH v4 00/12] tcg/riscv: Add support for vector LIU Zhiwei
` (9 preceding siblings ...)
2024-09-11 13:26 ` [PATCH v4 10/12] tcg/riscv: Implement vector shi/s/v ops LIU Zhiwei
@ 2024-09-11 13:26 ` LIU Zhiwei
2024-09-11 23:24 ` Richard Henderson
2024-09-11 13:26 ` [PATCH v4 12/12] tcg/riscv: Enable native vector support for TCG host LIU Zhiwei
11 siblings, 1 reply; 36+ messages in thread
From: LIU Zhiwei @ 2024-09-11 13:26 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
---
tcg/riscv/tcg-target.c.inc | 35 +++++++++++++++++++++++++++++++++++
tcg/riscv/tcg-target.h | 6 +++---
2 files changed, 38 insertions(+), 3 deletions(-)
diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
index 16785ebe8e..afc9747780 100644
--- a/tcg/riscv/tcg-target.c.inc
+++ b/tcg/riscv/tcg-target.c.inc
@@ -2494,6 +2494,33 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
set_vtype_len_sew(s, type, vece);
tcg_out_vshifti(s, OPC_VSRA_VI, OPC_VSRA_VX, a0, a1, a2);
break;
+ case INDEX_op_rotli_vec:
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_vshifti(s, OPC_VSLL_VI, OPC_VSLL_VX, TCG_REG_V0, a1, a2);
+ tcg_out_vshifti(s, OPC_VSRL_VI, OPC_VSRL_VX, a0, a1, -a2);
+ tcg_out_opc_vv(s, OPC_VOR_VV, a0, a0, TCG_REG_V0, true);
+ break;
+ case INDEX_op_rotls_vec:
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_opc_vx(s, OPC_VSLL_VX, TCG_REG_V0, a1, a2, true);
+ tcg_out_opc_reg(s, OPC_SUBW, a2, TCG_REG_ZERO, a2);
+ tcg_out_opc_vx(s, OPC_VSRL_VX, a0, a1, a2, true);
+ tcg_out_opc_vv(s, OPC_VOR_VV, a0, a0, TCG_REG_V0, true);
+ break;
+ case INDEX_op_rotlv_vec:
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_opc_vv(s, OPC_VSLL_VV, TCG_REG_V0, a1, a2, true);
+ tcg_out_opc_vi(s, OPC_VRSUB_VI, TCG_REG_V0, a2, 0, true);
+ tcg_out_opc_vv(s, OPC_VSRL_VV, a0, a1, TCG_REG_V0, true);
+ tcg_out_opc_vv(s, OPC_VOR_VV, a0, a0, TCG_REG_V0, true);
+ break;
+ case INDEX_op_rotrv_vec:
+ set_vtype_len_sew(s, type, vece);
+ tcg_out_opc_vi(s, OPC_VRSUB_VI, TCG_REG_V0, a2, 0, true);
+ tcg_out_opc_vv(s, OPC_VSLL_VV, TCG_REG_V0, a1, TCG_REG_V0, true);
+ tcg_out_opc_vv(s, OPC_VSRL_VV, a0, a1, a2, true);
+ tcg_out_opc_vv(s, OPC_VOR_VV, a0, a0, TCG_REG_V0, true);
+ break;
case INDEX_op_cmpsel_vec:
tcg_out_cmpsel(s, type, vece, args[5], a0, a1, a2, const_args[2],
args[3], const_args[3], args[4], const_args[4]);
@@ -2556,6 +2583,10 @@ int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
case INDEX_op_shri_vec:
case INDEX_op_shli_vec:
case INDEX_op_sari_vec:
+ case INDEX_op_rotls_vec:
+ case INDEX_op_rotlv_vec:
+ case INDEX_op_rotrv_vec:
+ case INDEX_op_rotli_vec:
case INDEX_op_cmpsel_vec:
return 1;
case INDEX_op_cmp_vec:
@@ -2715,6 +2746,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_shli_vec:
case INDEX_op_shri_vec:
case INDEX_op_sari_vec:
+ case INDEX_op_rotli_vec:
return C_O1_I1(v, v);
case INDEX_op_add_vec:
case INDEX_op_sub_vec:
@@ -2733,10 +2765,13 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_shlv_vec:
case INDEX_op_shrv_vec:
case INDEX_op_sarv_vec:
+ case INDEX_op_rotlv_vec:
+ case INDEX_op_rotrv_vec:
return C_O1_I2(v, v, v);
case INDEX_op_shls_vec:
case INDEX_op_shrs_vec:
case INDEX_op_sars_vec:
+ case INDEX_op_rotls_vec:
return C_O1_I2(v, v, r);
case INDEX_op_cmp_vec:
return C_O1_I2(v, v, vL);
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index 76d30e789b..e6d66cd1b9 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -154,9 +154,9 @@ typedef enum {
#define TCG_TARGET_HAS_not_vec 1
#define TCG_TARGET_HAS_neg_vec 1
#define TCG_TARGET_HAS_abs_vec 0
-#define TCG_TARGET_HAS_roti_vec 0
-#define TCG_TARGET_HAS_rots_vec 0
-#define TCG_TARGET_HAS_rotv_vec 0
+#define TCG_TARGET_HAS_roti_vec 1
+#define TCG_TARGET_HAS_rots_vec 1
+#define TCG_TARGET_HAS_rotv_vec 1
#define TCG_TARGET_HAS_shi_vec 1
#define TCG_TARGET_HAS_shs_vec 1
#define TCG_TARGET_HAS_shv_vec 1
--
2.43.0
* [PATCH v4 12/12] tcg/riscv: Enable native vector support for TCG host
2024-09-11 13:26 [PATCH v4 00/12] tcg/riscv: Add support for vector LIU Zhiwei
` (10 preceding siblings ...)
2024-09-11 13:26 ` [PATCH v4 11/12] tcg/riscv: Implement vector roti/v/x ops LIU Zhiwei
@ 2024-09-11 13:26 ` LIU Zhiwei
11 siblings, 0 replies; 36+ messages in thread
From: LIU Zhiwei @ 2024-09-11 13:26 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, zhiwei_liu, richard.henderson, TANG Tiancheng
From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/riscv/tcg-target.h | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
index e6d66cd1b9..d27007f2e6 100644
--- a/tcg/riscv/tcg-target.h
+++ b/tcg/riscv/tcg-target.h
@@ -143,9 +143,11 @@ typedef enum {
#define TCG_TARGET_HAS_tst 0
/* vector instructions */
-#define TCG_TARGET_HAS_v64 0
-#define TCG_TARGET_HAS_v128 0
-#define TCG_TARGET_HAS_v256 0
+#define have_rvv (cpuinfo & CPUINFO_ZVE64X)
+
+#define TCG_TARGET_HAS_v64 have_rvv
+#define TCG_TARGET_HAS_v128 have_rvv
+#define TCG_TARGET_HAS_v256 have_rvv
#define TCG_TARGET_HAS_andc_vec 0
#define TCG_TARGET_HAS_orc_vec 0
#define TCG_TARGET_HAS_nand_vec 0
--
2.43.0
* Re: [PATCH v4 01/12] util: Add RISC-V vector extension probe in cpuinfo
2024-09-11 13:26 ` [PATCH v4 01/12] util: Add RISC-V vector extension probe in cpuinfo LIU Zhiwei
@ 2024-09-11 18:34 ` Richard Henderson
2024-09-18 5:14 ` LIU Zhiwei
0 siblings, 1 reply; 36+ messages in thread
From: Richard Henderson @ 2024-09-11 18:34 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 9/11/24 06:26, LIU Zhiwei wrote:
> While the compiler doesn't support RISCV_HWPROBE_EXT_ZVE64X,
> we use RISCV_HWPROBE_IMA_V instead.
Language is incorrect here. The compiler has nothing to do with it.
Perhaps "If the installed kernel header files do not support...".
However, if you use only RISCV_HWPROBE_IMA_V, then you do not have any of the additional
guarantees of Zve64x.
The kernel api for RISCV_HWPROBE_EXT_ZVE64X was introduced in 6.10.
If that is acceptable as a minimum, the simplest solution is
#ifndef RISCV_HWPROBE_EXT_ZVE64X
#define RISCV_HWPROBE_EXT_ZVE64X (1ULL << 39)
#endif
If the running kernel is old, then the bit will not be set and we will not attempt to use RVV.
If we need to support older kernels, then we'll have to go back to probing with vsetvl to
determine if all of the additional guarantees of Zve64x are met.
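The fallback-define approach can be sketched as follows. This is only an illustration under stated assumptions: demo_probe_zve64x and DEMO_CPUINFO_ZVE64X are hypothetical names, and the caller is assumed to pass in the extension bitmask already returned by the hwprobe syscall.

```c
#include <assert.h>
#include <stdint.h>

/* Fallback for old installed kernel headers; value as given above. */
#ifndef RISCV_HWPROBE_EXT_ZVE64X
#define RISCV_HWPROBE_EXT_ZVE64X (1ULL << 39)
#endif

/* Hypothetical stand-in for the CPUINFO_ZVE64X flag from the patch. */
#define DEMO_CPUINFO_ZVE64X (1u << 4)

/*
 * Hypothetical helper: map the extension bitmask reported by the
 * kernel to a cpuinfo flag.  An old running kernel never sets the
 * bit, so the result is 0 and RVV is simply not used.
 */
static inline unsigned demo_probe_zve64x(uint64_t ext_value)
{
    return (ext_value & RISCV_HWPROBE_EXT_ZVE64X) ? DEMO_CPUINFO_ZVE64X : 0;
}

/* Usage example: old kernel (bit clear) versus new kernel (bit set). */
static inline void demo_probe_selfcheck(void)
{
    assert(demo_probe_zve64x(0) == 0);
    assert(demo_probe_zve64x(1ULL << 39) == DEMO_CPUINFO_ZVE64X);
}
```

With this shape, the header fallback only affects whether the code compiles; whether RVV is used is still decided at run time by the kernel that is actually running.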
r~
>
> Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
> Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
> ---
> host/include/riscv/host/cpuinfo.h | 2 ++
> util/cpuinfo-riscv.c | 24 ++++++++++++++++++++++--
> 2 files changed, 24 insertions(+), 2 deletions(-)
>
> diff --git a/host/include/riscv/host/cpuinfo.h b/host/include/riscv/host/cpuinfo.h
> index 2b00660e36..cdc784e7b6 100644
> --- a/host/include/riscv/host/cpuinfo.h
> +++ b/host/include/riscv/host/cpuinfo.h
> @@ -10,9 +10,11 @@
> #define CPUINFO_ZBA (1u << 1)
> #define CPUINFO_ZBB (1u << 2)
> #define CPUINFO_ZICOND (1u << 3)
> +#define CPUINFO_ZVE64X (1u << 4)
>
> /* Initialized with a constructor. */
> extern unsigned cpuinfo;
> +extern unsigned riscv_lg2_vlenb;
>
> /*
> * We cannot rely on constructor ordering, so other constructors must
> diff --git a/util/cpuinfo-riscv.c b/util/cpuinfo-riscv.c
> index 497ce12680..bab782745b 100644
> --- a/util/cpuinfo-riscv.c
> +++ b/util/cpuinfo-riscv.c
> @@ -4,6 +4,7 @@
> */
>
> #include "qemu/osdep.h"
> +#include "qemu/host-utils.h"
> #include "host/cpuinfo.h"
>
> #ifdef CONFIG_ASM_HWPROBE_H
> @@ -12,6 +13,7 @@
> #endif
>
> unsigned cpuinfo;
> +unsigned riscv_lg2_vlenb;
> static volatile sig_atomic_t got_sigill;
>
> static void sigill_handler(int signo, siginfo_t *si, void *data)
> @@ -33,7 +35,7 @@ static void sigill_handler(int signo, siginfo_t *si, void *data)
> /* Called both as constructor and (possibly) via other constructors. */
> unsigned __attribute__((constructor)) cpuinfo_init(void)
> {
> - unsigned left = CPUINFO_ZBA | CPUINFO_ZBB | CPUINFO_ZICOND;
> + unsigned left = CPUINFO_ZBA | CPUINFO_ZBB | CPUINFO_ZICOND | CPUINFO_ZVE64X;
> unsigned info = cpuinfo;
>
> if (info) {
> @@ -49,6 +51,9 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
> #endif
> #if defined(__riscv_arch_test) && defined(__riscv_zicond)
> info |= CPUINFO_ZICOND;
> +#endif
> +#if defined(__riscv_arch_test) && defined(__riscv_zve64x)
> + info |= CPUINFO_ZVE64X;
> #endif
> left &= ~info;
>
> @@ -64,7 +69,8 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
> && pair.key >= 0) {
> info |= pair.value & RISCV_HWPROBE_EXT_ZBA ? CPUINFO_ZBA : 0;
> info |= pair.value & RISCV_HWPROBE_EXT_ZBB ? CPUINFO_ZBB : 0;
> - left &= ~(CPUINFO_ZBA | CPUINFO_ZBB);
> + info |= pair.value & RISCV_HWPROBE_IMA_V ? CPUINFO_ZVE64X : 0;
> + left &= ~(CPUINFO_ZBA | CPUINFO_ZBB | CPUINFO_ZVE64X);
> #ifdef RISCV_HWPROBE_EXT_ZICOND
> info |= pair.value & RISCV_HWPROBE_EXT_ZICOND ? CPUINFO_ZICOND : 0;
> left &= ~CPUINFO_ZICOND;
> @@ -112,6 +118,20 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
> assert(left == 0);
> }
>
> + if (info & CPUINFO_ZVE64X) {
> + /*
> + * We are guaranteed by RVV-1.0 that VLEN is a power of 2.
> + * We are guaranteed by Zve64x that VLEN >= 64, and that
> + * EEW of {8,16,32,64} are supported.
> + *
> + * Cache VLEN in a convenient form.
> + */
> + unsigned long vlenb;
> + /* Read csr "vlenb" with "csrr %0, vlenb" : "=r"(vlenb) */
> + asm volatile(".insn i 0x73, 0x2, %0, zero, -990" : "=r"(vlenb));
> + riscv_lg2_vlenb = ctz32(vlenb);
> + }
> +
> info |= CPUINFO_ALWAYS;
> cpuinfo = info;
> return info;
* Re: [PATCH v4 02/12] tcg/riscv: Add basic support for vector
2024-09-11 13:26 ` [PATCH v4 02/12] tcg/riscv: Add basic support for vector LIU Zhiwei
@ 2024-09-11 18:41 ` Richard Henderson
2024-09-18 5:17 ` LIU Zhiwei
2024-09-20 11:26 ` Daniel Henrique Barboza
1 sibling, 1 reply; 36+ messages in thread
From: Richard Henderson @ 2024-09-11 18:41 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, Swung0x48, TANG Tiancheng
On 9/11/24 06:26, LIU Zhiwei wrote:
> From: Swung0x48<swung0x48@outlook.com>
>
> The RISC-V vector instruction set utilizes the LMUL field to group
> multiple registers, enabling variable-length vector registers. This
> implementation uses only the first register number of each group while
> reserving the other register numbers within the group.
>
> In TCG, each VEC_IR can have 3 types (TCG_TYPE_V64/128/256), and the
> host runtime needs to adjust LMUL based on the type to use different
> register groups.
>
> This presents challenges for TCG's register allocation. Currently, we
> avoid modifying the register allocation part of TCG and only expose the
> minimum number of vector registers.
>
> For example, when the host vlen is 64 bits and type is TCG_TYPE_V256, with
> LMUL equal to 4, we use 4 vector registers as one register group. We can
> use a maximum of 8 register groups, but the V0 register number is reserved
> as a mask register, so we can effectively use at most 7 register groups.
> Moreover, when type is smaller than TCG_TYPE_V256, only 7 registers are
> forced to be used. This is because TCG cannot yet dynamically constrain
> registers with type; likewise, when the host vlen is 128 bits and
> TCG_TYPE_V256, we can use at most 15 registers.
>
> There is not much pressure on vector register allocation in TCG now, so
> using 7 registers is feasible and will not have a major impact on code
> generation.
>
> This patch:
> 1. Reserves vector register 0 for use as a mask register.
> 2. When using register groups, reserves the additional registers within
> each group.
>
> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
> Co-authored-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
If there is a co-author, there should be another Signed-off-by.
> Reviewed-by: Liu Zhiwei<zhiwei_liu@linux.alibaba.com>
> ---
> tcg/riscv/tcg-target-con-str.h | 1 +
> tcg/riscv/tcg-target.c.inc | 126 ++++++++++++++++++++++++---------
> tcg/riscv/tcg-target.h | 78 +++++++++++---------
> tcg/riscv/tcg-target.opc.h | 12 ++++
> 4 files changed, 151 insertions(+), 66 deletions(-)
> create mode 100644 tcg/riscv/tcg-target.opc.h
Anyway,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* Re: [PATCH v4 03/12] tcg/riscv: Add vset{i}vli and ld/st vec ops
2024-09-11 13:26 ` [PATCH v4 03/12] tcg/riscv: Add vset{i}vli and ld/st vec ops LIU Zhiwei
@ 2024-09-11 22:57 ` Richard Henderson
2024-09-22 4:46 ` Richard Henderson
1 sibling, 0 replies; 36+ messages in thread
From: Richard Henderson @ 2024-09-11 22:57 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 9/11/24 06:26, LIU Zhiwei wrote:
> +static bool lmul_check(int lmul, MemOp vsew)
> +{
> + /*
> + * For a given supported fractional LMUL setting, implementations must
> + * support SEW settings between SEW_MIN and LMUL * ELEN, inclusive.
> + * So if ELEN = 64, LMUL = 1/2, then SEW will support e8, e16, e32,
> + * but e64 may not be supported.
> + */
> + if (lmul < 0) {
> + return (8 << vsew) <= (64 / (1 << (-lmul)));
> + } else {
> + return true;
> + }
> +}
While the spec uses language like "may not be supported", it then goes on to use an
example of VLEN=32 and LMUL=1/8 not being valid because that leaves only one 4-bit element.
In our case...
> +
> +static void set_vtype(TCGContext *s, TCGType type, MemOp vsew)
> +{
> + unsigned vtype, insn, avl;
> + int lmul;
> + RISCVVlmul vlmul;
> + bool lmul_eq_avl;
> +
> + s->riscv_cur_type = type;
> + s->riscv_cur_vsew = vsew;
> +
> + /* Match riscv_lg2_vlenb to TCG_TYPE_V64. */
> + QEMU_BUILD_BUG_ON(TCG_TYPE_V64 != 3);
> +
> + lmul = type - riscv_lg2_vlenb;
We know VLEN, and LMUL is bounded by TCG_TYPE_V64. Since SEW=64 will never be smaller
than LMUL*VLEN, I expect the lmul_check function to be entirely unneeded: all SEW should
always work.
If for some strange reason that is not the case, the correct solution is not to *assume* that
it might not work, as you are doing, but to *probe* for it at startup. For instance, it
would be easy to loop over each SEW to find the minimal LMUL for which VSETVL returns a
positive VL, i.e. VILL not set.
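The arithmetic behind that expectation can be checked with a small sketch. It assumes the encoding from the quoted code (TCG_TYPE_V64 == 3, lmul = type - riscv_lg2_vlenb) and only models the sizes; demo_group_bits and the DEMO_ constants are hypothetical names, and whether a given host accepts SEW=64 at a fractional LMUL is still what the startup probe would establish.

```c
#include <assert.h>

/* Hypothetical mirrors of the TCG type encoding used in the patch. */
enum { DEMO_TYPE_V64 = 3, DEMO_TYPE_V128 = 4, DEMO_TYPE_V256 = 5 };

/*
 * Bits covered by one register group, LMUL * VLEN, for a given TCG
 * vector type and host log2(VLENB).  lmul is log2(LMUL) and may be
 * negative for fractional LMUL.
 */
static inline unsigned demo_group_bits(int type, unsigned lg2_vlenb)
{
    int lmul = type - (int)lg2_vlenb;
    unsigned vlen = 8u << lg2_vlenb;

    /* Only the 1/8 .. 8 range is modelled here. */
    assert(lmul >= -3 && lmul <= 3);
    return lmul >= 0 ? vlen << lmul : vlen >> -lmul;
}

/*
 * Usage example: for VLEN of 64..512 bits, the group always covers
 * exactly the TCG type's width, so at least one 64-bit element fits.
 */
static inline void demo_group_selfcheck(void)
{
    for (unsigned lg2 = 3; lg2 <= 6; lg2++) {
        for (int type = DEMO_TYPE_V64; type <= DEMO_TYPE_V256; type++) {
            assert(demo_group_bits(type, lg2) == (8u << type));
            assert(demo_group_bits(type, lg2) >= 64);
        }
    }
}
```

In this model LMUL * VLEN never drops below 64 bits, which is why no SEW is expected to need special handling.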
> + if (lmul < -3) {
> + /* Host VLEN >= 1024 bits. */
> + vlmul = VLMUL_M1;
> + lmul_eq_avl = false;
> + } else if (lmul < 3) {
> + /* 1/8, 1/4, 1/2, 1, 2, 4 */
> + if (lmul_check(lmul, vsew)) {
> + vlmul = lmul & 7;
> + } else {
> + vlmul = VLMUL_M1;
> + }
> + lmul_eq_avl = true;
lmul_eq_avl is incorrectly set here for !lmul_check.
> + if (type >= riscv_lg2_vlenb) {
> + static const RISCVInsn whole_reg_ld[] = {
> + OPC_VL1RE64_V, OPC_VL2RE64_V, OPC_VL4RE64_V, OPC_VL8RE64_V
> + };
> + unsigned idx = type - riscv_lg2_vlenb;
> +
> + tcg_debug_assert(idx < sizeof(whole_reg_ld));
> + insn = whole_reg_ld[idx];
> + } else {
> + static const RISCVInsn unit_stride_ld[] = {
> + OPC_VLE8_V, OPC_VLE16_V, OPC_VLE32_V, OPC_VLE64_V
> + };
> + MemOp prev_vsew = set_vtype_len(s, type);
> +
> + tcg_debug_assert(prev_vsew < sizeof(unit_stride_ld));
Both sizeof are incorrect; you need ARRAY_SIZE().
Likewise in tcg_out_st.
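To make the difference concrete, here is a minimal sketch. The DemoInsn enum and demo_ helpers are hypothetical stand-ins for RISCVInsn and the tables in the patch; ARRAY_SIZE is defined locally only so the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>

#ifndef ARRAY_SIZE
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
#endif

/* Hypothetical stand-in for the RISCVInsn opcode enum. */
typedef enum {
    DEMO_VL1RE64_V, DEMO_VL2RE64_V, DEMO_VL4RE64_V, DEMO_VL8RE64_V
} DemoInsn;

static const DemoInsn demo_whole_reg_ld[] = {
    DEMO_VL1RE64_V, DEMO_VL2RE64_V, DEMO_VL4RE64_V, DEMO_VL8RE64_V
};

/* sizeof() yields the table size in bytes, not in elements. */
static inline size_t demo_table_bytes(void)
{
    return sizeof(demo_whole_reg_ld);
}

/* ARRAY_SIZE() yields the number of elements, which is the real bound. */
static inline size_t demo_table_count(void)
{
    return ARRAY_SIZE(demo_whole_reg_ld);
}

/* Bounds-checked lookup using the element count. */
static inline DemoInsn demo_pick_ld(unsigned idx)
{
    assert(idx < ARRAY_SIZE(demo_whole_reg_ld));
    return demo_whole_reg_ld[idx];
}
```

With a multi-byte enum, an assertion written against sizeof() would accept indexes well past the end of the four-entry table.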
> static void tcg_out_tb_start(TCGContext *s)
> {
> + s->riscv_cur_type = TCG_TYPE_COUNT;
> /* nothing to do */
> }
Remove the out-of-date comment.
r~
* Re: [PATCH v4 04/12] tcg/riscv: Implement vector mov/dup{m/i}
2024-09-11 13:26 ` [PATCH v4 04/12] tcg/riscv: Implement vector mov/dup{m/i} LIU Zhiwei
@ 2024-09-11 23:07 ` Richard Henderson
0 siblings, 0 replies; 36+ messages in thread
From: Richard Henderson @ 2024-09-11 23:07 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 9/11/24 06:26, LIU Zhiwei wrote:
> From: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>
> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
> Reviewed-by: Liu Zhiwei<zhiwei_liu@linux.alibaba.com>
> ---
> tcg/riscv/tcg-target.c.inc | 73 ++++++++++++++++++++++++++++++++++++--
> 1 file changed, 71 insertions(+), 2 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* Re: [PATCH v4 06/12] tcg/riscv: Implement vector cmp/cmpsel ops
2024-09-11 13:26 ` [PATCH v4 06/12] tcg/riscv: Implement vector cmp/cmpsel ops LIU Zhiwei
@ 2024-09-11 23:14 ` Richard Henderson
0 siblings, 0 replies; 36+ messages in thread
From: Richard Henderson @ 2024-09-11 23:14 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 9/11/24 06:26, LIU Zhiwei wrote:
> From: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>
> 1.Address immediate value constraints in RISC-V Vector Extension 1.0 for
> comparison instructions.
>
> 2.Extend comparison results from mask registers to SEW-width elements,
> following recommendations in The RISC-V SPEC Volume I (Version 20240411).
>
> This aligns with TCG's cmp_vec behavior by expanding compare results to
> full element width: all 1s for true, all 0s for false.
>
> Expand cmp with cmpsel.
>
> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
> Reviewed-by: Liu Zhiwei<zhiwei_liu@linux.alibaba.com>
> ---
> tcg/riscv/tcg-target-con-set.h | 2 +
> tcg/riscv/tcg-target-con-str.h | 2 +
> tcg/riscv/tcg-target.c.inc | 251 +++++++++++++++++++++++++++------
> tcg/riscv/tcg-target.h | 2 +-
> 4 files changed, 209 insertions(+), 48 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* Re: [PATCH v4 10/12] tcg/riscv: Implement vector shi/s/v ops
2024-09-11 13:26 ` [PATCH v4 10/12] tcg/riscv: Implement vector shi/s/v ops LIU Zhiwei
@ 2024-09-11 23:15 ` Richard Henderson
0 siblings, 0 replies; 36+ messages in thread
From: Richard Henderson @ 2024-09-11 23:15 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 9/11/24 06:26, LIU Zhiwei wrote:
> From: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>
> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
> Reviewed-by: Liu Zhiwei<zhiwei_liu@linux.alibaba.com>
> ---
> tcg/riscv/tcg-target-con-set.h | 1 +
> tcg/riscv/tcg-target.c.inc | 76 ++++++++++++++++++++++++++++++++++
> tcg/riscv/tcg-target.h | 6 +--
> 3 files changed, 80 insertions(+), 3 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* Re: [PATCH v4 11/12] tcg/riscv: Implement vector roti/v/x ops
2024-09-11 13:26 ` [PATCH v4 11/12] tcg/riscv: Implement vector roti/v/x ops LIU Zhiwei
@ 2024-09-11 23:24 ` Richard Henderson
0 siblings, 0 replies; 36+ messages in thread
From: Richard Henderson @ 2024-09-11 23:24 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 9/11/24 06:26, LIU Zhiwei wrote:
> From: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
>
> Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
> Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
> ---
> tcg/riscv/tcg-target.c.inc | 35 +++++++++++++++++++++++++++++++++++
> tcg/riscv/tcg-target.h | 6 +++---
> 2 files changed, 38 insertions(+), 3 deletions(-)
>
> diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
> index 16785ebe8e..afc9747780 100644
> --- a/tcg/riscv/tcg-target.c.inc
> +++ b/tcg/riscv/tcg-target.c.inc
> @@ -2494,6 +2494,33 @@ static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
> set_vtype_len_sew(s, type, vece);
> tcg_out_vshifti(s, OPC_VSRA_VI, OPC_VSRA_VX, a0, a1, a2);
> break;
> + case INDEX_op_rotli_vec:
> + set_vtype_len_sew(s, type, vece);
> + tcg_out_vshifti(s, OPC_VSLL_VI, OPC_VSLL_VX, TCG_REG_V0, a1, a2);
> + tcg_out_vshifti(s, OPC_VSRL_VI, OPC_VSRL_VX, a0, a1, -a2);
You will want to mask -a2, because otherwise it will always fail to match imm < 32 within
tcg_out_vshifti:
-a2 & ((8 << vece) - 1)
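A scalar sketch of the masked count, with hypothetical helper names, shows why the mask matters: the unmasked negation of an unsigned immediate is a huge value, while the masked one is the complementary shift within the element width.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Right-shift count paired with a left rotate by a2, for elements of
 * (8 << vece) bits.  Without the mask, -a2 wraps to a huge unsigned
 * value for any non-zero a2.
 */
static inline unsigned demo_rot_right_count(unsigned a2, unsigned vece)
{
    return -a2 & ((8u << vece) - 1);
}

/* Scalar model of a 32-bit rotate-left built from two shifts and an OR. */
static inline uint32_t demo_rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> demo_rot_right_count(n, 2));
}

/* Usage example. */
static inline void demo_rot_selfcheck(void)
{
    assert(demo_rot_right_count(3, 2) == 29);
    assert(demo_rotl32(0x80000001u, 1) == 0x00000003u);
}
```

For 64-bit elements the masked count can still be 32 or more (a2 = 3 gives 61), in which case the register form in tcg_out_vshifti is the expected path.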
> + case INDEX_op_rotlv_vec:
> + set_vtype_len_sew(s, type, vece);
> + tcg_out_opc_vv(s, OPC_VSLL_VV, TCG_REG_V0, a1, a2, true);
> + tcg_out_opc_vi(s, OPC_VRSUB_VI, TCG_REG_V0, a2, 0, true);
> + tcg_out_opc_vv(s, OPC_VSRL_VV, a0, a1, TCG_REG_V0, true);
You have written to V0 twice, clobbering the result.
Need to swap the shifts:
vrsub.vi v0, a2, 0
vsrl.vv v0, a1, v0
vsll.vv a0, a1, a2
vor.vv a0, a0, v0
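The corrected ordering can be modelled on plain arrays. This is a sketch with hypothetical demo_ helpers for 32-bit elements; it relies on RVV vector shifts using only the low lg2(SEW) bits of each shift amount, so the negated count needs no explicit mask.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NELEM 4

/* Element-wise models of the four instructions, SEW = 32. */
static inline void demo_vrsub_vi(uint32_t *d, const uint32_t *a, uint32_t imm)
{
    for (int i = 0; i < DEMO_NELEM; i++) {
        d[i] = imm - a[i];
    }
}

static inline void demo_vsll_vv(uint32_t *d, const uint32_t *a, const uint32_t *b)
{
    for (int i = 0; i < DEMO_NELEM; i++) {
        d[i] = a[i] << (b[i] & 31);
    }
}

static inline void demo_vsrl_vv(uint32_t *d, const uint32_t *a, const uint32_t *b)
{
    for (int i = 0; i < DEMO_NELEM; i++) {
        d[i] = a[i] >> (b[i] & 31);
    }
}

static inline void demo_vor_vv(uint32_t *d, const uint32_t *a, const uint32_t *b)
{
    for (int i = 0; i < DEMO_NELEM; i++) {
        d[i] = a[i] | b[i];
    }
}

/* rotlv in the swapped order: v0 is written, consumed, then merged. */
static inline void demo_rotlv(uint32_t *a0, const uint32_t *a1, const uint32_t *a2)
{
    uint32_t v0[DEMO_NELEM];

    demo_vrsub_vi(v0, a2, 0);   /* v0 = -a2            */
    demo_vsrl_vv(v0, a1, v0);   /* v0 = a1 >> (-a2)    */
    demo_vsll_vv(a0, a1, a2);   /* a0 = a1 << a2       */
    demo_vor_vv(a0, a0, v0);    /* a0 = a0 | v0        */
}
```

Each temporary is read before it is overwritten, so the left-shift result is no longer lost as it was in the quoted sequence.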
r~
* Re: [PATCH v4 01/12] util: Add RISC-V vector extension probe in cpuinfo
2024-09-11 18:34 ` Richard Henderson
@ 2024-09-18 5:14 ` LIU Zhiwei
2024-09-18 10:14 ` Richard Henderson
0 siblings, 1 reply; 36+ messages in thread
From: LIU Zhiwei @ 2024-09-18 5:14 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 2024/9/12 2:34, Richard Henderson wrote:
> On 9/11/24 06:26, LIU Zhiwei wrote:
>> While the compiler doesn't support RISCV_HWPROBE_EXT_ZVE64X,
>> we use RISCV_HWPROBE_IMA_V instead.
>
> Language is incorrect here. The compiler has nothing to do with it.
> Perhaps "If the installed kernel header files do not support...".
OK. Thanks.
>
> However, if you use only RISCV_HWPROBE_IMA_V, then you do not have any
> of the additional guarantees of Zve64x.
IMHO, RISCV_HWPROBE_IMA_V is more strictly constrained than
RISCV_HWPROBE_EXT_ZVE64X.
At least in the current QEMU implementation, the V vector extension depends
on the Zve64d extension.
Thanks,
Zhiwei
> The kernel api for RISCV_HWPROBE_EXT_ZVE64X was introduced in 6.10.
> If that is acceptable as a minimum, the simplest solution is
>
> #ifndef RISCV_HWPROBE_EXT_ZVE64X
> #define RISCV_HWPROBE_EXT_ZVE64X (1ULL << 39)
> #endif
>
> If the running kernel is old, then the bit will not be set and we will
> not attempt to use RVV.
>
> If we need to support older kernels, then we'll have to go back to
> probing with vsetvl to determine if all of the additional guarantees
> of Zve64x are met.
>
>
> r~
>
>
>>
>> Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
>> Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
>> ---
>> host/include/riscv/host/cpuinfo.h | 2 ++
>> util/cpuinfo-riscv.c | 24 ++++++++++++++++++++++--
>> 2 files changed, 24 insertions(+), 2 deletions(-)
>>
>> diff --git a/host/include/riscv/host/cpuinfo.h
>> b/host/include/riscv/host/cpuinfo.h
>> index 2b00660e36..cdc784e7b6 100644
>> --- a/host/include/riscv/host/cpuinfo.h
>> +++ b/host/include/riscv/host/cpuinfo.h
>> @@ -10,9 +10,11 @@
>> #define CPUINFO_ZBA (1u << 1)
>> #define CPUINFO_ZBB (1u << 2)
>> #define CPUINFO_ZICOND (1u << 3)
>> +#define CPUINFO_ZVE64X (1u << 4)
>> /* Initialized with a constructor. */
>> extern unsigned cpuinfo;
>> +extern unsigned riscv_lg2_vlenb;
>> /*
>> * We cannot rely on constructor ordering, so other constructors must
>> diff --git a/util/cpuinfo-riscv.c b/util/cpuinfo-riscv.c
>> index 497ce12680..bab782745b 100644
>> --- a/util/cpuinfo-riscv.c
>> +++ b/util/cpuinfo-riscv.c
>> @@ -4,6 +4,7 @@
>> */
>> #include "qemu/osdep.h"
>> +#include "qemu/host-utils.h"
>> #include "host/cpuinfo.h"
>> #ifdef CONFIG_ASM_HWPROBE_H
>> @@ -12,6 +13,7 @@
>> #endif
>> unsigned cpuinfo;
>> +unsigned riscv_lg2_vlenb;
>> static volatile sig_atomic_t got_sigill;
>> static void sigill_handler(int signo, siginfo_t *si, void *data)
>> @@ -33,7 +35,7 @@ static void sigill_handler(int signo, siginfo_t
>> *si, void *data)
>> /* Called both as constructor and (possibly) via other
>> constructors. */
>> unsigned __attribute__((constructor)) cpuinfo_init(void)
>> {
>> - unsigned left = CPUINFO_ZBA | CPUINFO_ZBB | CPUINFO_ZICOND;
>> + unsigned left = CPUINFO_ZBA | CPUINFO_ZBB | CPUINFO_ZICOND |
>> CPUINFO_ZVE64X;
>> unsigned info = cpuinfo;
>> if (info) {
>> @@ -49,6 +51,9 @@ unsigned __attribute__((constructor))
>> cpuinfo_init(void)
>> #endif
>> #if defined(__riscv_arch_test) && defined(__riscv_zicond)
>> info |= CPUINFO_ZICOND;
>> +#endif
>> +#if defined(__riscv_arch_test) && defined(__riscv_zve64x)
>> + info |= CPUINFO_ZVE64X;
>> #endif
>> left &= ~info;
>> @@ -64,7 +69,8 @@ unsigned __attribute__((constructor))
>> cpuinfo_init(void)
>> && pair.key >= 0) {
>> info |= pair.value & RISCV_HWPROBE_EXT_ZBA ?
>> CPUINFO_ZBA : 0;
>> info |= pair.value & RISCV_HWPROBE_EXT_ZBB ?
>> CPUINFO_ZBB : 0;
>> - left &= ~(CPUINFO_ZBA | CPUINFO_ZBB);
>> + info |= pair.value & RISCV_HWPROBE_IMA_V ?
>> CPUINFO_ZVE64X : 0;
>> + left &= ~(CPUINFO_ZBA | CPUINFO_ZBB | CPUINFO_ZVE64X);
>> #ifdef RISCV_HWPROBE_EXT_ZICOND
>> info |= pair.value & RISCV_HWPROBE_EXT_ZICOND ?
>> CPUINFO_ZICOND : 0;
>> left &= ~CPUINFO_ZICOND;
>> @@ -112,6 +118,20 @@ unsigned __attribute__((constructor))
>> cpuinfo_init(void)
>> assert(left == 0);
>> }
>> + if (info & CPUINFO_ZVE64X) {
>> + /*
>> + * We are guaranteed by RVV-1.0 that VLEN is a power of 2.
>> + * We are guaranteed by Zve64x that VLEN >= 64, and that
>> + * EEW of {8,16,32,64} are supported.
>> + *
>> + * Cache VLEN in a convenient form.
>> + */
>> + unsigned long vlenb;
>> + /* Read csr "vlenb" with "csrr %0, vlenb" : "=r"(vlenb) */
>> + asm volatile(".insn i 0x73, 0x2, %0, zero, -990" :
>> "=r"(vlenb));
>> + riscv_lg2_vlenb = ctz32(vlenb);
>> + }
>> +
>> info |= CPUINFO_ALWAYS;
>> cpuinfo = info;
>> return info;
>
* Re: [PATCH v4 02/12] tcg/riscv: Add basic support for vector
2024-09-11 18:41 ` Richard Henderson
@ 2024-09-18 5:17 ` LIU Zhiwei
2024-09-18 10:11 ` Richard Henderson
0 siblings, 1 reply; 36+ messages in thread
From: LIU Zhiwei @ 2024-09-18 5:17 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, Swung0x48, TANG Tiancheng
On 2024/9/12 2:41, Richard Henderson wrote:
> On 9/11/24 06:26, LIU Zhiwei wrote:
>> From: Swung0x48<swung0x48@outlook.com>
>>
>> The RISC-V vector instruction set utilizes the LMUL field to group
>> multiple registers, enabling variable-length vector registers. This
>> implementation uses only the first register number of each group while
>> reserving the other register numbers within the group.
>>
>> In TCG, each VEC_IR can have 3 types (TCG_TYPE_V64/128/256), and the
>> host runtime needs to adjust LMUL based on the type to use different
>> register groups.
>>
>> This presents challenges for TCG's register allocation. Currently, we
>> avoid modifying the register allocation part of TCG and only expose the
>> minimum number of vector registers.
>>
>> For example, when the host vlen is 64 bits and type is TCG_TYPE_V256,
>> with
>> LMUL equal to 4, we use 4 vector registers as one register group. We can
>> use a maximum of 8 register groups, but the V0 register number is
>> reserved
>> as a mask register, so we can effectively use at most 7 register groups.
>> Moreover, when type is smaller than TCG_TYPE_V256, only 7 registers are
>> forced to be used. This is because TCG cannot yet dynamically constrain
>> registers with type; likewise, when the host vlen is 128 bits and
>> TCG_TYPE_V256, we can use at most 15 registers.
>>
>> There is not much pressure on vector register allocation in TCG now, so
>> using 7 registers is feasible and will not have a major impact on code
>> generation.
>>
>> This patch:
>> 1. Reserves vector register 0 for use as a mask register.
>> 2. When using register groups, reserves the additional registers within
>> each group.
>>
>> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>> Co-authored-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>
> If there is a co-author, there should be another Signed-off-by.
This patch has added a tag:
Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
Do you mean we should add the same tag twice?
Thanks,
Zhiwei
>
>> Reviewed-by: Liu Zhiwei<zhiwei_liu@linux.alibaba.com>
>> ---
>> tcg/riscv/tcg-target-con-str.h | 1 +
>> tcg/riscv/tcg-target.c.inc | 126 ++++++++++++++++++++++++---------
>> tcg/riscv/tcg-target.h | 78 +++++++++++---------
>> tcg/riscv/tcg-target.opc.h | 12 ++++
>> 4 files changed, 151 insertions(+), 66 deletions(-)
>> create mode 100644 tcg/riscv/tcg-target.opc.h
>
> Anyway,
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>
>
> r~
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 02/12] tcg/riscv: Add basic support for vector
2024-09-18 5:17 ` LIU Zhiwei
@ 2024-09-18 10:11 ` Richard Henderson
2024-09-18 10:43 ` LIU Zhiwei
0 siblings, 1 reply; 36+ messages in thread
From: Richard Henderson @ 2024-09-18 10:11 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, Swung0x48, TANG Tiancheng
On 9/18/24 07:17, LIU Zhiwei wrote:
>
> On 2024/9/12 2:41, Richard Henderson wrote:
>> On 9/11/24 06:26, LIU Zhiwei wrote:
>>> From: Swung0x48<swung0x48@outlook.com>
>>>
>>> The RISC-V vector instruction set utilizes the LMUL field to group
>>> multiple registers, enabling variable-length vector registers. This
>>> implementation uses only the first register number of each group while
>>> reserving the other register numbers within the group.
>>>
>>> In TCG, each VEC_IR can have 3 types (TCG_TYPE_V64/128/256), and the
>>> host runtime needs to adjust LMUL based on the type to use different
>>> register groups.
>>>
>>> This presents challenges for TCG's register allocation. Currently, we
>>> avoid modifying the register allocation part of TCG and only expose the
>>> minimum number of vector registers.
>>>
>>> For example, when the host vlen is 64 bits and type is TCG_TYPE_V256, with
>>> LMUL equal to 4, we use 4 vector registers as one register group. We can
>>> use a maximum of 8 register groups, but the V0 register number is reserved
>>> as a mask register, so we can effectively use at most 7 register groups.
>>> Moreover, when type is smaller than TCG_TYPE_V256, only 7 registers are
>>> forced to be used. This is because TCG cannot yet dynamically constrain
>>> registers with type; likewise, when the host vlen is 128 bits and
>>> TCG_TYPE_V256, we can use at most 15 registers.
>>>
>>> There is not much pressure on vector register allocation in TCG now, so
>>> using 7 registers is feasible and will not have a major impact on code
>>> generation.
>>>
>>> This patch:
>>> 1. Reserves vector register 0 for use as a mask register.
>>> 2. When using register groups, reserves the additional registers within
>>> each group.
>>>
>>> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>>> Co-authored-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>>
>> If there is a co-author, there should be another Signed-off-by.
>
> This patch has added a tag:
>
> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>
>
> Do you mean we should add the same tag twice?
The from line is "Swung0x48 <swung0x48@outlook.com>".
If this is an alternate email for TANG Tiancheng, then please fix the patch --author.
r~
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 01/12] util: Add RISC-V vector extension probe in cpuinfo
2024-09-18 5:14 ` LIU Zhiwei
@ 2024-09-18 10:14 ` Richard Henderson
0 siblings, 0 replies; 36+ messages in thread
From: Richard Henderson @ 2024-09-18 10:14 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 9/18/24 07:14, LIU Zhiwei wrote:
>
> On 2024/9/12 2:34, Richard Henderson wrote:
>> On 9/11/24 06:26, LIU Zhiwei wrote:
>>> While the compiler doesn't support RISCV_HWPROBE_EXT_ZVE64X,
>>> we use RISCV_HWPROBE_IMA_V instead.
>>
>> Language is incorrect here. The compiler has nothing to do with it.
>> Perhaps "If the installed kernel header files do not support...".
> OK. Thanks.
>>
>> However, if you use only RISCV_HWPROBE_IMA_V, then you do not have any of the additional
>> guarantees of Zve64x.
>
> IMHO, RISCV_HWPROBE_IMA_V is more strictly constrained than RISCV_HWPROBE_EXT_ZVE64X.
> At least in the current QEMU implementation, the V vector extension depends on the Zve64d
> extension.
Ok. That is a better explanation for the patch description.
r~
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 02/12] tcg/riscv: Add basic support for vector
2024-09-18 10:11 ` Richard Henderson
@ 2024-09-18 10:43 ` LIU Zhiwei
2024-09-18 14:27 ` Richard Henderson
0 siblings, 1 reply; 36+ messages in thread
From: LIU Zhiwei @ 2024-09-18 10:43 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, Swung0x48, TANG Tiancheng
On 2024/9/18 18:11, Richard Henderson wrote:
> On 9/18/24 07:17, LIU Zhiwei wrote:
>>
>> On 2024/9/12 2:41, Richard Henderson wrote:
>>> On 9/11/24 06:26, LIU Zhiwei wrote:
>>>> From: Swung0x48<swung0x48@outlook.com>
>>>>
>>>> The RISC-V vector instruction set utilizes the LMUL field to group
>>>> multiple registers, enabling variable-length vector registers. This
>>>> implementation uses only the first register number of each group while
>>>> reserving the other register numbers within the group.
>>>>
>>>> In TCG, each VEC_IR can have 3 types (TCG_TYPE_V64/128/256), and the
>>>> host runtime needs to adjust LMUL based on the type to use different
>>>> register groups.
>>>>
>>>> This presents challenges for TCG's register allocation. Currently, we
>>>> avoid modifying the register allocation part of TCG and only expose
>>>> the
>>>> minimum number of vector registers.
>>>>
>>>> For example, when the host vlen is 64 bits and type is
>>>> TCG_TYPE_V256, with
>>>> LMUL equal to 4, we use 4 vector registers as one register group.
>>>> We can
>>>> use a maximum of 8 register groups, but the V0 register number is
>>>> reserved
>>>> as a mask register, so we can effectively use at most 7 register
>>>> groups.
>>>> Moreover, when type is smaller than TCG_TYPE_V256, only 7 registers
>>>> are
>>>> forced to be used. This is because TCG cannot yet dynamically
>>>> constrain
>>>> registers with type; likewise, when the host vlen is 128 bits and
>>>> TCG_TYPE_V256, we can use at most 15 registers.
>>>>
>>>> There is not much pressure on vector register allocation in TCG
>>>> now, so
>>>> using 7 registers is feasible and will not have a major impact on code
>>>> generation.
>>>>
>>>> This patch:
>>>> 1. Reserves vector register 0 for use as a mask register.
>>>> 2. When using register groups, reserves the additional registers
>>>> within
>>>> each group.
>>>>
>>>> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>>>> Co-authored-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>>>
>>> If there is a co-author, there should be another Signed-off-by.
>>
>> This patch has added a tag:
>>
>> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>>
>>
>> Do you mean we should add the same tag twice?
>
> The from line is "Swung0x48 <swung0x48@outlook.com>".
> If this is an alternate email for TANG Tiancheng,
No, Swung0x48 is another author.
Thanks,
Zhiwei
> then please fix the patch --author.
>
>
> r~
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 02/12] tcg/riscv: Add basic support for vector
2024-09-18 10:43 ` LIU Zhiwei
@ 2024-09-18 14:27 ` Richard Henderson
2024-09-20 4:01 ` 0x48 Swung
0 siblings, 1 reply; 36+ messages in thread
From: Richard Henderson @ 2024-09-18 14:27 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, Swung0x48, TANG Tiancheng
On 9/18/24 12:43, LIU Zhiwei wrote:
>
> On 2024/9/18 18:11, Richard Henderson wrote:
>> On 9/18/24 07:17, LIU Zhiwei wrote:
>>>
>>> On 2024/9/12 2:41, Richard Henderson wrote:
>>>> On 9/11/24 06:26, LIU Zhiwei wrote:
>>>>> From: Swung0x48<swung0x48@outlook.com>
>>>>>
>>>>> The RISC-V vector instruction set utilizes the LMUL field to group
>>>>> multiple registers, enabling variable-length vector registers. This
>>>>> implementation uses only the first register number of each group while
>>>>> reserving the other register numbers within the group.
>>>>>
>>>>> In TCG, each VEC_IR can have 3 types (TCG_TYPE_V64/128/256), and the
>>>>> host runtime needs to adjust LMUL based on the type to use different
>>>>> register groups.
>>>>>
>>>>> This presents challenges for TCG's register allocation. Currently, we
>>>>> avoid modifying the register allocation part of TCG and only expose the
>>>>> minimum number of vector registers.
>>>>>
>>>>> For example, when the host vlen is 64 bits and type is TCG_TYPE_V256, with
>>>>> LMUL equal to 4, we use 4 vector registers as one register group. We can
>>>>> use a maximum of 8 register groups, but the V0 register number is reserved
>>>>> as a mask register, so we can effectively use at most 7 register groups.
>>>>> Moreover, when type is smaller than TCG_TYPE_V256, only 7 registers are
>>>>> forced to be used. This is because TCG cannot yet dynamically constrain
>>>>> registers with type; likewise, when the host vlen is 128 bits and
>>>>> TCG_TYPE_V256, we can use at most 15 registers.
>>>>>
>>>>> There is not much pressure on vector register allocation in TCG now, so
>>>>> using 7 registers is feasible and will not have a major impact on code
>>>>> generation.
>>>>>
>>>>> This patch:
>>>>> 1. Reserves vector register 0 for use as a mask register.
>>>>> 2. When using register groups, reserves the additional registers within
>>>>> each group.
>>>>>
>>>>> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>>>>> Co-authored-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>>>>
>>>> If there is a co-author, there should be another Signed-off-by.
>>>
>>> This patch has added a tag:
>>>
>>> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>>>
>>>
>>> Do you mean we should add the same tag twice?
>>
>> The from line is "Swung0x48 <swung0x48@outlook.com>".
>> If this is an alternate email for TANG Tiancheng,
>
> No, Swung0x48 is another author.
Then we need a proper Signed-off-by line from that author.
r~
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 02/12] tcg/riscv: Add basic support for vector
2024-09-18 14:27 ` Richard Henderson
@ 2024-09-20 4:01 ` 0x48 Swung
2024-09-20 4:27 ` LIU Zhiwei
2024-09-20 14:26 ` LIU Zhiwei
0 siblings, 2 replies; 36+ messages in thread
From: 0x48 Swung @ 2024-09-20 4:01 UTC (permalink / raw)
To: Richard Henderson, LIU Zhiwei, qemu-devel@nongnu.org
Cc: qemu-riscv@nongnu.org, palmer@dabbelt.com,
alistair.francis@wdc.com, dbarboza@ventanamicro.com,
liwei1518@gmail.com, bmeng.cn@gmail.com, TANG Tiancheng
[-- Attachment #1: Type: text/plain, Size: 4913 bytes --]
Hey everyone! Late to the party. Life happens sometimes ;)
Just discovered this patch and this mail list, and I'd like to provide some background story here.
I originally provided my initial implementation in a downstream repo last year, namely https://github.com/plctlab/plct-qemu/tree/plct-riscv-backend-rvv .
I'm new to contributing to QEMU, and to the open-source upstreaming process as a whole, so I may make mistakes in the claims that follow, but I see some confusion here:
1. The PLCT branch (which includes my original commits) is open-sourced using GPLv2, which follows QEMU's upstream repo. So according to the license, my modification should be EXPLICITLY shown in the patch, but I haven't seen any.
2. I did consent last year to upstreaming my patch, in the form of a patch submitted with modifications from T-head, and on their behalf. And it was agreed back then that I could be mentioned as one of the authors. But it turns out that there's no "sign-off", "author", "co-author" line mentioning me. If I don't speak out in this situation, does it imply that this patch is purely LIU Zhiwei's work and has nothing to do with me?
I'd like LIU to split my patch and his modifications into two separate patches, and to state explicitly where those patches come from, so that this patch complies with the GPLv2 license and we can clear up those misunderstandings.
I don't want to take it personally, but I do smell something wrong going on here...
Best Regards,
Swung0x48 (aka. Huang Shiyuan)
________________________________
From: Richard Henderson <richard.henderson@linaro.org>
Sent: Wednesday, September 18, 2024 10:27:16 PM
To: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>
Cc: qemu-riscv@nongnu.org <qemu-riscv@nongnu.org>; palmer@dabbelt.com <palmer@dabbelt.com>; alistair.francis@wdc.com <alistair.francis@wdc.com>; dbarboza@ventanamicro.com <dbarboza@ventanamicro.com>; liwei1518@gmail.com <liwei1518@gmail.com>; bmeng.cn@gmail.com <bmeng.cn@gmail.com>; Swung0x48 <swung0x48@outlook.com>; TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Subject: Re: [PATCH v4 02/12] tcg/riscv: Add basic support for vector
On 9/18/24 12:43, LIU Zhiwei wrote:
>
> On 2024/9/18 18:11, Richard Henderson wrote:
>> On 9/18/24 07:17, LIU Zhiwei wrote:
>>>
>>> On 2024/9/12 2:41, Richard Henderson wrote:
>>>> On 9/11/24 06:26, LIU Zhiwei wrote:
>>>>> From: Swung0x48<swung0x48@outlook.com>
>>>>>
>>>>> The RISC-V vector instruction set utilizes the LMUL field to group
>>>>> multiple registers, enabling variable-length vector registers. This
>>>>> implementation uses only the first register number of each group while
>>>>> reserving the other register numbers within the group.
>>>>>
>>>>> In TCG, each VEC_IR can have 3 types (TCG_TYPE_V64/128/256), and the
>>>>> host runtime needs to adjust LMUL based on the type to use different
>>>>> register groups.
>>>>>
>>>>> This presents challenges for TCG's register allocation. Currently, we
>>>>> avoid modifying the register allocation part of TCG and only expose the
>>>>> minimum number of vector registers.
>>>>>
>>>>> For example, when the host vlen is 64 bits and type is TCG_TYPE_V256, with
>>>>> LMUL equal to 4, we use 4 vector registers as one register group. We can
>>>>> use a maximum of 8 register groups, but the V0 register number is reserved
>>>>> as a mask register, so we can effectively use at most 7 register groups.
>>>>> Moreover, when type is smaller than TCG_TYPE_V256, only 7 registers are
>>>>> forced to be used. This is because TCG cannot yet dynamically constrain
>>>>> registers with type; likewise, when the host vlen is 128 bits and
>>>>> TCG_TYPE_V256, we can use at most 15 registers.
>>>>>
>>>>> There is not much pressure on vector register allocation in TCG now, so
>>>>> using 7 registers is feasible and will not have a major impact on code
>>>>> generation.
>>>>>
>>>>> This patch:
>>>>> 1. Reserves vector register 0 for use as a mask register.
>>>>> 2. When using register groups, reserves the additional registers within
>>>>> each group.
>>>>>
>>>>> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>>>>> Co-authored-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>>>>
>>>> If there is a co-author, there should be another Signed-off-by.
>>>
>>> This patch has added a tag:
>>>
>>> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>>>
>>>
>>> Do you mean we should add the same tag twice?
>>
>> The from line is "Swung0x48 <swung0x48@outlook.com>".
>> If this is an alternate email for TANG Tiancheng,
>
> No, Swung0x48 is another author.
Then we need a proper Signed-off-by line from that author.
r~
[-- Attachment #2: Type: text/html, Size: 8031 bytes --]
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 02/12] tcg/riscv: Add basic support for vector
2024-09-20 4:01 ` 0x48 Swung
@ 2024-09-20 4:27 ` LIU Zhiwei
2024-09-20 14:26 ` LIU Zhiwei
1 sibling, 0 replies; 36+ messages in thread
From: LIU Zhiwei @ 2024-09-20 4:27 UTC (permalink / raw)
To: 0x48 Swung, Richard Henderson, qemu-devel@nongnu.org
Cc: qemu-riscv@nongnu.org, palmer@dabbelt.com,
alistair.francis@wdc.com, dbarboza@ventanamicro.com,
liwei1518@gmail.com, bmeng.cn@gmail.com, TANG Tiancheng
[-- Attachment #1: Type: text/plain, Size: 5584 bytes --]
On 2024/9/20 12:01, 0x48 Swung wrote:
> Hey everyone! Late to the party. Life happens sometimes ;)
> Just discovered this patch and this mail list, and I'd like to provide
> some background story here.
> I originally provided my initial implementation in a downstream repo
> last year, namely
> https://github.com/plctlab/plct-qemu/tree/plct-riscv-backend-rvv .
> I'm new to contributing to qemu and also take part in the open-source
> community upstreaming process as a whole, so I may make mistakes in my
> following claims, but I see some confusion here:
> 1. The PLCT branch (which includes my original commits) is
> open-sourced using GPLv2, which follows QEMU's upstream repo. So
> according to the license, my modification should be EXPLICITLY shown
> in the patch, but I haven't seen any.
I think I have handled it carefully.
> 2. I do consent upstreaming my patch last year, in the form of a patch
> submitted with modifications from T-head, and on behalf of them. And
> it was agreed back in the days that I can be mentioned as one of the
> authors. But it turns out that there's no "sign-off", "author",
> "co-author" line mentioning me.
The author of this patch is you. You can see it from the "From:
Swung0x48<swung0x48@outlook.com>" in the patch.
In v4, Tiancheng felt that he had also contributed to this
patch, so he added himself as a co-author.
> If I don't speak out in this situation, does it imply that this patch
> is purely LIU Zhiwei's work and have nothing to do with me?
No. I only reviewed this patch set and sent it to the mailing list. None of
this patch set belongs to me.
>
> I'd like LIU to separate my patch and his modification to two separate
> patches, and explicitly name where are those patches coming from, so
> that this patch can comply to GPLv2 license and can we clarify those
> misunderstandings.
I think we have done that. Please point out any mistakes you find.
Thanks,
Zhiwei
>
> I don't want to take it personally , but I do smell something's wrong
> going on here...
>
> Best Regards,
> Swung0x48 (aka. Huang Shiyuan)
>
> ------------------------------------------------------------------------
> *From:* Richard Henderson <richard.henderson@linaro.org>
> *Sent:* Wednesday, September 18, 2024 10:27:16 PM
> *To:* LIU Zhiwei <zhiwei_liu@linux.alibaba.com>; qemu-devel@nongnu.org
> <qemu-devel@nongnu.org>
> *Cc:* qemu-riscv@nongnu.org <qemu-riscv@nongnu.org>;
> palmer@dabbelt.com <palmer@dabbelt.com>; alistair.francis@wdc.com
> <alistair.francis@wdc.com>; dbarboza@ventanamicro.com
> <dbarboza@ventanamicro.com>; liwei1518@gmail.com
> <liwei1518@gmail.com>; bmeng.cn@gmail.com <bmeng.cn@gmail.com>;
> Swung0x48 <swung0x48@outlook.com>; TANG Tiancheng
> <tangtiancheng.ttc@alibaba-inc.com>
> *Subject:* Re: [PATCH v4 02/12] tcg/riscv: Add basic support for vector
> On 9/18/24 12:43, LIU Zhiwei wrote:
> >
> > On 2024/9/18 18:11, Richard Henderson wrote:
> >> On 9/18/24 07:17, LIU Zhiwei wrote:
> >>>
> >>> On 2024/9/12 2:41, Richard Henderson wrote:
> >>>> On 9/11/24 06:26, LIU Zhiwei wrote:
> >>>>> From: Swung0x48<swung0x48@outlook.com>
> >>>>>
> >>>>> The RISC-V vector instruction set utilizes the LMUL field to group
> >>>>> multiple registers, enabling variable-length vector registers. This
> >>>>> implementation uses only the first register number of each group
> while
> >>>>> reserving the other register numbers within the group.
> >>>>>
> >>>>> In TCG, each VEC_IR can have 3 types (TCG_TYPE_V64/128/256), and the
> >>>>> host runtime needs to adjust LMUL based on the type to use different
> >>>>> register groups.
> >>>>>
> >>>>> This presents challenges for TCG's register allocation.
> Currently, we
> >>>>> avoid modifying the register allocation part of TCG and only
> expose the
> >>>>> minimum number of vector registers.
> >>>>>
> >>>>> For example, when the host vlen is 64 bits and type is
> TCG_TYPE_V256, with
> >>>>> LMUL equal to 4, we use 4 vector registers as one register
> group. We can
> >>>>> use a maximum of 8 register groups, but the V0 register number
> is reserved
> >>>>> as a mask register, so we can effectively use at most 7 register
> groups.
> >>>>> Moreover, when type is smaller than TCG_TYPE_V256, only 7
> registers are
> >>>>> forced to be used. This is because TCG cannot yet dynamically
> constrain
> >>>>> registers with type; likewise, when the host vlen is 128 bits and
> >>>>> TCG_TYPE_V256, we can use at most 15 registers.
> >>>>>
> >>>>> There is not much pressure on vector register allocation in TCG
> now, so
> >>>>> using 7 registers is feasible and will not have a major impact
> on code
> >>>>> generation.
> >>>>>
> >>>>> This patch:
> >>>>> 1. Reserves vector register 0 for use as a mask register.
> >>>>> 2. When using register groups, reserves the additional registers
> within
> >>>>> each group.
> >>>>>
> >>>>> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
> >>>>> Co-authored-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
> >>>>
> >>>> If there is a co-author, there should be another Signed-off-by.
> >>>
> >>> This patch has added a tag:
> >>>
> >>> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
> >>>
> >>>
> >>> Do you mean we should add the same tag twice?
> >>
> >> The from line is "Swung0x48 <swung0x48@outlook.com>".
> >> If this is an alternate email for TANG Tiancheng,
> >
> > No, Swung0x48 is another author.
>
> Then we need a proper Signed-off-by line from that author.
>
>
> r~
[-- Attachment #2: Type: text/html, Size: 13175 bytes --]
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v4 02/12] tcg/riscv: Add basic support for vector
2024-09-11 13:26 ` [PATCH v4 02/12] tcg/riscv: Add basic support for vector LIU Zhiwei
2024-09-11 18:41 ` Richard Henderson
@ 2024-09-20 11:26 ` Daniel Henrique Barboza
2024-09-20 11:37 ` Markus Armbruster
1 sibling, 1 reply; 36+ messages in thread
From: Daniel Henrique Barboza @ 2024-09-20 11:26 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, liwei1518, bmeng.cn,
richard.henderson, Swung0x48, TANG Tiancheng
Hi Zhiwei,
On 9/11/24 10:26 AM, LIU Zhiwei wrote:
> From: Swung0x48 <swung0x48@outlook.com>
>
> The RISC-V vector instruction set utilizes the LMUL field to group
> multiple registers, enabling variable-length vector registers. This
> implementation uses only the first register number of each group while
> reserving the other register numbers within the group.
>
> In TCG, each VEC_IR can have 3 types (TCG_TYPE_V64/128/256), and the
> host runtime needs to adjust LMUL based on the type to use different
> register groups.
>
> This presents challenges for TCG's register allocation. Currently, we
> avoid modifying the register allocation part of TCG and only expose the
> minimum number of vector registers.
>
> For example, when the host vlen is 64 bits and type is TCG_TYPE_V256, with
> LMUL equal to 4, we use 4 vector registers as one register group. We can
> use a maximum of 8 register groups, but the V0 register number is reserved
> as a mask register, so we can effectively use at most 7 register groups.
> Moreover, when type is smaller than TCG_TYPE_V256, only 7 registers are
> forced to be used. This is because TCG cannot yet dynamically constrain
> registers with type; likewise, when the host vlen is 128 bits and
> TCG_TYPE_V256, we can use at most 15 registers.
>
> There is not much pressure on vector register allocation in TCG now, so
> using 7 registers is feasible and will not have a major impact on code
> generation.
>
> This patch:
> 1. Reserves vector register 0 for use as a mask register.
> 2. When using register groups, reserves the additional registers within
> each group.
>
> Signed-off-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
> Co-authored-by: TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
> Reviewed-by: Liu Zhiwei <zhiwei_liu@linux.alibaba.com>
> ---
As Richard already pointed out, we must have a "Signed-off-by" tag with the "author" of
the patch, and it must be the exact spelling. So in this case:
Signed-off-by: Swung0x48 <swung0x48@outlook.com>
More info here:
https://www.qemu.org/docs/master/devel/submitting-a-patch.html
-----
Your patches must include a Signed-off-by: line. This is a hard requirement
because it’s how you say “I’m legally okay to contribute this and happy for it
to go into QEMU”. The process is modelled after the Linux kernel policy.
If you wrote the patch, make sure your "From:" and "Signed-off-by:"
lines use the same spelling. It's okay if you subscribe or contribute to
the list via more than one address, but using multiple addresses in one
commit just confuses things. If someone else wrote the patch, git will
include a "From:" line in the body of the email (different from your
envelope From:) that will give credit to the correct author; but again,
that author's Signed-off-by: line is mandatory, with the same spelling.
-----
However, you can't simply add this tag to the patch yourself, since you're not Swung0x48.
We need Swung0x48 to reply here with an ack indicating that it is OK to add the Signed-off-by
as required, as an indication that Swung0x48 is OK with the legal implications of
doing so.
Thanks,
Daniel
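The register-group reservation described in the commit message can be sketched as follows. This is an illustration only, assuming TCG register numbers 32..63 map to v0..v31 as in the patch's ALL_VECTOR_REGS mask; vreg_group_mask and usable_groups are hypothetical helpers, not code from the series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: set one bit per register group, at the first
 * register of each group. Vector register vN is assumed to be TCG
 * register number 32 + N.
 */
static uint64_t vreg_group_mask(unsigned lmul)
{
    uint64_t mask = 0;

    assert(lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
    for (unsigned reg = 0; reg < 32; reg += lmul) {
        mask |= UINT64_C(1) << (32 + reg);
    }
    return mask;
}

/*
 * Hypothetical helper: number of groups left once v0 is reserved
 * as the mask register.
 */
static unsigned usable_groups(unsigned lmul)
{
    uint64_t mask = vreg_group_mask(lmul) & ~(UINT64_C(1) << 32);
    unsigned n = 0;

    for (; mask != 0; mask &= mask - 1) {
        n++;
    }
    return n;
}
```

With LMUL = 2 and LMUL = 4 this reproduces the ALL_DVECTOR_REG_GROUPS and ALL_QVECTOR_REG_GROUPS constants, and the counts of 15 and 7 usable groups match the figures given in the commit message.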
> tcg/riscv/tcg-target-con-str.h | 1 +
> tcg/riscv/tcg-target.c.inc | 126 ++++++++++++++++++++++++---------
> tcg/riscv/tcg-target.h | 78 +++++++++++---------
> tcg/riscv/tcg-target.opc.h | 12 ++++
> 4 files changed, 151 insertions(+), 66 deletions(-)
> create mode 100644 tcg/riscv/tcg-target.opc.h
>
> diff --git a/tcg/riscv/tcg-target-con-str.h b/tcg/riscv/tcg-target-con-str.h
> index d5c419dff1..b2b3211bcb 100644
> --- a/tcg/riscv/tcg-target-con-str.h
> +++ b/tcg/riscv/tcg-target-con-str.h
> @@ -9,6 +9,7 @@
> * REGS(letter, register_mask)
> */
> REGS('r', ALL_GENERAL_REGS)
> +REGS('v', ALL_VECTOR_REGS)
>
> /*
> * Define constraint letters for constants:
> diff --git a/tcg/riscv/tcg-target.c.inc b/tcg/riscv/tcg-target.c.inc
> index d334857226..966d1ad981 100644
> --- a/tcg/riscv/tcg-target.c.inc
> +++ b/tcg/riscv/tcg-target.c.inc
> @@ -32,38 +32,14 @@
>
> #ifdef CONFIG_DEBUG_TCG
> static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
> - "zero",
> - "ra",
> - "sp",
> - "gp",
> - "tp",
> - "t0",
> - "t1",
> - "t2",
> - "s0",
> - "s1",
> - "a0",
> - "a1",
> - "a2",
> - "a3",
> - "a4",
> - "a5",
> - "a6",
> - "a7",
> - "s2",
> - "s3",
> - "s4",
> - "s5",
> - "s6",
> - "s7",
> - "s8",
> - "s9",
> - "s10",
> - "s11",
> - "t3",
> - "t4",
> - "t5",
> - "t6"
> + "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
> + "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5",
> + "a6", "a7", "s2", "s3", "s4", "s5", "s6", "s7",
> + "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6",
> + "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7",
> + "v8", "v9", "v10", "v11", "v12", "v13", "v14", "v15",
> + "v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23",
> + "v24", "v25", "v26", "v27", "v28", "v29", "v30", "v31",
> };
> #endif
>
> @@ -100,6 +76,16 @@ static const int tcg_target_reg_alloc_order[] = {
> TCG_REG_A5,
> TCG_REG_A6,
> TCG_REG_A7,
> +
> + /* Vector registers and TCG_REG_V0 reserved for mask. */
> + TCG_REG_V1, TCG_REG_V2, TCG_REG_V3, TCG_REG_V4,
> + TCG_REG_V5, TCG_REG_V6, TCG_REG_V7, TCG_REG_V8,
> + TCG_REG_V9, TCG_REG_V10, TCG_REG_V11, TCG_REG_V12,
> + TCG_REG_V13, TCG_REG_V14, TCG_REG_V15, TCG_REG_V16,
> + TCG_REG_V17, TCG_REG_V18, TCG_REG_V19, TCG_REG_V20,
> + TCG_REG_V21, TCG_REG_V22, TCG_REG_V23, TCG_REG_V24,
> + TCG_REG_V25, TCG_REG_V26, TCG_REG_V27, TCG_REG_V28,
> + TCG_REG_V29, TCG_REG_V30, TCG_REG_V31,
> };
>
> static const int tcg_target_call_iarg_regs[] = {
> @@ -127,6 +113,9 @@ static TCGReg tcg_target_call_oarg_reg(TCGCallReturnKind kind, int slot)
> #define TCG_CT_CONST_J12 0x1000
>
> #define ALL_GENERAL_REGS MAKE_64BIT_MASK(0, 32)
> +#define ALL_VECTOR_REGS MAKE_64BIT_MASK(32, 32)
> +#define ALL_DVECTOR_REG_GROUPS 0x5555555500000000
> +#define ALL_QVECTOR_REG_GROUPS 0x1111111100000000
>
> #define sextreg sextract64
>
> @@ -766,6 +755,23 @@ static void tcg_out_addsub2(TCGContext *s,
> }
> }
>
> +static bool tcg_out_dup_vec(TCGContext *s, TCGType type, unsigned vece,
> + TCGReg dst, TCGReg src)
> +{
> + return false;
> +}
> +
> +static bool tcg_out_dupm_vec(TCGContext *s, TCGType type, unsigned vece,
> + TCGReg dst, TCGReg base, intptr_t offset)
> +{
> + return false;
> +}
> +
> +static void tcg_out_dupi_vec(TCGContext *s, TCGType type, unsigned vece,
> + TCGReg dst, int64_t arg)
> +{
> +}
> +
> static const struct {
> RISCVInsn op;
> bool swap;
> @@ -1881,6 +1887,36 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
> }
> }
>
> +static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
> + unsigned vecl, unsigned vece,
> + const TCGArg args[TCG_MAX_OP_ARGS],
> + const int const_args[TCG_MAX_OP_ARGS])
> +{
> + switch (opc) {
> + case INDEX_op_mov_vec: /* Always emitted via tcg_out_mov. */
> + case INDEX_op_dup_vec: /* Always emitted via tcg_out_dup_vec. */
> + default:
> + g_assert_not_reached();
> + }
> +}
> +
> +void tcg_expand_vec_op(TCGOpcode opc, TCGType type, unsigned vece,
> + TCGArg a0, ...)
> +{
> + switch (opc) {
> + default:
> + g_assert_not_reached();
> + }
> +}
> +
> +int tcg_can_emit_vec_op(TCGOpcode opc, TCGType type, unsigned vece)
> +{
> + switch (opc) {
> + default:
> + return 0;
> + }
> +}
> +
> static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
> {
> switch (op) {
> @@ -2100,6 +2136,30 @@ static void tcg_target_init(TCGContext *s)
> {
> tcg_target_available_regs[TCG_TYPE_I32] = 0xffffffff;
> tcg_target_available_regs[TCG_TYPE_I64] = 0xffffffff;
> + s->reserved_regs = 0;
> +
> + switch (riscv_lg2_vlenb) {
> + case TCG_TYPE_V64:
> + tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
> + tcg_target_available_regs[TCG_TYPE_V128] = ALL_DVECTOR_REG_GROUPS;
> + tcg_target_available_regs[TCG_TYPE_V256] = ALL_QVECTOR_REG_GROUPS;
> + s->reserved_regs |= (~ALL_QVECTOR_REG_GROUPS & ALL_VECTOR_REGS);
> + break;
> + case TCG_TYPE_V128:
> + tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
> + tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
> + tcg_target_available_regs[TCG_TYPE_V256] = ALL_DVECTOR_REG_GROUPS;
> + s->reserved_regs |= (~ALL_DVECTOR_REG_GROUPS & ALL_VECTOR_REGS);
> + break;
> + default:
> + /* Guaranteed by Zve64x. */
> + tcg_debug_assert(riscv_lg2_vlenb >= TCG_TYPE_V256);
> +
> + tcg_target_available_regs[TCG_TYPE_V64] = ALL_VECTOR_REGS;
> + tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
> + tcg_target_available_regs[TCG_TYPE_V256] = ALL_VECTOR_REGS;
> + break;
> + }
>
> tcg_target_call_clobber_regs = -1u;
> tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S0);
> @@ -2115,7 +2175,6 @@ static void tcg_target_init(TCGContext *s)
> tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S10);
> tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S11);
>
> - s->reserved_regs = 0;
> tcg_regset_set_reg(s->reserved_regs, TCG_REG_ZERO);
> tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP0);
> tcg_regset_set_reg(s->reserved_regs, TCG_REG_TMP1);
> @@ -2123,6 +2182,7 @@ static void tcg_target_init(TCGContext *s)
> tcg_regset_set_reg(s->reserved_regs, TCG_REG_SP);
> tcg_regset_set_reg(s->reserved_regs, TCG_REG_GP);
> tcg_regset_set_reg(s->reserved_regs, TCG_REG_TP);
> + tcg_regset_set_reg(s->reserved_regs, TCG_REG_V0);
> }
>
> typedef struct {
> diff --git a/tcg/riscv/tcg-target.h b/tcg/riscv/tcg-target.h
> index 1a347eaf6e..12a7a37aaa 100644
> --- a/tcg/riscv/tcg-target.h
> +++ b/tcg/riscv/tcg-target.h
> @@ -28,42 +28,28 @@
> #include "host/cpuinfo.h"
>
> #define TCG_TARGET_INSN_UNIT_SIZE 4
> -#define TCG_TARGET_NB_REGS 32
> +#define TCG_TARGET_NB_REGS 64
> #define MAX_CODE_GEN_BUFFER_SIZE ((size_t)-1)
>
> typedef enum {
> - TCG_REG_ZERO,
> - TCG_REG_RA,
> - TCG_REG_SP,
> - TCG_REG_GP,
> - TCG_REG_TP,
> - TCG_REG_T0,
> - TCG_REG_T1,
> - TCG_REG_T2,
> - TCG_REG_S0,
> - TCG_REG_S1,
> - TCG_REG_A0,
> - TCG_REG_A1,
> - TCG_REG_A2,
> - TCG_REG_A3,
> - TCG_REG_A4,
> - TCG_REG_A5,
> - TCG_REG_A6,
> - TCG_REG_A7,
> - TCG_REG_S2,
> - TCG_REG_S3,
> - TCG_REG_S4,
> - TCG_REG_S5,
> - TCG_REG_S6,
> - TCG_REG_S7,
> - TCG_REG_S8,
> - TCG_REG_S9,
> - TCG_REG_S10,
> - TCG_REG_S11,
> - TCG_REG_T3,
> - TCG_REG_T4,
> - TCG_REG_T5,
> - TCG_REG_T6,
> + TCG_REG_ZERO, TCG_REG_RA, TCG_REG_SP, TCG_REG_GP,
> + TCG_REG_TP, TCG_REG_T0, TCG_REG_T1, TCG_REG_T2,
> + TCG_REG_S0, TCG_REG_S1, TCG_REG_A0, TCG_REG_A1,
> + TCG_REG_A2, TCG_REG_A3, TCG_REG_A4, TCG_REG_A5,
> + TCG_REG_A6, TCG_REG_A7, TCG_REG_S2, TCG_REG_S3,
> + TCG_REG_S4, TCG_REG_S5, TCG_REG_S6, TCG_REG_S7,
> + TCG_REG_S8, TCG_REG_S9, TCG_REG_S10, TCG_REG_S11,
> + TCG_REG_T3, TCG_REG_T4, TCG_REG_T5, TCG_REG_T6,
> +
> + /* RISC-V V Extension registers */
> + TCG_REG_V0, TCG_REG_V1, TCG_REG_V2, TCG_REG_V3,
> + TCG_REG_V4, TCG_REG_V5, TCG_REG_V6, TCG_REG_V7,
> + TCG_REG_V8, TCG_REG_V9, TCG_REG_V10, TCG_REG_V11,
> + TCG_REG_V12, TCG_REG_V13, TCG_REG_V14, TCG_REG_V15,
> + TCG_REG_V16, TCG_REG_V17, TCG_REG_V18, TCG_REG_V19,
> + TCG_REG_V20, TCG_REG_V21, TCG_REG_V22, TCG_REG_V23,
> + TCG_REG_V24, TCG_REG_V25, TCG_REG_V26, TCG_REG_V27,
> + TCG_REG_V28, TCG_REG_V29, TCG_REG_V30, TCG_REG_V31,
>
> /* aliases */
> TCG_AREG0 = TCG_REG_S0,
> @@ -156,6 +142,32 @@ typedef enum {
>
> #define TCG_TARGET_HAS_tst 0
>
> +/* vector instructions */
> +#define TCG_TARGET_HAS_v64 0
> +#define TCG_TARGET_HAS_v128 0
> +#define TCG_TARGET_HAS_v256 0
> +#define TCG_TARGET_HAS_andc_vec 0
> +#define TCG_TARGET_HAS_orc_vec 0
> +#define TCG_TARGET_HAS_nand_vec 0
> +#define TCG_TARGET_HAS_nor_vec 0
> +#define TCG_TARGET_HAS_eqv_vec 0
> +#define TCG_TARGET_HAS_not_vec 0
> +#define TCG_TARGET_HAS_neg_vec 0
> +#define TCG_TARGET_HAS_abs_vec 0
> +#define TCG_TARGET_HAS_roti_vec 0
> +#define TCG_TARGET_HAS_rots_vec 0
> +#define TCG_TARGET_HAS_rotv_vec 0
> +#define TCG_TARGET_HAS_shi_vec 0
> +#define TCG_TARGET_HAS_shs_vec 0
> +#define TCG_TARGET_HAS_shv_vec 0
> +#define TCG_TARGET_HAS_mul_vec 0
> +#define TCG_TARGET_HAS_sat_vec 0
> +#define TCG_TARGET_HAS_minmax_vec 0
> +#define TCG_TARGET_HAS_bitsel_vec 0
> +#define TCG_TARGET_HAS_cmpsel_vec 0
> +
> +#define TCG_TARGET_HAS_tst_vec 0
> +
> #define TCG_TARGET_DEFAULT_MO (0)
>
> #define TCG_TARGET_NEED_LDST_LABELS
> diff --git a/tcg/riscv/tcg-target.opc.h b/tcg/riscv/tcg-target.opc.h
> new file mode 100644
> index 0000000000..b80b39e1e5
> --- /dev/null
> +++ b/tcg/riscv/tcg-target.opc.h
> @@ -0,0 +1,12 @@
> +/*
> + * Copyright (c) C-SKY Microsystems Co., Ltd.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> + * (at your option) any later version.
> + *
> + * See the COPYING file in the top-level directory for details.
> + *
> + * Target-specific opcodes for host vector expansion. These will be
> + * emitted by tcg_expand_vec_op. For those familiar with GCC internals,
> + * consider these to be UNSPEC with names.
> + */
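The register-group masks in the quoted patch can be checked with a small standalone sketch. It assumes the layout implied by the patch: vector registers v0..v31 occupy bits 32..63 of the register set, and each group is named by its lowest-numbered register. group_leader_mask() and usable_groups() are hypothetical helpers written for illustration, not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the quoted patch; MAKE_64BIT_MASK(32, 32) expanded by hand. */
#define ALL_VECTOR_REGS         0xffffffff00000000ull
#define ALL_DVECTOR_REG_GROUPS  0x5555555500000000ull
#define ALL_QVECTOR_REG_GROUPS  0x1111111100000000ull

/* Assumption: TCG_REG_V0 is register number 32 in the enlarged enum. */
#define REG_V0 32

/* Hypothetical helper: mask of the first register of every LMUL-sized group. */
static uint64_t group_leader_mask(unsigned lmul)
{
    uint64_t mask = 0;

    assert(lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
    for (unsigned v = 0; v < 32; v += lmul) {
        mask |= 1ull << (REG_V0 + v);
    }
    return mask;
}

/* Hypothetical helper: number of group leaders left once v0 is reserved. */
static unsigned usable_groups(uint64_t leaders)
{
    unsigned n = 0;

    leaders &= ~(1ull << REG_V0);   /* v0 is kept back as the mask register */
    while (leaders) {
        leaders &= leaders - 1;     /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

Under these assumptions, group_leader_mask(2) and group_leader_mask(4) reproduce ALL_DVECTOR_REG_GROUPS and ALL_QVECTOR_REG_GROUPS, and reserving v0 leaves 15 and 7 usable groups respectively, which matches the figures given in the commit message.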
* Re: [PATCH v4 02/12] tcg/riscv: Add basic support for vector
2024-09-20 11:26 ` Daniel Henrique Barboza
@ 2024-09-20 11:37 ` Markus Armbruster
0 siblings, 0 replies; 36+ messages in thread
From: Markus Armbruster @ 2024-09-20 11:37 UTC (permalink / raw)
To: Daniel Henrique Barboza
Cc: LIU Zhiwei, qemu-devel, qemu-riscv, palmer, alistair.francis,
liwei1518, bmeng.cn, richard.henderson, Swung0x48, TANG Tiancheng
Daniel Henrique Barboza <dbarboza@ventanamicro.com> writes:
> Hi Zhiwei,
>
> As Richard already pointed out, we must have a "Signed-off-by" tag with the "author" of
> the patch, and it must be the exact spelling. So in this case:
>
> Signed-off-by: Swung0x48 <swung0x48@outlook.com>
I'm afraid we need a legal name here, not a nickname.
> More info here:
>
> https://www.qemu.org/docs/master/devel/submitting-a-patch.html
[...]
* Re: [PATCH v4 02/12] tcg/riscv: Add basic support for vector
2024-09-20 4:01 ` 0x48 Swung
2024-09-20 4:27 ` LIU Zhiwei
@ 2024-09-20 14:26 ` LIU Zhiwei
2024-09-21 15:56 ` 0x48 Swung
1 sibling, 1 reply; 36+ messages in thread
From: LIU Zhiwei @ 2024-09-20 14:26 UTC (permalink / raw)
To: 0x48 Swung, Richard Henderson, qemu-devel@nongnu.org,
liwei1518@gmail.com
Cc: qemu-riscv@nongnu.org, palmer@dabbelt.com,
alistair.francis@wdc.com, dbarboza@ventanamicro.com,
liwei1518@gmail.com, bmeng.cn@gmail.com, TANG Tiancheng
On 2024/9/20 12:01, 0x48 Swung wrote:
> Hey everyone! Late to the party. Life happens sometimes ;)
> Just discovered this patch and this mail list, and I'd like to provide
> some background story here.
> I originally provided my initial implementation in a downstream repo
> last year, namely
> https://github.com/plctlab/plct-qemu/tree/plct-riscv-backend-rvv .
> I'm new to contributing to qemu and also take part in the open-source
> community upstreaming process as a whole, so I may make mistakes in my
> following claims, but I see some confusion here:
> 1. The PLCT branch (which includes my original commits) is
> open-sourced using GPLv2, which follows QEMU's upstream repo. So
> according to the license, my modification should be EXPLICITLY shown
> in the patch, but I haven't seen any.
> 2. I do consent upstreaming my patch last year, in the form of a patch
> submitted with modifications from T-head, and on behalf of them. And
> it was agreed back in the days that I can be mentioned as one of the
> authors. But it turns out that there's no "sign-off", "author",
> "co-author" line mentioning me. If I don't speak out in this
> situation, does it imply that this patch is purely LIU Zhiwei's work
> and have nothing to do with me?
>
> I'd like LIU to separate my patch and his modification to two separate
> patches, and explicitly name where are those patches coming from, so
> that this patch can comply to GPLv2 license and can we clarify those
> misunderstandings.
>
> I don't want to take it personally , but I do smell something's wrong
> going on here...
I think there was a misunderstanding. But I will not explain it too much
here. If you agree, please don't block this work and send the tag as
Daniel and Markus point out.
Thanks,
Zhiwei
>
> Best Regards,
> Swung0x48 (aka. Huang Shiyuan)
>
> Get Outlook for Android <https://aka.ms/AAb9ysg>
> ------------------------------------------------------------------------
> *From:* Richard Henderson <richard.henderson@linaro.org>
> *Sent:* Wednesday, September 18, 2024 10:27:16 PM
> *To:* LIU Zhiwei <zhiwei_liu@linux.alibaba.com>; qemu-devel@nongnu.org
> <qemu-devel@nongnu.org>
> *Cc:* qemu-riscv@nongnu.org <qemu-riscv@nongnu.org>;
> palmer@dabbelt.com <palmer@dabbelt.com>; alistair.francis@wdc.com
> <alistair.francis@wdc.com>; dbarboza@ventanamicro.com
> <dbarboza@ventanamicro.com>; liwei1518@gmail.com
> <liwei1518@gmail.com>; bmeng.cn@gmail.com <bmeng.cn@gmail.com>;
> Swung0x48 <swung0x48@outlook.com>; TANG Tiancheng
> <tangtiancheng.ttc@alibaba-inc.com>
> *Subject:* Re: [PATCH v4 02/12] tcg/riscv: Add basic support for vector
> On 9/18/24 12:43, LIU Zhiwei wrote:
> >
> > On 2024/9/18 18:11, Richard Henderson wrote:
> >> On 9/18/24 07:17, LIU Zhiwei wrote:
> >>>
> >>> On 2024/9/12 2:41, Richard Henderson wrote:
> >>>> On 9/11/24 06:26, LIU Zhiwei wrote:
> >>>>> From: Swung0x48<swung0x48@outlook.com>
> >>>>>
> >>>>> The RISC-V vector instruction set utilizes the LMUL field to group
> >>>>> multiple registers, enabling variable-length vector registers. This
> >>>>> implementation uses only the first register number of each group while
> >>>>> reserving the other register numbers within the group.
> >>>>>
> >>>>> In TCG, each VEC_IR can have 3 types (TCG_TYPE_V64/128/256), and the
> >>>>> host runtime needs to adjust LMUL based on the type to use different
> >>>>> register groups.
> >>>>>
> >>>>> This presents challenges for TCG's register allocation. Currently, we
> >>>>> avoid modifying the register allocation part of TCG and only expose the
> >>>>> minimum number of vector registers.
> >>>>>
> >>>>> For example, when the host vlen is 64 bits and type is TCG_TYPE_V256, with
> >>>>> LMUL equal to 4, we use 4 vector registers as one register group. We can
> >>>>> use a maximum of 8 register groups, but the V0 register number is reserved
> >>>>> as a mask register, so we can effectively use at most 7 register groups.
> >>>>> Moreover, when type is smaller than TCG_TYPE_V256, only 7 registers are
> >>>>> forced to be used. This is because TCG cannot yet dynamically constrain
> >>>>> registers with type; likewise, when the host vlen is 128 bits and
> >>>>> TCG_TYPE_V256, we can use at most 15 registers.
> >>>>>
> >>>>> There is not much pressure on vector register allocation in TCG now, so
> >>>>> using 7 registers is feasible and will not have a major impact on code
> >>>>> generation.
> >>>>>
> >>>>> This patch:
> >>>>> 1. Reserves vector register 0 for use as a mask register.
> >>>>> 2. When using register groups, reserves the additional registers within
> >>>>> each group.
> >>>>>
> >>>>> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
> >>>>> Co-authored-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
> >>>>
> >>>> If there is a co-author, there should be another Signed-off-by.
> >>>
> >>> This patch has added a tag:
> >>>
> >>> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
> >>>
> >>>
> >>> Do you mean we should add the same tag twice?
> >>
> >> The from line is "Swung0x48 <swung0x48@outlook.com>".
> >> If this is an alternate email for TANG Tiancheng,
> >
> > No, Swung0x48 is another author.
>
> Then we need a proper Signed-off-by line from that author.
>
>
> r~
* Re: [PATCH v4 02/12] tcg/riscv: Add basic support for vector
2024-09-20 14:26 ` LIU Zhiwei
@ 2024-09-21 15:56 ` 0x48 Swung
2024-09-21 17:17 ` Daniel Henrique Barboza
0 siblings, 1 reply; 36+ messages in thread
From: 0x48 Swung @ 2024-09-21 15:56 UTC (permalink / raw)
To: LIU Zhiwei
Cc: Richard Henderson, qemu-devel@nongnu.org, liwei1518@gmail.com,
qemu-riscv@nongnu.org, palmer@dabbelt.com,
alistair.francis@wdc.com, dbarboza@ventanamicro.com,
bmeng.cn@gmail.com, TANG Tiancheng
Signed-off-by: Huang Shiyuan <swung0x48@outlook.com>
This is the tag. Is this fine or do I need to do something else? Thanks for the help from everybody in this list!
On September 20, 2024, at 22:28, LIU Zhiwei <zhiwei_liu@linux.alibaba.com> wrote:
On 2024/9/20 12:01, 0x48 Swung wrote:
Hey everyone! Late to the party. Life happens sometimes ;)
Just discovered this patch and this mail list, and I'd like to provide some background story here.
I originally provided my initial implementation in a downstream repo last year, namely https://github.com/plctlab/plct-qemu/tree/plct-riscv-backend-rvv .
I'm new to contributing to qemu and also take part in the open-source community upstreaming process as a whole, so I may make mistakes in my following claims, but I see some confusion here:
1. The PLCT branch (which includes my original commits) is open-sourced using GPLv2, which follows QEMU's upstream repo. So according to the license, my modification should be EXPLICITLY shown in the patch, but I haven't seen any.
2. I do consent upstreaming my patch last year, in the form of a patch submitted with modifications from T-head, and on behalf of them. And it was agreed back in the days that I can be mentioned as one of the authors. But it turns out that there's no "sign-off", "author", "co-author" line mentioning me. If I don't speak out in this situation, does it imply that this patch is purely LIU Zhiwei's work and have nothing to do with me?
I'd like LIU to separate my patch and his modification to two separate patches, and explicitly name where are those patches coming from, so that this patch can comply to GPLv2 license and can we clarify those misunderstandings.
I don't want to take it personally , but I do smell something's wrong going on here...
I think there was a misunderstanding. But I will not explain it too much here. If you agree, please don't block this work and send the tag as Daniel and Markus point out.
Thanks,
Zhiwei
Best Regards,
Swung0x48 (aka. Huang Shiyuan)
Get Outlook for Android<https://aka.ms/AAb9ysg>
________________________________
From: Richard Henderson <richard.henderson@linaro.org>
Sent: Wednesday, September 18, 2024 10:27:16 PM
To: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>
Cc: qemu-riscv@nongnu.org <qemu-riscv@nongnu.org>; palmer@dabbelt.com <palmer@dabbelt.com>; alistair.francis@wdc.com <alistair.francis@wdc.com>; dbarboza@ventanamicro.com <dbarboza@ventanamicro.com>; liwei1518@gmail.com <liwei1518@gmail.com>; bmeng.cn@gmail.com <bmeng.cn@gmail.com>; Swung0x48 <swung0x48@outlook.com>; TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
Subject: Re: [PATCH v4 02/12] tcg/riscv: Add basic support for vector
On 9/18/24 12:43, LIU Zhiwei wrote:
>
> On 2024/9/18 18:11, Richard Henderson wrote:
>> On 9/18/24 07:17, LIU Zhiwei wrote:
>>>
>>> On 2024/9/12 2:41, Richard Henderson wrote:
>>>> On 9/11/24 06:26, LIU Zhiwei wrote:
>>>>> From: Swung0x48<swung0x48@outlook.com>
>>>>>
>>>>> The RISC-V vector instruction set utilizes the LMUL field to group
>>>>> multiple registers, enabling variable-length vector registers. This
>>>>> implementation uses only the first register number of each group while
>>>>> reserving the other register numbers within the group.
>>>>>
>>>>> In TCG, each VEC_IR can have 3 types (TCG_TYPE_V64/128/256), and the
>>>>> host runtime needs to adjust LMUL based on the type to use different
>>>>> register groups.
>>>>>
>>>>> This presents challenges for TCG's register allocation. Currently, we
>>>>> avoid modifying the register allocation part of TCG and only expose the
>>>>> minimum number of vector registers.
>>>>>
>>>>> For example, when the host vlen is 64 bits and type is TCG_TYPE_V256, with
>>>>> LMUL equal to 4, we use 4 vector registers as one register group. We can
>>>>> use a maximum of 8 register groups, but the V0 register number is reserved
>>>>> as a mask register, so we can effectively use at most 7 register groups.
>>>>> Moreover, when type is smaller than TCG_TYPE_V256, only 7 registers are
>>>>> forced to be used. This is because TCG cannot yet dynamically constrain
>>>>> registers with type; likewise, when the host vlen is 128 bits and
>>>>> TCG_TYPE_V256, we can use at most 15 registers.
>>>>>
>>>>> There is not much pressure on vector register allocation in TCG now, so
>>>>> using 7 registers is feasible and will not have a major impact on code
>>>>> generation.
>>>>>
>>>>> This patch:
>>>>> 1. Reserves vector register 0 for use as a mask register.
>>>>> 2. When using register groups, reserves the additional registers within
>>>>> each group.
>>>>>
>>>>> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>>>>> Co-authored-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>>>>
>>>> If there is a co-author, there should be another Signed-off-by.
>>>
>>> This patch has added a tag:
>>>
>>> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>>>
>>>
>>> Do you mean we should add the same tag twice?
>>
>> The from line is "Swung0x48 <swung0x48@outlook.com>".
>> If this is an alternate email for TANG Tiancheng,
>
> No, Swung0x48 is another author.
Then we need a proper Signed-off-by line from that author.
r~
* Re: [PATCH v4 02/12] tcg/riscv: Add basic support for vector
2024-09-21 15:56 ` 0x48 Swung
@ 2024-09-21 17:17 ` Daniel Henrique Barboza
0 siblings, 0 replies; 36+ messages in thread
From: Daniel Henrique Barboza @ 2024-09-21 17:17 UTC (permalink / raw)
To: 0x48 Swung, LIU Zhiwei
Cc: Richard Henderson, qemu-devel@nongnu.org, liwei1518@gmail.com,
qemu-riscv@nongnu.org, palmer@dabbelt.com,
alistair.francis@wdc.com, bmeng.cn@gmail.com, TANG Tiancheng
On 9/21/24 12:56 PM, 0x48 Swung wrote:
> Signed-off-by: Huang Shiyuan <swung0x48@outlook.com>
>
> This is the tag. Is this fine or do I need to do something else? Thanks for the help from everybody in this list!
Thanks! This is enough. Zhiwei can add the tag in the patch in v5.
Daniel
>
>> On September 20, 2024, at 22:28, LIU Zhiwei <zhiwei_liu@linux.alibaba.com> wrote:
>>
>>
>>
>>
>> On 2024/9/20 12:01, 0x48 Swung wrote:
>>> Hey everyone! Late to the party. Life happens sometimes ;)
>>> Just discovered this patch and this mail list, and I'd like to provide some background story here.
>>> I originally provided my initial implementation in a downstream repo last year, namely https://github.com/plctlab/plct-qemu/tree/plct-riscv-backend-rvv .
>>> I'm new to contributing to qemu and also take part in the open-source community upstreaming process as a whole, so I may make mistakes in my following claims, but I see some confusion here:
>>> 1. The PLCT branch (which includes my original commits) is open-sourced using GPLv2, which follows QEMU's upstream repo. So according to the license, my modification should be EXPLICITLY shown in the patch, but I haven't seen any.
>>> 2. I do consent upstreaming my patch last year, in the form of a patch submitted with modifications from T-head, and on behalf of them. And it was agreed back in the days that I can be mentioned as one of the authors. But it turns out that there's no "sign-off", "author", "co-author" line mentioning me. If I don't speak out in this situation, does it imply that this patch is purely LIU Zhiwei's work and have nothing to do with me?
>>>
>>> I'd like LIU to separate my patch and his modification to two separate patches, and explicitly name where are those patches coming from, so that this patch can comply to GPLv2 license and can we clarify those misunderstandings.
>>>
>>> I don't want to take it personally , but I do smell something's wrong going on here...
>>
>> I think there was a misunderstanding. But I will not explain it too much here. If you agree, please don't block this work and send the tag as Daniel and Markus point out.
>>
>> Thanks,
>> Zhiwei
>>
>>>
>>> Best Regards,
>>> Swung0x48 (aka. Huang Shiyuan)
>>>
>>> Get Outlook for Android <https://aka.ms/AAb9ysg>
>>> ------------------------------------------------------------------------
>>> *From:* Richard Henderson <richard.henderson@linaro.org>
>>> *Sent:* Wednesday, September 18, 2024 10:27:16 PM
>>> *To:* LIU Zhiwei <zhiwei_liu@linux.alibaba.com>; qemu-devel@nongnu.org <qemu-devel@nongnu.org>
>>> *Cc:* qemu-riscv@nongnu.org <qemu-riscv@nongnu.org>; palmer@dabbelt.com <palmer@dabbelt.com>; alistair.francis@wdc.com <alistair.francis@wdc.com>; dbarboza@ventanamicro.com <dbarboza@ventanamicro.com>; liwei1518@gmail.com <liwei1518@gmail.com>; bmeng.cn@gmail.com <bmeng.cn@gmail.com>; Swung0x48 <swung0x48@outlook.com>; TANG Tiancheng <tangtiancheng.ttc@alibaba-inc.com>
>>> *Subject:* Re: [PATCH v4 02/12] tcg/riscv: Add basic support for vector
>>> On 9/18/24 12:43, LIU Zhiwei wrote:
>>> >
>>> > On 2024/9/18 18:11, Richard Henderson wrote:
>>> >> On 9/18/24 07:17, LIU Zhiwei wrote:
>>> >>>
>>> >>> On 2024/9/12 2:41, Richard Henderson wrote:
>>> >>>> On 9/11/24 06:26, LIU Zhiwei wrote:
>>> >>>>> From: Swung0x48<swung0x48@outlook.com>
>>> >>>>>
>>> >>>>> The RISC-V vector instruction set utilizes the LMUL field to group
>>> >>>>> multiple registers, enabling variable-length vector registers. This
>>> >>>>> implementation uses only the first register number of each group while
>>> >>>>> reserving the other register numbers within the group.
>>> >>>>>
>>> >>>>> In TCG, each VEC_IR can have 3 types (TCG_TYPE_V64/128/256), and the
>>> >>>>> host runtime needs to adjust LMUL based on the type to use different
>>> >>>>> register groups.
>>> >>>>>
>>> >>>>> This presents challenges for TCG's register allocation. Currently, we
>>> >>>>> avoid modifying the register allocation part of TCG and only expose the
>>> >>>>> minimum number of vector registers.
>>> >>>>>
>>> >>>>> For example, when the host vlen is 64 bits and type is TCG_TYPE_V256, with
>>> >>>>> LMUL equal to 4, we use 4 vector registers as one register group. We can
>>> >>>>> use a maximum of 8 register groups, but the V0 register number is reserved
>>> >>>>> as a mask register, so we can effectively use at most 7 register groups.
>>> >>>>> Moreover, when type is smaller than TCG_TYPE_V256, only 7 registers are
>>> >>>>> forced to be used. This is because TCG cannot yet dynamically constrain
>>> >>>>> registers with type; likewise, when the host vlen is 128 bits and
>>> >>>>> TCG_TYPE_V256, we can use at most 15 registers.
>>> >>>>>
>>> >>>>> There is not much pressure on vector register allocation in TCG now, so
>>> >>>>> using 7 registers is feasible and will not have a major impact on code
>>> >>>>> generation.
>>> >>>>>
>>> >>>>> This patch:
>>> >>>>> 1. Reserves vector register 0 for use as a mask register.
>>> >>>>> 2. When using register groups, reserves the additional registers within
>>> >>>>> each group.
>>> >>>>>
>>> >>>>> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>>> >>>>> Co-authored-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>>> >>>>
>>> >>>> If there is a co-author, there should be another Signed-off-by.
>>> >>>
>>> >>> This patch has added a tag:
>>> >>>
>>> >>> Signed-off-by: TANG Tiancheng<tangtiancheng.ttc@alibaba-inc.com>
>>> >>>
>>> >>>
>>> >>> Do you mean we should add the same tag twice?
>>> >>
>>> >> The from line is "Swung0x48 <swung0x48@outlook.com>".
>>> >> If this is an alternate email for TANG Tiancheng,
>>> >
>>> > No, Swung0x48 is another author.
>>>
>>> Then we need a proper Signed-off-by line from that author.
>>>
>>>
>>> r~
* Re: [PATCH v4 03/12] tcg/riscv: Add vset{i}vli and ld/st vec ops
2024-09-11 13:26 ` [PATCH v4 03/12] tcg/riscv: Add vset{i}vli and ld/st vec ops LIU Zhiwei
2024-09-11 22:57 ` Richard Henderson
@ 2024-09-22 4:46 ` Richard Henderson
2024-09-23 4:46 ` LIU Zhiwei
1 sibling, 1 reply; 36+ messages in thread
From: Richard Henderson @ 2024-09-22 4:46 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 9/11/24 15:26, LIU Zhiwei wrote:
> @@ -2129,6 +2389,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
>
> static void tcg_out_tb_start(TCGContext *s)
> {
> + s->riscv_cur_type = TCG_TYPE_COUNT;
> /* nothing to do */
> }
>
I recently realized that the vector config is call-clobbered.
We need this reset as well in tcg_out_call_int(), and prepare_host_addr().
In prepare_host_addr, place the reset just after the two calls to new_ldst_label().
r~
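The point about the vector configuration being call-clobbered can be illustrated with a minimal sketch. It assumes the backend caches the last vector type it configured and skips the vset{i}vli when the cached type already matches. The Emitter struct and the tb_start(), emit_vec_op() and emit_call() helpers are hypothetical stand-ins, not the functions in the patch.

```c
#include <assert.h>

/* Hypothetical stand-ins for the TCG vector types; TYPE_COUNT means "unknown". */
enum { TYPE_V64, TYPE_V128, TYPE_V256, TYPE_COUNT };

typedef struct {
    int cur_type;      /* vector type the emitter believes is configured */
    int vset_emitted;  /* number of vset{i}vli instructions emitted so far */
} Emitter;

/* At the start of a translation block nothing is known about vtype. */
static void tb_start(Emitter *e)
{
    e->cur_type = TYPE_COUNT;
    e->vset_emitted = 0;
}

/* Emit a vector op, preceded by a vset{i}vli only if the cache is stale. */
static void emit_vec_op(Emitter *e, int type)
{
    assert(type >= TYPE_V64 && type < TYPE_COUNT);
    if (e->cur_type != type) {
        e->vset_emitted++;
        e->cur_type = type;
    }
}

/* A call may change vl/vtype in the callee, so the cache must be dropped. */
static void emit_call(Emitter *e)
{
    e->cur_type = TYPE_COUNT;
}
```

In this model, two consecutive TYPE_V128 operations share a single vset{i}vli, while an operation that follows emit_call() gets a fresh one. Without the reset in emit_call(), the second configuration would be skipped and the code would run with whatever the callee left behind.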
* Re: [PATCH v4 03/12] tcg/riscv: Add vset{i}vli and ld/st vec ops
2024-09-22 4:46 ` Richard Henderson
@ 2024-09-23 4:46 ` LIU Zhiwei
2024-09-23 10:10 ` Richard Henderson
0 siblings, 1 reply; 36+ messages in thread
From: LIU Zhiwei @ 2024-09-23 4:46 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 2024/9/22 12:46, Richard Henderson wrote:
> On 9/11/24 15:26, LIU Zhiwei wrote:
>> @@ -2129,6 +2389,7 @@ static void tcg_target_qemu_prologue(TCGContext
>> *s)
>> static void tcg_out_tb_start(TCGContext *s)
>> {
>> + s->riscv_cur_type = TCG_TYPE_COUNT;
>> /* nothing to do */
>> }
>
> I recently realized that the vector config is call-clobbered.
> We need this reset as well in tcg_out_call_int(),
OK.
> and prepare_host_addr().
>
> In prepare_host_addr, place the reset just after the two calls to
> new_ldst_label().
As the slow path will also call tcg_out_call_int, can we reset only after
tcg_out_call_int?
Thanks,
Zhiwei
>
>
> r~
* Re: [PATCH v4 03/12] tcg/riscv: Add vset{i}vli and ld/st vec ops
2024-09-23 4:46 ` LIU Zhiwei
@ 2024-09-23 10:10 ` Richard Henderson
0 siblings, 0 replies; 36+ messages in thread
From: Richard Henderson @ 2024-09-23 10:10 UTC (permalink / raw)
To: LIU Zhiwei, qemu-devel
Cc: qemu-riscv, palmer, alistair.francis, dbarboza, liwei1518,
bmeng.cn, TANG Tiancheng
On 9/23/24 06:46, LIU Zhiwei wrote:
>
> On 2024/9/22 12:46, Richard Henderson wrote:
>> On 9/11/24 15:26, LIU Zhiwei wrote:
>>> @@ -2129,6 +2389,7 @@ static void tcg_target_qemu_prologue(TCGContext *s)
>>> static void tcg_out_tb_start(TCGContext *s)
>>> {
>>> + s->riscv_cur_type = TCG_TYPE_COUNT;
>>> /* nothing to do */
>>> }
>>
>> I recently realized that the vector config is call-clobbered.
>> We need this reset as well in tcg_out_call_int(),
> OK.
>> and prepare_host_addr().
>>
>> In prepare_host_addr, place the reset just after the two calls to new_ldst_label().
>
> As the slow path will also call tcg_out_call_int, can we reset only after tcg_out_call_int?
No, because all slow path code is emitted out-of-line at the end of the TB. When we begin
generating code for the next TCGOp, we will not yet have called tcg_out_call_int.
Therefore we must recognize this possibility when generating the branch to the slow path.
r~
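The ordering problem described here can be modelled in a few lines. This is a sketch under stated assumptions, not the backend's code: the emitter caches the configured vector type, a guest memory access is a single OP_QEMU_LD whose slow path (emitted later, out of line) contains a call that clobbers vtype, and execution returns to the instruction after the access. All names below are hypothetical.

```c
#include <stdbool.h>

enum { TYPE_V64, TYPE_V128, TYPE_V256, TYPE_UNKNOWN };
enum { OP_VSET, OP_VEC, OP_QEMU_LD };

typedef struct { int op; int type; } Insn;

typedef struct {
    Insn code[16];  /* in-line code of the translation block */
    int n;          /* number of instructions emitted */
    int cur_type;   /* vector type the emitter believes is configured */
} Emitter;

static void emit(Emitter *e, int op, int type)
{
    e->code[e->n].op = op;
    e->code[e->n].type = type;
    e->n++;
}

/* Vector op: configure vtype first unless the cached type already matches. */
static void emit_vec(Emitter *e, int type)
{
    if (e->cur_type != type) {
        emit(e, OP_VSET, type);
        e->cur_type = type;
    }
    emit(e, OP_VEC, type);
}

/* Memory access with a branch to a slow path that is generated later. */
static void emit_qemu_ld(Emitter *e, bool reset_at_branch)
{
    emit(e, OP_QEMU_LD, 0);
    if (reset_at_branch) {
        e->cur_type = TYPE_UNKNOWN;  /* the slow-path call may clobber vtype */
    }
}

/* Run the in-line code; returns false if a vector op sees the wrong vtype. */
static bool run(const Emitter *e, bool take_slow)
{
    int hw = TYPE_UNKNOWN;  /* vtype actually held by the (modelled) CPU */

    for (int i = 0; i < e->n; i++) {
        switch (e->code[i].op) {
        case OP_VSET:
            hw = e->code[i].type;
            break;
        case OP_QEMU_LD:
            if (take_slow) {
                hw = TYPE_UNKNOWN;  /* helper call clobbered the config */
            }
            break;
        case OP_VEC:
            if (hw != e->code[i].type) {
                return false;
            }
            break;
        }
    }
    return true;
}

/* vector op, guest load, vector op of the same type */
static bool sequence_is_safe(bool reset_at_branch, bool take_slow)
{
    Emitter e = { .n = 0, .cur_type = TYPE_UNKNOWN };

    emit_vec(&e, TYPE_V128);
    emit_qemu_ld(&e, reset_at_branch);
    emit_vec(&e, TYPE_V128);
    return run(&e, take_slow);
}
```

In this model the sequence is only unsafe when the cache is not reset at the branch and the slow path is taken at run time. Resetting where the branch is generated costs one extra vset{i}vli on the fast path, but it is the only point at which the emitter can account for a call that it has not generated yet.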
Thread overview: 36+ messages
2024-09-11 13:26 [PATCH v4 00/12] tcg/riscv: Add support for vector LIU Zhiwei
2024-09-11 13:26 ` [PATCH v4 01/12] util: Add RISC-V vector extension probe in cpuinfo LIU Zhiwei
2024-09-11 18:34 ` Richard Henderson
2024-09-18 5:14 ` LIU Zhiwei
2024-09-18 10:14 ` Richard Henderson
2024-09-11 13:26 ` [PATCH v4 02/12] tcg/riscv: Add basic support for vector LIU Zhiwei
2024-09-11 18:41 ` Richard Henderson
2024-09-18 5:17 ` LIU Zhiwei
2024-09-18 10:11 ` Richard Henderson
2024-09-18 10:43 ` LIU Zhiwei
2024-09-18 14:27 ` Richard Henderson
2024-09-20 4:01 ` 0x48 Swung
2024-09-20 4:27 ` LIU Zhiwei
2024-09-20 14:26 ` LIU Zhiwei
2024-09-21 15:56 ` 0x48 Swung
2024-09-21 17:17 ` Daniel Henrique Barboza
2024-09-20 11:26 ` Daniel Henrique Barboza
2024-09-20 11:37 ` Markus Armbruster
2024-09-11 13:26 ` [PATCH v4 03/12] tcg/riscv: Add vset{i}vli and ld/st vec ops LIU Zhiwei
2024-09-11 22:57 ` Richard Henderson
2024-09-22 4:46 ` Richard Henderson
2024-09-23 4:46 ` LIU Zhiwei
2024-09-23 10:10 ` Richard Henderson
2024-09-11 13:26 ` [PATCH v4 04/12] tcg/riscv: Implement vector mov/dup{m/i} LIU Zhiwei
2024-09-11 23:07 ` Richard Henderson
2024-09-11 13:26 ` [PATCH v4 05/12] tcg/riscv: Add support for basic vector opcodes LIU Zhiwei
2024-09-11 13:26 ` [PATCH v4 06/12] tcg/riscv: Implement vector cmp/cmpsel ops LIU Zhiwei
2024-09-11 23:14 ` Richard Henderson
2024-09-11 13:26 ` [PATCH v4 07/12] tcg/riscv: Implement vector neg ops LIU Zhiwei
2024-09-11 13:26 ` [PATCH v4 08/12] tcg/riscv: Implement vector sat/mul ops LIU Zhiwei
2024-09-11 13:26 ` [PATCH v4 09/12] tcg/riscv: Implement vector min/max ops LIU Zhiwei
2024-09-11 13:26 ` [PATCH v4 10/12] tcg/riscv: Implement vector shi/s/v ops LIU Zhiwei
2024-09-11 23:15 ` Richard Henderson
2024-09-11 13:26 ` [PATCH v4 11/12] tcg/riscv: Implement vector roti/v/x ops LIU Zhiwei
2024-09-11 23:24 ` Richard Henderson
2024-09-11 13:26 ` [PATCH v4 12/12] tcg/riscv: Enable native vector support for TCG host LIU Zhiwei